Stephan Hoyer
stephanhoyer.com
@stephanhoyer.com
Building AI climate models at Google. I also contribute to the scientific Python ecosystem, including Xarray, NumPy and JAX.

Opinions are my own, not my employer's.
Woah! What visualization tool are you using?
October 9, 2025 at 4:09 PM
Do you take it yourself?
May 13, 2025 at 3:43 PM
I think the problem is the algorithm. Bluesky's lack of a recommendation engine means that if you're not posting all the time, your stuff doesn't get seen.
May 6, 2025 at 3:07 PM
The "ungamable impact" of OSS really resonates with me:
www.thonking.ai/i/158277004/...

Sadly it does not necessarily align with what makes for a successful career in Big Tech. But it's worth trying anyways! :)
Why PyTorch is an amazing place to work... and Why I'm Joining Thinking Machines
In which I convince you to join either PyTorch or Thinking Machines!
www.thonking.ai
March 4, 2025 at 11:22 PM
I think it's just about readability with small font, the same reason why printed newspapers use many columns.
February 2, 2025 at 8:17 PM
The losses here should be marked as millions not billions, right?
January 27, 2025 at 5:45 PM
Pretty much anything that you can write in high-level array code like NumPy is very fast in JAX. Only intrinsically very loopy code is (relatively) slow, but JAX has excellent support for writing custom kernels in lower-level languages.
January 23, 2025 at 6:11 AM
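To make the distinction in the post above concrete, here is a minimal sketch (not from the thread itself). The function names `smooth` and `running_filter` are hypothetical, chosen only for illustration. The first is NumPy-style array code of the kind the post calls fast; the second is a sequential recurrence, a simple case of "loopy" code, written with `jax.lax.scan`. The sketch makes no performance measurements.

```python
import jax
import jax.numpy as jnp


@jax.jit
def smooth(u):
    """Array-style code: average each point with its periodic neighbors."""
    return (jnp.roll(u, 1) + u + jnp.roll(u, -1)) / 3.0


@jax.jit
def running_filter(x, alpha=0.5):
    """Sequential recurrence y[t] = alpha * x[t] + (1 - alpha) * y[t-1].

    Each step depends on the previous one, so it cannot be written as a
    single elementwise array expression; jax.lax.scan expresses the loop.
    """
    def step(y_prev, x_t):
        y_t = alpha * x_t + (1.0 - alpha) * y_prev
        return y_t, y_t  # (new carry, value stacked into the output)

    _, ys = jax.lax.scan(step, jnp.zeros((), x.dtype), x)
    return ys


u = jnp.arange(6.0)
smoothed = smooth(u)                      # whole-array operation
filtered = running_filter(jnp.ones(4))    # step-by-step recurrence
```

Both functions compile with `jax.jit`; the difference is that the recurrence has to run its steps in order, while the array version exposes all of its work at once.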
AD-compatible Python is at the cutting edge of performance these days with its central role in large-scale AI training.

In my experience (mostly geophysical fluid dynamics) JAX has comparable perf to modern Fortran on CPUs, with a much easier path to GPUs and multi-device code.
January 23, 2025 at 1:07 AM
Those are tiny chunks! Does that reduce max throughput for analytics use-cases compared to larger chunks?
January 10, 2025 at 9:44 PM
Such exciting news!

For anyone who has tried the new sharding feature -- do you have any guidance on optimal shard sizes, if I want more flexibility in access patterns but still optimal throughput?
January 10, 2025 at 3:17 AM
Reposted by Stephan Hoyer
Is there a link between #ClimateChange & increasing risk/severity of #wildfire in California--including the still-unfolding disaster? Yes. Is climate change the only factor at play? No, of course not. So what's really going on? [Thread] #CAfire #CAwx #LAfires iopscience.iop.org/a...
January 9, 2025 at 10:05 PM
This is a huge milestone for cloud-native big scientific data!
Zarr @zarr.dev · Jan 9
🎉 Zarr-Python 3 is here! 🎉

- Full support for Zarr v3 spec
- Chunk-sharding for more efficient data storage
- Major performance boosts with async I/O & parallel compression

💻 pip install --upgrade zarr
💻 conda install --channel conda-forge zarr

Blog post: https://buff.ly/3C3OwYw
January 9, 2025 at 11:55 PM
Reposted by Stephan Hoyer
Hi, thanks for the mention. Here's a 7-day paywall-free link to the main feature: www.bloomberg.com/graphics/202...
The Risky Business of Predicting Where Climate Disaster Will Hit
Climate tech companies can calculate the chances that a flood or wildfire will ravage your home. But what if their odds are all different?
www.bloomberg.com
December 30, 2024 at 5:27 PM
This paper by Watt-Meyer et al. is a good example of "error-based learning": agupubs.onlinelibrary.wiley.com/doi/10.1029/...

ECMWF has also done similar work on top of IFS's data assimilation system.
Correcting Weather and Climate Models by Machine Learning Nudged Historical Simulations
Nudging an atmospheric model toward observations is a good way to estimate state-dependent biases Machine learning of state-dependent biases improves hindcast skill of a coarse-resolution general...
agupubs.onlinelibrary.wiley.com
December 28, 2024 at 8:35 PM
Reposted by Stephan Hoyer
Some thoughts on the use of AI/ML in climate modeling...

@realclimate.org

¡AI Caramba! www.realclimate.org/index.php/ar...
¡AI Caramba!
Rapid progress in the use of machine learning for weather and climate models is evident almost everywhere, but can we distinguish between real advances and vaporware? First off, let's define some...
www.realclimate.org
December 28, 2024 at 7:36 PM
We have a few pre-computed climatologies in WeatherBench2: weatherbench2.readthedocs.io/en/latest/da...
WeatherBench 2 Data Guide — WeatherBench 2 documentation
weatherbench2.readthedocs.io
December 27, 2024 at 1:39 AM
We have a few other updates to share as well, which can be found in the inaugural edition of the NeuralGCM newsletter:
groups.google.com/g/neuralgcm-...

The biggest one is that NeuralGCM models are now freely available for everyone to use, including for commercial purposes!
NeuralGCM update: new models, new license, new datasets
groups.google.com
December 19, 2024 at 8:34 PM
Can incorporating AI improve precipitation in global weather and climate models?

Yes! In the latest NeuralGCM paper, we show that training on satellite-based precipitation results in significant improvements over traditional atmospheric models:
arxiv.org/abs/2412.11973
Neural general circulation models optimized to predict satellite-based precipitation observations
Climate models struggle to accurately simulate precipitation, particularly extremes and the diurnal cycle. Here, we present a hybrid model that is trained directly on satellite-based precipitation obs...
arxiv.org
December 19, 2024 at 8:34 PM
Please reach out if you want to chat about anything related to AI modeling, NeuralGCM, JAX or Xarray. Also see Eni's poster on xarray.DataTree on Thurs: agu.confex.com/agu/agu24/me...
Simplifying analysis of hierarchical HDF5 and NetCDF4 files with xarray-datatree
NASA’s Earth Observing System Data and Information System (EOSDIS) contains tho...
agu.confex.com
December 9, 2024 at 5:47 PM
Interested in AI weather/climate modeling at #AGU24?

I'll be giving an overview talk on NeuralGCM at 11:30am Wed at the Google booth, and a talk on modeling precipitation with NeuralGCM at 4:25pm Wed in session A34A.
December 9, 2024 at 5:42 PM
When I hear "ML" I tend to think of old-school (i.e., scikit-learn) machine learning, which is great but much less powerful than deep learning. So I would opt for "AI weather models", though that misses quite a bit of nuance.
December 7, 2024 at 7:01 PM
This diagram is accurate historically, but recently AI seems to have become synonymous with deep learning.
December 7, 2024 at 6:57 PM
The bottleneck for traditional models is data movement within the CPU, not data transfer to disk -- physics-based simulations do too little compute per byte (low arithmetic intensity) to fully utilize modern hardware.

AI is way better in this respect. It's easy to use lots of FLOPs on big matmuls!
December 7, 2024 at 7:22 AM
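A rough back-of-the-envelope version of the arithmetic intensity argument above, as a sketch under stated assumptions rather than a measurement. It assumes float32 values (4 bytes), that every operand is moved through memory exactly once (ideal caching), and it uses an elementwise update `y = a * x + y` as a stand-in for a memory-bound simulation step. The function names are hypothetical.

```python
def axpy_intensity(n, bytes_per_value=4):
    """FLOPs per byte for y = a * x + y on n elements (assumed ideal traffic)."""
    flops = 2 * n                            # one multiply and one add per element
    bytes_moved = 3 * n * bytes_per_value    # read x, read y, write y
    return flops / bytes_moved


def matmul_intensity(n, bytes_per_value=4):
    """FLOPs per byte for an n x n by n x n matmul (assumed ideal traffic)."""
    flops = 2 * n**3                           # n multiply-adds per output entry
    bytes_moved = 3 * n**2 * bytes_per_value   # read A, read B, write C
    return flops / bytes_moved


elementwise = axpy_intensity(1_000_000)   # about 0.17 FLOPs per byte, independent of n
big_matmul = matmul_intensity(4096)       # about 683 FLOPs per byte, grows as n / 6
```

Under these assumptions the elementwise update stays near 1/6 FLOP per byte no matter how large the array is, while the matmul's intensity grows linearly with the matrix size, which is why large matmuls can keep the arithmetic units busy.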
Unlimited potential, zero bugs!
December 1, 2024 at 1:09 PM