epoch.ai
So we spent the last few months reading legal permits, staring at satellite images, and scouring news sources.
Here’s what you need to know. 🧵
Some hyperscalers plan to do it in just 1-2 years from the start of construction.
If they succeed, we’ll see the first GW-scale data centers online in 2026, marking one of the fastest infrastructure build-outs in history. 🧵
One way to read our new capability index is to plot the benchmark performance you'd expect to see across a range of ECI scores 🧵
bsky.app/profile/epo...
The world is about to see multiple 1 GW+ AI data centers.
We mapped their construction using satellite imagery, permits & public sources — releasing everything for free, including commissioned satellite images.
Highlights in thread!
Our findings: tasks are simple, many don't require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time.
See thread for details!
Corrected results: GPT-5 (high) scores slightly higher than GPT-5 (medium) on the benchmarks we run. They are also now tied on the Epoch Capabilities Index (ECI).
The result? This gap is smaller than previously estimated.
On average, it takes 3.5 months for an open-weight model to catch up with closed-source SOTA.
Our research suggests that conducting 10 GW training runs across two dozen sites, linked by a network spanning thousands of km, is feasible.
The tool addresses one of the field's biggest challenges: benchmark saturation.
It's called the Epoch Capabilities Index (ECI) — here's what makes it different:
But mathematical physicist Svetlana Jitomirskaya argues they lack folklore knowledge: the implicit priors mathematicians build from experience.
Link to video in comments!
Every major shift in math has caught experts off guard, he says. This one will be no different, except that all our predictions will be even more wrong.
Link to video in comments!
Even with reasoning disabled, Haiku 4.5 performs similarly to or better than early lightweight reasoning models like o1-mini.
Probably not. From what we can tell, it caps out below 50%.
What about throwing in *every* available model? Infinitely many times? 🧵
In light of that, it’s notable that OpenAI is projecting historically unprecedented revenue growth — from $10B to $100B — over the next three years. 🧵
Mathematician Jesús De Loera on AI’s potential to democratize mathematical proof and the risks when systems hallucinate with perfect confidence.
Link to video in comments!
OpenAI spent ~$7 billion on compute last year. Most of this went to R&D, meaning all research, experiments, and training.
Only a minority of this R&D compute went to the final training runs of released models.
GPT-5 Pro set a new record (13%), edging out Gemini 2.5 Deep Think by a single problem (not statistically significant). Grok 4 Heavy lags. 🧵
As a nonprofit, our work is freely accessible for anyone to read, replicate, and build upon.
Our datasets:
How did we reach this conclusion, and what do we actually know about how GPT-5 was trained?
🧵
We also conducted a more holistic evaluation of its math capabilities. 🧵
She thinks that when AI finally can, it will have crossed a threshold in general human-level reasoning.
Link to video in comments!