Toby Ord
@tobyord.bsky.social
Senior Researcher at Oxford University.
Author — The Precipice: Existential Risk and the Future of Humanity.
tobyord.com
What ideas are already out there, just waiting on someone to really feel their power and bring them down from the ivory tower?
October 13, 2025 at 5:11 PM
During questions someone asked what we can learn about how to write an influential paper. Equally important is what we can learn about reading such a paper. So many philosophers had read it in the intervening generation, but none had taken it seriously.
October 13, 2025 at 5:05 PM
It made me realise for the first time that I was essential in making it so — that one Australian in Oxford in 1971 had thrown the ball far, far down the field, to be received by another Australian in Oxford in 2004.
October 13, 2025 at 5:01 PM
So it looks like most of the gains are coming from the ability to spend more compute on each answer rather than from better ability to reason for the same token budget.
This shift to inference-scaling has big implications for AI business, governance, and risk:
www.tobyord.com/writing/infe...
13/
Inference Scaling Reshapes AI Governance — Toby Ord
The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whet...
www.tobyord.com
October 3, 2025 at 7:39 PM
And here are the relative boosts.
Overall, inference scaling produced 82%, 63%, and 92% of the total performance gains on the three benchmarks.
12/
October 3, 2025 at 7:38 PM
As you can see, most of the boost is coming from the inference-scaling that the RL training has enabled.
The same is true for the other benchmarks I examined. Here are the raw scatterplots:
11/
October 3, 2025 at 7:37 PM
We can draw the trend on the chart, then divide the performance boost into two parts:
• the RL boost taking the base model to the trend line
• the inference-scaling boost taking it to the top of the trend
10/
October 3, 2025 at 7:37 PM
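The decomposition in the post above can be sketched numerically. This is a minimal illustration with made-up (tokens, accuracy) numbers — not the actual benchmark data — and it assumes the trend line is accuracy fit as a linear function of log(tokens), one common way to draw such a trend:

```python
import math

# Hypothetical (tokens, accuracy) points for the reasoning model's curve.
# These numbers are invented for illustration, not the MATH level 5 data.
points = [(1_000, 0.55), (2_000, 0.62), (4_000, 0.69),
          (8_000, 0.76), (16_000, 0.83)]

# Fit accuracy as linear in log(tokens) by least squares: the "trend line".
xs = [math.log(t) for t, _ in points]
ys = [a for _, a in points]
n = len(points)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

def trend(tokens):
    return slope * math.log(tokens) + intercept

# Hypothetical base-model point: fewer tokens, slightly below the trend.
base_tokens, base_acc = 1_000, 0.50

# RL boost: lifts the base model up to the trend line at its own budget.
rl_boost = trend(base_tokens) - base_acc
# Inference-scaling boost: moves along the trend to the largest budget.
inference_boost = trend(max(t for t, _ in points)) - trend(base_tokens)
total = rl_boost + inference_boost

print(f"RL boost:        {rl_boost:.3f}")
print(f"Inference boost: {inference_boost:.3f}")
print(f"Inference share: {inference_boost / total:.0%}")
```

With these toy numbers the inference-scaling boost accounts for most of the total gain — the same qualitative pattern the thread reports for the real benchmarks.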
Note how there is a clear trend line for the reasoning models, showing how their performance scales with more inference. The base model is slightly below this trend.
9/
October 3, 2025 at 7:36 PM
I worked out a nice clean way to separate this out. Here is data from the MATH level 5 benchmark, showing performance vs token-use for a base model (Sonnet 3.6 – orange square) and its reasoning model (Sonnet 3.7 – red circles).
8/
October 3, 2025 at 7:35 PM
But it turns out that even when reasoning is turned off, these models use many more tokens to generate their answers — so even this boost comes partly from the RL training itself and partly from inference-scaling.
7/
October 3, 2025 at 7:35 PM
Often people assume it is mostly about the training. One piece of evidence for this is that even without reasoning turned on, a reasoning model seems to perform substantially better than its base model (i.e. a model that differs only in not having the RL training).
6/
October 3, 2025 at 7:34 PM
But it is hard to tease out how much of RL's benefit comes directly from the training (1) and how much comes from using far more tokens at inference time (2).
5/
October 3, 2025 at 7:33 PM
But (2) is less rosy.
For the largest AI companies, most costs come from deploying models to customers. If you need to 10x or 100x those costs, that is very expensive. And unlike training, it can't be made up in volume.
4/
October 3, 2025 at 7:33 PM
Many people focus on (1).
This is the bull case for RL scaling — it started off small compared to internet-scale pre-training, so it can be scaled 10x or 100x before even doubling overall training compute.
3/
October 3, 2025 at 7:32 PM
Scaling up AI using next-token prediction was the most important trend in modern AI. It stalled out over the last couple of years and has been replaced by RL scaling.

This has two parts:
1. Scaling RL training
2. Scaling inference compute at deployment

2/
October 3, 2025 at 7:31 PM