Sam Harsimony
@harsimony.bsky.social
I write about opportunities in science, space, and policy here: https://splittinginfinity.substack.com/
AI companies can straightforwardly avoid a bubble. Their current models are profitable!

The problem is R&D spend chasing scaling laws that continue to hold and continue to have extreme diminishing returns.

Though many have realized this (except xAI).
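For intuition, a minimal sketch of what extreme diminishing returns looks like under a Chinchilla-style power law; the constant and exponent here are illustrative assumptions, not any lab's fitted values:

```python
# Illustrative only: loss ~ A * C^(-alpha) with made-up constants,
# to show how the gain per doubling of compute keeps shrinking.
A, alpha = 10.0, 0.05

def loss(compute_flops: float) -> float:
    return A * compute_flops ** (-alpha)

prev = loss(1e21)
for doublings in range(1, 6):
    cur = loss(1e21 * 2 ** doublings)
    print(f"after {doublings} doubling(s): loss {cur:.3f}, improvement {prev - cur:.4f}")
    prev = cur
```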
November 14, 2025 at 5:59 PM
The two types are "low-speed, low-cost" and "high-speed, high-cost"

This tradeoff comes directly from the economics of inference.

2/6
November 13, 2025 at 4:02 PM
They are LoRA-pilled as well:
October 16, 2025 at 7:43 PM
Their figure 24 confirms what we've been talking about: more GPUs means more performance.

Also, notice that switching from H100 to GB200 with fancy interconnects (that's the NVL72 rack, which uses NVLink) gives a huge performance boost.
October 16, 2025 at 7:43 PM
" ... as we increase the number of nodes involved (the EP number), the per node performance increases."
October 16, 2025 at 7:43 PM
The section on energy efficiency (tokens/s/MW) highlights why data center energy use isn't a big concern.

For low speeds, the GB200 can get you ~8x lower energy use.

Across recent generations, chip designers have gotten ~3x improvements in energy efficiency.
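Rough arithmetic on what that means per token (the throughput numbers below are hypothetical placeholders, not figures from the report):

```python
# Energy per token from the tokens/s/MW metric. Placeholder numbers, not measured data.
WATTS_PER_MW = 1e6

def joules_per_token(tokens_per_s_per_mw: float) -> float:
    return WATTS_PER_MW / tokens_per_s_per_mw

baseline = 1.0e5           # hypothetical H100-class tokens/s/MW
improved = 8 * baseline    # the ~8x low-speed GB200 improvement mentioned above

print(joules_per_token(baseline))   # 10.0 J/token
print(joules_per_token(improved))   # 1.25 J/token
```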
October 14, 2025 at 7:22 PM
Let's look at my preferred metric, the cost per million tokens.

Here are hyperscaler costs for serving DeepSeek-R1-0528 with 1K input tokens and 1K output.

We'll get to the GB200 in a second, but notice how everything else is quite similar in price.

H200 scales better for high interactivity tho.
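For reference, the back-of-envelope formula behind that metric (the node prices and throughputs here are hypothetical, not the chart's data):

```python
# $/1M tokens = hourly node cost / tokens served per hour, scaled to a million tokens.
def cost_per_million_tokens(node_cost_per_hour: float, node_tokens_per_second: float) -> float:
    tokens_per_hour = node_tokens_per_second * 3600
    return node_cost_per_hour / tokens_per_hour * 1e6

# Hypothetical examples: a big NVL72-class rack vs. a smaller 8-GPU node.
print(cost_per_million_tokens(98.0, 30_000))   # ~$0.91 per 1M tokens
print(cost_per_million_tokens(25.0, 6_000))    # ~$1.16 per 1M tokens
```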
October 14, 2025 at 7:22 PM
We know gains from reasoning diminish sharply with more tokens. There's probably a fixed amount of thinking that is optimal.

Say you need 5x the tokens for optimal thinking; interactivity must go up 5x for users to enjoy the same response time.

A one-time jump in the point of diminishing returns.
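The arithmetic behind that, as a tiny sketch (the token counts and speeds are made up):

```python
# Response time = output tokens / per-user interactivity (tokens per second per user).
def response_time_s(output_tokens: int, tokens_per_s_per_user: float) -> float:
    return output_tokens / tokens_per_s_per_user

print(response_time_s(1_000, 25))    # 40 s
print(response_time_s(5_000, 25))    # 200 s: 5x the thinking at the same speed
print(response_time_s(5_000, 125))   # 40 s again, but only at 5x the interactivity
```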
October 14, 2025 at 7:22 PM
I think this is one of the key charts: the overall datacenter cost for 1M tokens vs. how many tokens per second each user enjoys, for different GPUs.
October 14, 2025 at 7:22 PM
The key tradeoff: batching more user requests into a single run (i.e. one pass of loading the weights onto your GPUs) means the GPUs run more efficiently, but users have to wait longer.

You can be cheap and slow or fast and expensive.
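A toy model of that tradeoff (the cost, saturation throughput, and curve shape are illustrative assumptions, not measured numbers):

```python
# Bigger batches amortize each pass over the weights, so cost per token falls,
# but node throughput is shared across more users, so each user sees fewer tokens/s.
NODE_COST_PER_HOUR = 50.0        # hypothetical
MAX_NODE_TOKENS_PER_S = 20_000   # hypothetical saturation throughput

def node_throughput(batch_size: int) -> float:
    # diminishing returns as the node saturates
    return MAX_NODE_TOKENS_PER_S * batch_size / (batch_size + 32)

for batch in (1, 8, 64, 256):
    total = node_throughput(batch)
    per_user = total / batch
    cost = NODE_COST_PER_HOUR / (total * 3600) * 1e6
    print(f"batch {batch:4d}: {per_user:7.1f} tok/s per user, ${cost:6.2f} per 1M tokens")
```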
October 14, 2025 at 7:22 PM
Curve for fentanyl ODs looks similar, esp. considering the age of typical users.
October 3, 2025 at 2:52 PM
Trains a model to choose among LLM responses to get better performance.

With some work, sampling and picking among LLM responses could give current models a performance bump.

arxiv.org/pdf/2509.06870
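A minimal best-of-N sketch of the general idea (not the paper's exact method; `generate` and `score` are hypothetical stand-ins for the LLM and the learned selector):

```python
import random

def generate(prompt: str) -> str:
    # stand-in for sampling one LLM response
    return f"candidate {random.randint(0, 999)} for {prompt!r}"

def score(prompt: str, response: str) -> float:
    # stand-in for the trained selector / reward model
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda r: score(prompt, r))

print(best_of_n("prove the triangle inequality"))
```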
September 15, 2025 at 6:33 PM
But can't you convince a few people using impassioned pleas, rhetorical tricks, and lies? Not really.

You see, your enemies can do the *exact same thing*, so it nets to zero.

To win you need asymmetric weapons that point only towards truth. Reasoned debate.

slatestarcodex.com/2017/03/24/g...
August 21, 2025 at 4:54 PM
"Soft" tactics like reasoned debate and persuasion look superficially like they are losing, yet over the long run have come to dominate everything around us. Particularly for the cause of classical Liberalism.

From one of my favorite SSC posts:
August 21, 2025 at 4:54 PM
The paradigm of specialized models distilled from larger models was a predictable result. Points towards a world with "comprehensive AI services", not FOOM (for now).

CAIS poses a different set of risks best addressed by governance and defensive technology.

www.greaterwrong.com/posts/8e3676...
August 10, 2025 at 4:24 PM
OpenAI's claim of a model reaching IMO Gold comes about 0.5-1 years earlier than expected.

This market had 85% confidence it would be solved this year but that fell as time went on:

manifold.markets/Austin/will-...
July 19, 2025 at 10:38 PM
Oh neat. Want to remove some behavior or data from a model? Simply suppress or hide that output and train a fresh model on the clean outputs.

Alignment for simple models in simple environments is looking pretty good.

xcancel.com/Turn_Trout/s...
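The filtering step, as I understand it, in a minimal sketch (my paraphrase of the thread, with a hypothetical `shows_unwanted_behavior` filter and toy data):

```python
# Drop transcripts that show the unwanted behavior, then train/distill a fresh
# model on what's left. The data and filter below are toy placeholders.
transcripts = [
    {"prompt": "question A", "output": "helpful answer"},
    {"prompt": "question B", "output": "answer containing the unwanted behavior"},
]

def shows_unwanted_behavior(example: dict) -> bool:
    return "unwanted behavior" in example["output"]

clean = [ex for ex in transcripts if not shows_unwanted_behavior(ex)]
print(f"kept {len(clean)} of {len(transcripts)} examples for the fresh model")
```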
July 17, 2025 at 9:59 PM
Becker points out that outcomes didn't improve as developers worked through more problems, which suggests their lack of experience is because Cursor wasn't useful to these devs previously.

Mellow heuristic leans against Shear.

bsky.app/profile/hars...
July 15, 2025 at 8:37 PM
First, I just realized all their error bars overlap with zero. The real headline should be "LLMs offer zero speedup," not a 20% slowdown.

Quentin's speedup may be a result of chance in addition to his good habits.
July 13, 2025 at 2:57 PM
This post prompted me to look at the price-performance of GPUs. Apparently it's stagnated since 2018??

BUT performance on other number formats (e.g. FP4) has improved a lot.
June 30, 2025 at 5:37 PM
Oh that link was for the H100 performance number! For the meteor number you want table 1 in Supplementary Info S9:
June 25, 2025 at 9:35 PM
If true, then the reasons why the human economy approaches steady growth should also apply to AI.

The consistency is remarkable. The US has had 2% per capita GDP growth for the last two centuries.

www.nber.org/system/files...
June 23, 2025 at 5:07 PM
Now I'm going to skip some of their technical details and theorems and jump to the experiments.

Their example decentralized system trains as fast (wall clock time) as an example centralized system.

Wonder what the utilization looks like though.
June 13, 2025 at 3:46 PM
They rearrange the formula for a transformer layer into a constant matrix (that's the PE, TE stuff) and a sum.

They claim that the row space of matrix AB is inside the row space of matrix B.

So if weights have a small rank (see prev fig), then the activations should too.
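A quick numerical check of that containment claim (toy shapes and ranks, nothing from the paper):

```python
import numpy as np

# Rows of A @ B are linear combinations of rows of B, so rank(A @ B) <= rank(B).
rng = np.random.default_rng(0)
B = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 512))  # rank-4 "weights"
A = rng.standard_normal((128, 64))                                 # arbitrary left factor

print(np.linalg.matrix_rank(B))      # 4
print(np.linalg.matrix_rank(A @ B))  # at most 4
```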
June 13, 2025 at 3:46 PM