Benjamin Lefaudeux 🇺🇦
bentheegg.bsky.social
Back to France after some time in sunny California and happy Copenhagen. Mistral, Photoroom, Meta (xformers, FairScale, R&D), EyeTribe (acq.). Mostly writing about AI
Above is intuitive when you think about it long enough (or so it feels at least), but I missed it entirely during a couple of years working on diffusion, so I figured it was worth emphasizing, and the authors did too :)
July 26, 2025 at 10:19 PM
Worth a deep read in general; I'm not completely done with it yet, and I hope it ages well. Closing with some nice insight wrt diffusion models: they don't open up serial awareness, since the model iterates on _the same_ solution, with no state space or carry-over. _Less_ powerful than autoregressive
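The diffusion-vs-autoregressive distinction above can be sketched as two toy loops (my own illustration, not from the paper): one rewrites the same candidate in place, the other appends to a growing sequence that acts as carried-over state.

```python
import random

def diffusion_style(steps=4):
    # Iterative refinement: every step rewrites the SAME candidate x.
    # No growing state is carried over, so later steps cannot build on
    # intermediate results the way serial computation can.
    x = [random.random() for _ in range(3)]
    for _ in range(steps):
        x = [v * 0.5 for v in x]  # stand-in for one denoising update
    return x

def autoregressive_style(steps=4):
    # Each new token is appended, so the sequence itself acts as a
    # carried-over state: step t can depend on everything before it.
    tokens = [0]
    for _ in range(steps):
        tokens.append(tokens[-1] + 1)  # stand-in for next-token sampling
    return tokens
```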
July 26, 2025 at 10:19 PM
The paper cannot prove its point completely, since models are really good approximators and are used as such (hence a formal disproof is not enough). Pretty good hints still; makes me confident we're far from peak efficiency in most use cases (we approximate serial awareness by adding tons of compute)
July 26, 2025 at 10:16 PM
I think the hardware recommendations are a little naive/premature; as much as I like CPUs, nothing will happen before the needs and solutions are put on the table. Lowering is expensive and risky in general and will happen last, but at least this shows there's kryptonite to GPU dominance
July 26, 2025 at 10:10 PM
The paper is very pedagogical, and some takeaways ring pretty reasonable. The intuition is interesting behind LLMs being just ok-to-not-great chess players (missing the MCTS-like mechanism of specialized models), or failing at multi-step reasoning prior to test-time compute / CoT
July 26, 2025 at 10:08 PM
It then feels like the dichotomy proposed by the paper (inherently parallel, TC0 models will fail on serial problems) is excessive, or at least that the frontier is a bit fuzzy. One line is great though, paraphrasing: "only with test-time compute did we factor in some serial compute power"
July 26, 2025 at 10:05 PM
There are caveats in the definition of "inherently serial" problems:
- not all solutions require serial computation, even for problems outside of TC0
- approximations can land pretty close, and oftentimes we don't expect anything much better than an approximation
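For a concrete feel of what "inherently serial" means, here is a toy dependency chain (my own illustration): each step consumes the previous step's output, so the obvious algorithm cannot be parallelized across steps. Note this illustrates the shape of the problem, not a proof that no shortcut exists; for some chains a closed form collapses the serial work, which is exactly the first caveat above.

```python
def iterate(x0: int, steps: int, mod: int = 2**61 - 1) -> int:
    # Step t needs step t-1's result: a serial dependency chain.
    # (In the spirit of iterated modular squaring; purely illustrative.)
    x = x0
    for _ in range(steps):
        x = (x * x + 1) % mod
    return x
```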
July 26, 2025 at 10:03 PM
Claude Code is really good for some narrowly defined tasks (adding unit tests, for instance), and in that case it's clearly an agent. The "vibe coding" middle ground (with somebody in the loop who doesn't completely get it) is the part on shaky ground, I believe
July 18, 2025 at 8:44 PM
Something the LLMs have not seen beforehand (a new model architecture, for instance). In my experience that's where all the current tools break, for relatable reasons. I guess it's the same for somebody developing a SOTA DB engine or compute shader
July 18, 2025 at 8:41 PM
For things LLMs are not great at (typically new, frontier work) you're better off doing it yourself instead of inheriting a broken spaghetti plate. Vibe coding your way to oblivion is not a great proposition in either case. I don't think there's that much of a middle ground
July 18, 2025 at 9:55 AM
Qualitatively the chunking is real and meaningful
July 14, 2025 at 4:33 PM
I was a bit short on the results in this thread re: HNets; they are pretty convincing, even if taking over from transformers will need more validation. Of note, the models become naturally robust to typos, which is a great omen
July 14, 2025 at 4:31 PM
Well, you can read my thread; else the link is in the first post :) The model weights are open
July 14, 2025 at 4:25 PM
HNets chunk dynamically, that's why it's a big deal for me! Besides, byte latents were doing that already, so not exactly nothing, but not entirely mature, yes
July 14, 2025 at 3:50 PM
Comparisons with diffusion models are not a complete hit, because the comparison is with undistilled, 1000-step models, which nobody in their right mind uses (fast samplers & distilled models mean that images are clean in 4-8 steps, 30 tops). The fact that EBT is usable as-is is already great
July 13, 2025 at 7:40 AM
Similarly to HNets, I think the proof will be in the scaling, but there are good omens, in that the technique works as you would expect it to. For instance, thinking more has a bigger impact on out-of-distribution data than on in-distribution data (assuming the model was big enough to capture the training set)
July 13, 2025 at 7:34 AM
The big result is in the thinking: opening up the compute valves for the more complicated cases has a meaningful effect.

Note that there's an interesting operating mode attached to being able to self-assess: generate multiple options, then pick the better one (self-Monte Carlo?)
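The "generate then self-select" mode above is basically best-of-N with the model as its own judge. A minimal sketch, where `generate` and `energy` are hypothetical stand-ins for a sampler and a self-assessment score (lower = better, in the spirit of an energy-based model):

```python
import random

def generate(rng):
    # Hypothetical sampler: returns one candidate "solution".
    return [rng.random() for _ in range(4)]

def energy(candidate):
    # Hypothetical self-assessment score; lower is better.
    return sum(candidate)

def best_of_n(n=8, seed=0):
    # Draw n candidates, keep the one the model itself scores best.
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return min(candidates, key=energy)
```

Drawing more candidates can only improve (or match) the selected score, at the cost of n× the generation compute.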
July 13, 2025 at 7:32 AM