William Merrill
@lambdaviking.bsky.social
Will irl - PhD student @ NYU on the academic job market!

Using complexity theory and formal languages to understand the power and limits of LLMs

https://lambdaviking.com/ https://github.com/viking-sudo-rm
📜Paper link: arxiv.org/pdf/2503.03961
March 7, 2025 at 4:46 PM
We take these results to justify further theoretical and empirical analysis of dynamic depth as a form of test-time computation for transformers.

(we’re excited to share more results ourselves soon🤐)
March 7, 2025 at 4:46 PM
Our results suggest dynamic depth can be a more efficient form of test-time compute than chain of thought (at least for regular languages). While CoT would use ~n steps to recognize a regular language on inputs of length n, looped transformers only need ~log n depth.
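A back-of-the-envelope sketch of that comparison (illustrative only; `cot_steps` and `looped_depth` are hypothetical names, not anything from the paper):

```python
import math

def cot_steps(n):
    # Chain of thought: roughly one sequential decoding step per input token.
    return n

def looped_depth(n):
    # Looped transformer: repeat a shared block ~ceil(log2(n)) times.
    return math.ceil(math.log2(n))

for n in [16, 1024, 1_000_000]:
    # e.g. n = 1024: ~1024 CoT steps vs ~10 repeated blocks
    print(n, cot_steps(n), looped_depth(n))
```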
March 7, 2025 at 4:46 PM
In contrast, both in theory and practice, width must grow exponentially with sequence length to enable regular language recognition. Thus, while slightly increasing depth expands expressive power, increasing width to gain power is intractable!
March 7, 2025 at 4:46 PM
In practice, can transformers learn to solve these problems with log depth?

We find the depth required to recognize strings of length n grows ~log n with r² = 0.93. Thus, log depth appears necessary and sufficient to recognize regular languages in practice, matching our theory.
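The kind of fit involved can be sketched like this, on synthetic stand-in data (the depths below are just ⌈log₂ n⌉, not the paper's measurements, so this toy fit comes out essentially perfect rather than r² = 0.93):

```python
import math

# Synthetic, illustrative depths only -- NOT the paper's measured values.
ns = [8, 16, 32, 64, 128, 256, 512, 1024]
depths = [math.ceil(math.log2(n)) for n in ns]  # stand-in for measured minimal depth

# Ordinary least squares of depth against ln(n), plus r².
xs = [math.log(n) for n in ns]
mx = sum(xs) / len(xs)
my = sum(depths) / len(depths)
slope = sum((x - mx) * (y - my) for x, y in zip(xs, depths)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
pred = [slope * x + intercept for x in xs]
ss_res = sum((y - p) ** 2 for y, p in zip(depths, pred))
ss_tot = sum((y - my) ** 2 for y in depths)
r2 = 1 - ss_res / ss_tot
print(f"depth ~ {slope:.2f}*ln(n) + {intercept:.2f}, r^2 = {r2:.3f}")
```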
March 7, 2025 at 4:46 PM
They are! We show that log-depth transformers can express two key problems that fixed-depth transformers provably cannot:

♟️State tracking (regular languages)
🔍Graph search (connectivity)
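For intuition on the graph-search side, here's a sketch (not the paper's actual construction) of why log depth suffices: connectivity can be decided with ~log₂ n rounds of boolean matrix squaring, since each round doubles the path length covered.

```python
def connected(adj, s, t):
    # adj: n x n adjacency matrix (0/1). Returns whether t is reachable from s.
    n = len(adj)
    # reach[i][j]: a path of length <= current horizon exists from i to j
    reach = [[bool(adj[i][j]) or i == j for j in range(n)] for i in range(n)]
    rounds = max(1, (n - 1).bit_length())  # ~ceil(log2(n)) squarings
    for _ in range(rounds):
        # Boolean matrix "squaring": doubles the reachable path length.
        reach = [[any(reach[i][k] and reach[k][j] for k in range(n))
                  for j in range(n)] for i in range(n)]
    return reach[s][t]
```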
March 7, 2025 at 4:46 PM
We address both these questions by studying the expressive power of looped transformers with log depth. On input length n, a block of layers can be repeated log n times (with shared weights across blocks).

🤔Are log-depth transformers more powerful than fixed-depth transformers?
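A toy illustration of the looped idea (hypothetical code, not the paper's model): apply the same shared "block" ⌈log₂ n⌉ times, where each application composes per-token state-transition functions by parallel doubling. This is the function-composition trick behind recognizing regular languages in log depth.

```python
import math

def recognize(word, delta, start, accepting):
    # delta: total transition function {(state, char): state} of a DFA.
    states = sorted({s for (s, _) in delta} | set(delta.values()))
    idx = {s: i for i, s in enumerate(states)}
    # Each token becomes a state map, encoded as a tuple over state indices.
    fs = [tuple(idx[delta[(s, ch)]] for s in states) for ch in word]
    n = len(fs)
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for k in range(rounds):
        shift = 1 << k
        # Shared "block": compose each position's map with the one 2^k earlier,
        # so position i eventually holds the composition of all prefixes f_0..f_i.
        fs = [tuple(fs[i][fs[i - shift][j]] for j in range(len(states)))
              if i >= shift else fs[i]
              for i in range(n)]
    final = states[fs[-1][idx[start]]]
    return final in accepting
```

For example, with a two-state parity DFA over {a, b}, `recognize` accepts exactly the strings with an even number of b's, using only ⌈log₂ n⌉ rounds of the shared block.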
March 7, 2025 at 4:46 PM
Past work has shown that the bounded depth of transformers provably limits their ability to express sequential reasoning, particularly on long inputs.

But what if we only care about reasoning over short inputs? Or if the transformer’s depth can grow 🤏slightly with input length?
March 7, 2025 at 4:46 PM
I try to! (And usually do eventually, on trains or planes)
November 30, 2024 at 1:45 PM
Looks cool, added to my reading list!
November 29, 2024 at 6:10 PM
Agreed (and defer to you), but I thought it was fun, and it stood out to me compared to other LM outputs that were even more terrible. I don’t think it proves anything deeper about LMs’ ability to generate formally correct poetry
November 29, 2024 at 1:48 PM
To be clear, the grammar and wording are weird, but the vibes are impeccable
November 29, 2024 at 1:04 AM
🔥 Old Norse poetry gen

The Vikings call, say now,
OLMo 2, the ruler of languages.
May your words fly over the seas,
all over the world, for you are wise.
Wordsmith, balanced and aligned,
for you the skalds themselves sing,
your soul, which hears new lifeforms,
may it live long and tell a saga.
November 28, 2024 at 6:19 PM
If you're interested in being added, fill out this Google form (you can also ping me to let me know once you've filled out the form)
docs.google.com/forms/d/e/1F...
FLaNN Starter Pack Survey
Since a lot of academics are considering switching from X to BlueSky, we plan to make a BlueSky starter pack with FLaNN accounts to make it easier to find FLaNN research on BlueSky. We'd like to inclu...
November 26, 2024 at 4:02 PM