Daniel Wurgaft
@danielwurgaft.bsky.social
PhD @Stanford working w @noahdgoodman and research fellow @GoodfireAI
Studying in-context learning and reasoning in humans and machines
Prev. @UofT CS & Psych
💡Key takeaways:
2) A tradeoff between *loss and complexity* is fundamental to understanding model training dynamics, and gives a unifying explanation for ICL phenomena of transient generalization and task-diversity effects!

13/
June 28, 2025 at 2:35 AM
💡Key takeaways:
1) Is ICL Bayes-optimal? We argue the better question is *under what assumptions*. Cautiously, we conclude that ICL can be seen as approx. Bayesian under a simplicity bias and sublinear sample efficiency (though see our appendix for an interesting deviation!)

12/
June 28, 2025 at 2:35 AM
Ablations of our analytical expression show that the modeled computational constraints, in their assumed functional forms, are crucial!

11/
June 28, 2025 at 2:35 AM
Our framework also reveals some interesting findings: increasing MLP width increases memorization, which our model captures as a reduced simplicity bias!

10/
June 28, 2025 at 2:35 AM
Our framework also makes novel predictions:
🔹**Sub-linear** sample efficiency → sigmoidal transition from generalization to memorization (toy sketch below)
🔹**Rapid** behavior change near the M–G crossover boundary
🔹**Superlinear** scaling of time to transience as data diversity increases
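
To make the first prediction concrete, here is a toy illustration (parameter values invented for illustration, not the paper's fitted ones): if the log posterior odds of M over G grow sublinearly in samples seen, the posterior weight on M traces out a sigmoid.

```python
import numpy as np

def weight_on_memorization(n, a=0.05, b=8.0, delta=0.7):
    """Toy model: log posterior odds of M over G grow sublinearly
    (n**delta) with samples seen n, offset by a fixed simplicity
    penalty b. All constants here are made up for illustration."""
    log_odds = a * n**delta - b
    return 1.0 / (1.0 + np.exp(-log_odds))  # sigmoid => weight on M

for n in [10, 100, 1_000, 10_000, 100_000]:
    print(f"n={n:>6}  w_M={weight_on_memorization(n):.3f}")
```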

9/
June 28, 2025 at 2:35 AM
Intuitively, what does this predictive account imply? A rational tradeoff between a strategy's loss and complexity!

🔵Early: A simplicity bias (prior) favors a less complex strategy (G)
🔴Late: Reducing loss (likelihood) favors a better-fitting but more complex strategy (M)
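
In Bayes' rule terms (a schematic in our own notation, where D is the pretraining data seen so far):

```latex
\log \frac{P(M \mid D)}{P(G \mid D)} =
\underbrace{\log \frac{P(D \mid M)}{P(D \mid G)}}_{\text{likelihood: grows with data, favors } M}
+ \underbrace{\log \frac{P(M)}{P(G)}}_{\text{simplicity prior: fixed, favors } G}
```

Early in training the fixed prior term dominates, so behavior looks like G; as the likelihood term accumulates evidence, M takes over.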

8/
June 28, 2025 at 2:35 AM
Fitting the three free parameters of our expression, we see that across checkpoints from 11 different runs, we almost perfectly reproduce models' *next-token predictions* and the relative distance maps!

We now have a predictive model of task diversity effects and transience!

7/
June 28, 2025 at 2:35 AM
We assume two well-known facts about neural nets as computational constraints (scaling laws and simplicity bias). This lets us write a closed-form expression for the posterior odds!
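
One plausible shape for such an expression (illustrative only; the paper's exact form and notation may differ):

```latex
\log \frac{P(M \mid n)}{P(G \mid n)} \approx
\underbrace{n^{\delta}\,(\ell_G - \ell_M)}_{\text{loss gap, sublinear sample efficiency}}
\;-\;
\underbrace{\beta\,(C_M - C_G)}_{\text{complexity gap, simplicity bias}}
```

where n is the number of pretraining samples seen, the ℓ terms are per-sample losses of each strategy, the C terms are strategy complexities, and δ < 1 and β are free parameters.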

6/
June 28, 2025 at 2:35 AM
We model our learner as behaving optimally in a hypothesis space defined by the M / G predictors—this yields a *hierarchical Bayesian* view:

🔹Pretraining = updating posterior probability (preference) for strategies
🔹Inference = posterior-weighted average of strategies
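
A minimal sketch of this view (distributions and names are hypothetical):

```python
import numpy as np

def predict(p_M, p_G, w_M):
    """Inference = posterior-weighted average of strategies.
    p_M, p_G: next-token distributions of the memorizing and
    generalizing predictors; w_M: posterior weight on M,
    set (and updated) by pretraining."""
    return w_M * p_M + (1.0 - w_M) * p_G

p_M = np.array([0.7, 0.2, 0.1])  # hypothetical M output
p_G = np.array([0.4, 0.4, 0.2])  # hypothetical G output
print(predict(p_M, p_G, w_M=0.25))  # early training: mostly G
```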

5/
June 28, 2025 at 2:35 AM
We now have a unifying language to describe what strategies a model transitions between.

Back to our question: *Why* do models switch ICL strategies?! Given that M / G are *Bayes-optimal* for the train / true distributions, we invoke rational analysis to answer this!

4/
June 28, 2025 at 2:35 AM
By computing the distance between a model’s outputs and these predictors, we show that models transition between the memorizing and generalizing predictors as experimental settings are varied! This yields a unifying view on the known ICL phenomena of task diversity effects and transience!
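
For intuition, a hypothetical version of such a distance (the paper's exact metric may differ):

```python
import numpy as np

def relative_distance(p_model, p_M, p_G, eps=1e-9):
    """Where a model's output distribution sits between the M and G
    predictors, using KL divergence as the distance.
    Returns 0 when outputs match M exactly, 1 when they match G."""
    kl = lambda p, q: float(np.sum(p * np.log((p + eps) / (q + eps))))
    d_M, d_G = kl(p_model, p_M), kl(p_model, p_G)
    return d_M / (d_M + d_G)

p_M = np.array([0.7, 0.2, 0.1])
p_G = np.array([0.4, 0.4, 0.2])
print(relative_distance(np.array([0.5, 0.3, 0.2]), p_M, p_G))
```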

3/
June 28, 2025 at 2:35 AM
We first define Bayesian predictors for ICL settings that involve learning a finite mixture of tasks:

🔴 Memorizing (M): discrete prior on seen tasks.
🔵 Generalizing (G): continuous prior matching the true task distribution.

These match known strategies from prior work!
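
Schematically, each predictor is the Bayesian posterior predictive under its prior (notation ours):

```latex
p_M(y \mid x) = \sum_{k=1}^{K} p(y \mid x, \theta_k)\, p(\theta_k \mid x)
\qquad
p_G(y \mid x) = \int p(y \mid x, \theta)\, p(\theta \mid x)\, d\theta
```

with M placing uniform mass on only the K pretraining tasks θ_1, …, θ_K, and G placing a continuous prior p(θ) matching the true task distribution.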

2/
June 28, 2025 at 2:35 AM
🚨New paper! We know models learn distinct in-context learning strategies, but *why*? Why generalize when memorizing would lower loss? And why is generalization transient?

Our work explains this & *predicts Transformer behavior throughout training* without access to its weights! 🧵

1/
June 28, 2025 at 2:35 AM
This work serves as a proof of concept for scaling up analysis of verbal reports, realizing a vision for automated protocol analysis first proposed by Waterman & Newell back in 1971. We hope this inspires new research on human reasoning using the think-aloud method! (7/8)
June 25, 2025 at 5:00 AM
We also found that human search is highly structured. Using a Gini index to measure consistency, we saw that human reasoning clusters around specific multi-step sequences far more than a random agent, revealing shared underlying strategies. (6/8)
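
For reference, one simple way to compute such a concentration index over sequence counts (a simplified sketch; the paper's exact formulation may differ):

```python
from collections import Counter
import numpy as np

def gini(counts):
    """Gini coefficient of a frequency distribution:
    0 = all sequences equally common, near 1 = usage
    concentrated on a few shared sequences."""
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    cum = np.cumsum(x)
    return (n + 1 - 2 * np.sum(cum) / cum[-1]) / n

# Hypothetical multi-step sequences verbalized by participants
seqs = ["(8-4)*(3+3)", "8*3", "(8-4)*(3+3)", "(8-4)*(3+3)", "3*8"]
print(gini(list(Counter(seqs).values())))  # ~0.27
```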
June 25, 2025 at 5:00 AM
Our large-scale dataset reveals patterns in human reasoning. For example, participants use addition and multiplication far more often than division. In problems requiring division, 47% of participants who failed never even tried the operation, suggesting that failures of consideration are a major hurdle. (5/8)
June 25, 2025 at 5:00 AM
So, how well does the automated coding work? While human-human agreement remains the gold standard, top LLMs show moderate inter-rater reliability with humans and capture key graph elements.

This provides a viable trade-off: we process nearly 5,000 trials in <2 hours vs. ~800 hours manually! (4/8)
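
Inter-rater reliability here is the standard kind of agreement statistic, e.g. Cohen's kappa (a generic sketch with made-up codes, not the paper's evaluation code):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical binary codes (element present/absent) for the same
# ten graph elements, from a human coder vs. the LLM pipeline
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
llm   = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(cohen_kappa_score(human, llm))  # 1 = perfect, 0 = chance
```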
June 25, 2025 at 5:00 AM
We ran the largest think-aloud study ever (to our knowledge): 640 participants thought aloud while playing the “Game of 24,” combining four numbers with arithmetic operations to reach 24.

Our pipeline transcribes audio and uses an LLM to build search graphs representing participants' reasoning paths. (3/8)
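
To give a feel for the output, a toy version of such a search graph for Game of 24 (an illustrative schema, not the paper's exact representation):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class State:
    numbers: tuple  # the numbers still available at this point

@dataclass
class SearchGraph:
    """Directed graph over states; each edge records one
    arithmetic move a participant verbalized."""
    edges: list = field(default_factory=list)

    def add_move(self, src: State, move: str, dst: State):
        self.edges.append((src, move, dst))

g = SearchGraph()
g.add_move(State((8, 3, 3, 4)), "8 - 4 = 4", State((4, 3, 3)))
g.add_move(State((4, 3, 3)), "3 + 3 = 6", State((4, 6)))
g.add_move(State((4, 6)), "4 * 6 = 24", State((24,)))  # reached 24
```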
June 25, 2025 at 5:00 AM
The think-aloud method, where participants voice their thoughts as they solve a task, was used in classic work by Newell & Simon to develop early computational models of cognition.

We sought to revitalize their approach, combining rich process-level data with the scale offered by current tools. (2/8)
June 25, 2025 at 5:00 AM
Excited to share a new CogSci paper co-led with @benpry.bsky.social!

Once a cornerstone for studying human reasoning, the think-aloud method declined in popularity as manual coding limited its scale. We introduce a method to automate analysis of verbal reports and scale think-aloud studies. (1/8)🧵
June 25, 2025 at 5:00 AM