2) A tradeoff between *loss and complexity* is fundamental to understanding model training dynamics, and gives a unifying explanation for ICL phenomena of transient generalization and task-diversity effects!
13/
1) Is ICL Bayes-optimal? We argue the better question is *under what assumptions*. Cautiously, we conclude that ICL can be seen as approx. Bayesian under a simplicity bias and sublinear sample efficiency (though see our appendix for an interesting deviation!)
12/
11/
10/
🔹**Sub-linear** sample efficiency → sigmoidal transition from generalization to memorization
🔹**Rapid** behavior change near the M–G crossover boundary
🔹**Superlinear** scaling of time to transience as data diversity increases (sketch below)
9/
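One way to see all three predictions at once (an illustrative parameterization in our notation, not necessarily the paper's exact functional form): suppose the log-posterior odds of M over G after $N$ samples are

$$\log \frac{P(M \mid D_N)}{P(G \mid D_N)} = N^{\alpha}\,\Delta L - \Delta C, \qquad 0 < \alpha < 1,$$

where $\Delta L = L_G - L_M \ge 0$ is M's fit advantage and $\Delta C = C_M - C_G \ge 0$ its complexity cost. The posterior weight on M is then a sigmoid in $N^{\alpha}$ (sub-linear sample efficiency → sigmoidal transition), it changes fastest right at the crossover $N^{\alpha}\Delta L = \Delta C$ (rapid behavior change at the M–G boundary), and the crossover point $N^{*} = (\Delta C / \Delta L)^{1/\alpha}$ grows superlinearly in $\Delta C$, which increases with task diversity since there is more to memorize.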
🔵 Early: A simplicity bias (prior) favors the less complex strategy (G)
🔴 Late: Reducing loss (likelihood) favors the better-fitting but more complex strategy (M) (in symbols below)
8/
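In symbols (our notation), this is just the two terms of the log-posterior:

$$\log P(s \mid D_N) = \underbrace{\log P(D_N \mid s)}_{\text{fit: grows with data}} + \underbrace{\log P(s)}_{\text{simplicity prior: fixed}} + \text{const.}$$

Early in training the accumulated-likelihood term is small, so the prior's complexity penalty decides (G wins); as data accumulates, the likelihood term dominates and the better-fitting M takes over.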
We now have a predictive model of task diversity effects and transience!
7/
6/
🔹Pretraining = updating posterior probability (preference) for strategies
🔹Inference = posterior-weighted average of strategies (code sketch below)
5/
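A minimal code sketch of that picture (names and the two-strategy setup are ours, not the paper's implementation):

```python
import numpy as np

def posterior_weights(log_lik: np.ndarray, log_prior: np.ndarray) -> np.ndarray:
    """Pretraining: posterior over strategies ∝ exp(log-likelihood + log-prior)."""
    logits = log_lik + log_prior
    w = np.exp(logits - logits.max())  # numerically stable softmax
    return w / w.sum()

def icl_predict(x, strategies, w):
    """Inference: posterior-weighted average of each strategy's prediction."""
    return sum(wi * s(x) for wi, s in zip(w, strategies))

# e.g. strategies = [predict_memorizing, predict_generalizing], w from above
```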
Back to our question: *Why* do models switch ICL strategies?! Given M / G are *Bayes-optimal* for train / true distributions, we invoke the approach of rational analysis to answer this!
4/
3/
🔴 Memorizing (M): discrete prior on seen tasks.
🔵 Generalizing (G): continuous prior matching the true task distribution.
These match known strategies from prior work! (See the sketch below.)
2/
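Formally, both are Bayesian predictors over tasks $t$ given a context $D$; they differ only in the prior's support (our notation):

$$p_{\mathrm{M}}(y \mid x, D) \propto \sum_{t \in \mathcal{T}_{\text{seen}}} p(y \mid x, t)\, p(D \mid t), \qquad p_{\mathrm{G}}(y \mid x, D) \propto \int p(y \mid x, t)\, p(D \mid t)\, p(t)\, \mathrm{d}t.$$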
Our work explains these phenomena & *predicts Transformer behavior throughout training* without access to its weights! 🧵
1/
This provides a viable trade-off: we process nearly 5,000 trials in <2 hours vs. ~800 hours manually! (4/8)
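Back-of-envelope from those numbers: $800\ \text{h} / 5{,}000 \approx 9.6$ minutes of manual coding per trial, versus $2\ \text{h} / 5{,}000 \approx 1.4$ seconds automated, a roughly $400\times$ speedup.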
Our pipeline transcribes audio and uses an LLM to build search graphs representing participants' reasoning paths. (3/8)
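A sketch of the graph-building step (function names are ours and hypothetical, not the released pipeline):

```python
import networkx as nx

def build_search_graph(transcript: str, extract_states) -> nx.DiGraph:
    """extract_states: an LLM call mapping a transcript (from speech-to-text)
    to the ordered problem states the participant verbalized,
    e.g. ["14+7", "21", "21*2"]."""
    states = extract_states(transcript)
    graph = nx.DiGraph()
    for src, dst in zip(states, states[1:]):
        graph.add_edge(src, dst)  # one edge per verbalized reasoning step
    return graph
```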
We sought to revitalize the think-aloud approach, combining its rich process-level data with the scale offered by current tools. (2/8)
Once a cornerstone for studying human reasoning, the think-aloud method declined in popularity as manual coding limited its scale. We introduce a method to automate analysis of verbal reports and scale think-aloud studies. (1/8)🧵