@ekdeepl.bsky.social @corefpark.bsky.social @gautamreddy.bsky.social @hidenori8tanaka.bsky.social @noahdgoodman.bsky.social
See the paper for full results and discussion! And watch for updates! We are working on explaining and unifying more ICL phenomena! 15/
3) A top-down, normative perspective offers a powerful, predictive approach for understanding neural networks, complementing bottom-up mechanistic work.
14/
2) A tradeoff between *loss and complexity* is fundamental to understanding model training dynamics, and gives a unifying explanation for ICL phenomena of transient generalization and task-diversity effects!
13/
1) Is ICL Bayes-optimal? We argue the better question is *under what assumptions*. Cautiously, we conclude that ICL can be seen as approx. Bayesian under a simplicity bias and sublinear sample efficiency (though see our appendix for an interesting deviation!)
12/
🔹**Sub-linear** sample efficiency → sigmoidal transition from generalization to memorization
🔹**Rapid** behavior change near the memorization–generalization (M–G) crossover boundary
🔹**Superlinear** scaling of time to transience as data diversity increases
9/
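The three predictions above can be illustrated with a toy log-odds model. This is a sketch of the idea, not the paper's actual model: every function name, parameter, and value below is an illustrative assumption. A simplicity prior penalizes the memorizing strategy (M) in proportion to data diversity, while its likelihood advantage grows sublinearly in samples seen; the posterior probability of M is then a sigmoid in the log-odds, giving a sharp transition from G to M.

```python
import math

def p_memorize(n_samples, n_tasks, alpha=0.75, fit_gain=0.05, bits_per_task=8.0):
    """Toy posterior probability of the memorizing strategy (M) over the
    generalizing strategy (G). All parameters are illustrative assumptions:
    - bits_per_task: complexity cost of M, growing with data diversity (prior)
    - fit_gain * n_samples**alpha: M's sublinear log-likelihood advantage (fit)
    """
    log_prior_odds = -bits_per_task * n_tasks        # simplicity bias favors G
    log_lik_odds = fit_gain * n_samples ** alpha     # better fit favors M
    log_odds = log_prior_odds + log_lik_odds
    return 1.0 / (1.0 + math.exp(-log_odds))         # sigmoid => sharp transition
```

In this sketch, early in training G dominates (p_memorize near 0) and late in training M dominates (p_memorize near 1), with a sigmoidal crossover; solving for the crossover point gives a time to transience that scales superlinearly in n_tasks (as n_tasks^(1/alpha), since alpha < 1), mirroring the third prediction.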
🔵Early: A simplicity bias (prior) favors a less complex strategy (G)
🔴Late: reducing loss (likelihood) favors a better-fitting, but more complex strategy (M)
8/
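In Bayesian terms, this early/late tradeoff is just the two terms of the posterior log-odds between the two strategies (a schematic decomposition, with M and G standing for the memorizing and generalizing strategies and D for the training data):

```latex
\log \frac{P(M \mid D)}{P(G \mid D)}
  = \underbrace{\log \frac{P(D \mid M)}{P(D \mid G)}}_{\text{fit: grows with data, favors M}}
  + \underbrace{\log \frac{P(M)}{P(G)}}_{\text{simplicity bias: } <\, 0,\ \text{favors G}}
```

Early on, the (fixed, negative) prior term dominates and G wins; as the likelihood term accumulates with data, M eventually takes over.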
We now have a predictive model of task diversity effects and transience!
7/