aaditya6284.bsky.social
@aaditya6284.bsky.social
Finally, we carry forward the intuitions from the minimal mathematical model to find a setting where ICL is both emergent and persistent. This intervention also holds at larger scales, demonstrating the practical benefits of the improved mechanistic understanding! (9/11)
March 11, 2025 at 7:13 AM
We propose a minimal model of the joint competitive-cooperative ("coopetitive") interactions, which captures the key transience phenomena. We were pleasantly surprised when the model even captured weird non-monotonicities in the formation of the slower mechanism! (8/11)
March 11, 2025 at 7:13 AM
But why does ICL emerge in the first place, if only to give way to CIWL? The ICL solution lies close to the path toward the CIWL strategy. Since ICL also helps with the task (and CIWL is "slow"), it emerges on the way to CIWL thanks to the cooperative interactions between the two. (7/11)
March 11, 2025 at 7:13 AM
Specifically, we find that Layer 2 circuits (the canonical "induction head") are largely conserved (after an initial phase change), while Layer 1 circuits switch from attending to the previous token to attending to self, driving the switch from ICL to CIWL. (6/11)
March 11, 2025 at 7:13 AM
This strategy is implemented by attention heads acting as skip-trigram copiers (e.g., … [label] … [query] -> [label]). Though seemingly distinct from the induction circuits that lead to ICL, these heads share remarkable substructure with them! (5/11)
March 11, 2025 at 7:13 AM
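To make the contrast between a skip-trigram copier and an induction-style copy concrete, here is a toy, hand-wired sketch. Everything in it is a hypothetical illustration, not the paper's model or code: the function names `skip_trigram_copy` and `induction_copy`, the hand-chosen association table standing in for learned weights, and the use of hard (argmax) attention instead of softmax.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stored ("in-weights") associations: (query exemplar, label) -> score.
# In a trained transformer these would be encoded in attention weights; here they
# are a toy lookup table chosen by hand.
Assoc = Dict[Tuple[str, str], float]


def skip_trigram_copy(context: List[str], query: str, labels: Set[str],
                      assoc: Assoc) -> Optional[str]:
    """... [label] ... [query] -> [label], using hard (argmax) attention.

    The query scores every label token present in the context by its stored
    association and copies the best one. Which exemplar preceded each label is
    ignored, and labels absent from the context can never be output.
    """
    candidates = [tok for tok in context if tok in labels]
    if not candidates:
        return None
    return max(candidates, key=lambda lab: assoc.get((query, lab), 0.0))


def induction_copy(context: List[str], query: str,
                   labels: Set[str]) -> Optional[str]:
    """Induction-style copy: find the query exemplar in the context and copy
    the label token that immediately follows it (a pure in-context lookup)."""
    for i in range(len(context) - 1):
        if context[i] == query and context[i + 1] in labels:
            return context[i + 1]
    return None


if __name__ == "__main__":
    labels = {"A", "B"}
    assoc = {("cat", "A"): 2.0, ("dog", "B"): 2.0}  # hypothetical stored associations
    # The context pairs "cat" with B, contradicting the stored association cat -> A.
    context = ["dog", "A", "cat", "B"]
    print(skip_trigram_copy(context, "cat", labels, assoc))  # A
    print(induction_copy(context, "cat", labels))            # B
```

When the context contradicts the stored association, the two copiers disagree: the skip-trigram copier only needs label A to appear somewhere in the context, while the induction-style copy follows the exemplar-label pairing it finds in the sequence.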
Like prior work, we train on sequences of exemplar-label pairs, which permit both in-context and in-weights strategies. We test for each strategy using out-of-distribution evaluation sequences, recovering the classic transience phenomenon (blue). (3/11)
March 11, 2025 at 7:13 AM
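A rough sketch of how exemplar-label training sequences and out-of-distribution evaluations might be built. This is an assumption-laden illustration, not the paper's data pipeline: exemplars are abstracted to class-name strings, and the function `make_sequence`, its three mode names, and the label-rotation trick for the in-context evaluation are all hypothetical choices. The real setup (exemplar type, sequence length, how each evaluation is constructed) may differ.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (exemplar, label); exemplars abstracted to class names


def make_sequence(class_to_label: Dict[str, str], n_pairs: int, mode: str,
                  rng: random.Random) -> Tuple[List[Pair], str, str]:
    """Build (context pairs, query exemplar, target label) for a toy setup.

    mode="train":    query class appears in context with its usual label, so
                     both in-context and in-weights strategies can succeed.
    mode="icl_eval": labels are rotated across classes, so stored associations
                     are wrong and only the in-context pairing predicts the target.
    mode="iwl_eval": query class is absent from the context, so only stored
                     (in-weights) associations can predict the target.
    """
    classes = sorted(class_to_label)
    query = rng.choice(classes)
    others = [c for c in classes if c != query]
    if mode in ("train", "icl_eval"):
        ctx_classes = [query] + rng.sample(others, n_pairs - 1)
    elif mode == "iwl_eval":
        ctx_classes = rng.sample(others, n_pairs)
    else:
        raise ValueError(f"unknown mode: {mode}")

    if mode == "icl_eval":
        # Rotate labels by one class; with distinct labels, every class is relabeled.
        mapping = {c: class_to_label[classes[(i + 1) % len(classes)]]
                   for i, c in enumerate(classes)}
    else:
        mapping = class_to_label

    rng.shuffle(ctx_classes)
    context = [(c, mapping[c]) for c in ctx_classes]
    return context, query, mapping[query]
```

For example, `make_sequence({f"class{i}": f"label{i}" for i in range(8)}, 4, "icl_eval", random.Random(0))` returns a four-pair context in which the query's label disagrees with its training-time label, so a model can only answer correctly by reading the pairing from context.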
Transformers employ different strategies through training to minimize loss, but how do these strategies trade off, and why?

Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns.

Read on 🔎⏬
March 11, 2025 at 7:13 AM