📍 Trento, Italy
🧵 #identifiability, #shortcuts, #interpretability
For more info, check the full paper 👇
arxiv.org/abs/2410.235...
A mathematical proof that, under suitable conditions, linear properties hold for either all or none of the equivalent models with the same next-token distribution 😎
Exciting open questions on empirical findings remain 🤔 Check Section 6 (Discussion) in the paper!
8/9
🔥 Under mild assumptions, relational linear properties are shared by all LLMs with the same next-token distribution!
⚠️ Parallel vectors may not be shared (they are, under a diversity condition)!
7/9
💡 They arise when the LLM can predict next tokens for textual queries like "What is the written language?" across many context strings!
6/9
💡 Parallel vectors arise from equal log-ratios of next-token probabilities
E.g. the same ratio for "easy"/"easiest" and "strong"/"strongest" in all contexts ⇒ parallel vectors
5/9
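A toy sketch of why equal log-ratios point to parallel vectors, assuming a plain softmax model with context embeddings f(x) and unembeddings g(y) and no bias: log p(a|x) − log p(b|x) = f(x)·(g(a) − g(b)), because the softmax normalizer cancels. If the context embeddings span the space (full column rank, an assumption standing in for the paper's actual conditions), the log-ratios over many contexts pin down the difference vector uniquely, so equal ratios force equal (hence parallel) differences. All embeddings below are random and hypothetical, not taken from any real LLM.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
d, n_ctx = 4, 50
vocab = ["easy", "easiest", "strong", "strongest"]

F = rng.normal(size=(n_ctx, d))        # hypothetical context embeddings f(x), one row per context
G = rng.normal(size=(len(vocab), d))   # hypothetical unembeddings g(y), one row per token
# Make g(strongest) - g(strong) equal g(easiest) - g(easy) by construction.
G[3] = G[2] + (G[1] - G[0])

logp = log_softmax(F @ G.T)            # log p(y | x), shape (n_ctx, len(vocab))
ratio_easy = logp[:, 0] - logp[:, 1]   # log p(easy|x) - log p(easiest|x)
ratio_strong = logp[:, 2] - logp[:, 3] # log p(strong|x) - log p(strongest|x)

# Recover each difference vector from the log-ratios alone; F has full column rank,
# so the least-squares solution is unique.
diff_easy, *_ = np.linalg.lstsq(F, ratio_easy, rcond=None)
diff_strong, *_ = np.linalg.lstsq(F, ratio_strong, rcond=None)

print(np.allclose(ratio_easy, ratio_strong))  # same ratio in every context
print(np.allclose(diff_easy, diff_strong))    # identical (thus parallel) difference vectors
```

Reading it backwards: observing the same ratio in every context, with spanning contexts, leaves only one difference vector consistent with both word pairs.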
‼️Outside that subspace, representations can differ a lot!
4/9
For the first time, we relate models with different representation dimensions and find that the representations of LLMs with the same next-token distribution are related by an "extended linear equivalence"!
3/9
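For intuition, here is a sketch of the classic equal-dimension case only (not the paper's extended version across different dimensions, which is not shown here): if f'(x) = A f(x) and g'(y) = A^{-T} g(y) for an invertible matrix A, the logits f'(x)·g'(y) = f(x)·g(y) are unchanged, so both models give the same next-token distribution even though their representations differ. The matrix A and all embeddings are random and hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
d, n_ctx, n_vocab = 3, 5, 6

F = rng.normal(size=(n_ctx, d))              # hypothetical f(x), one row per context
G = rng.normal(size=(n_vocab, d))            # hypothetical g(y), one row per token
A = rng.normal(size=(d, d)) + 3 * np.eye(d)  # hypothetical invertible linear map

F2 = F @ A.T                 # rows are f'(x) = A f(x)
G2 = G @ np.linalg.inv(A)    # rows are g'(y) = A^{-T} g(y)

p1 = softmax(F @ G.T)        # next-token distribution of model 1
p2 = softmax(F2 @ G2.T)      # next-token distribution of model 2

print(np.allclose(p1, p2))   # same distribution
print(np.allclose(F, F2))    # but different representations
```

The transformation cancels inside the dot product: F A^T (G A^{-1})^T = F A^T A^{-T} G^T = F G^T.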
1⃣ An identifiability result for LLMs
2⃣ A 𝙧𝙚𝙡𝙖𝙩𝙞𝙤𝙣𝙖𝙡 reformulation of linear properties
3⃣ A proof of which properties are 𝙘𝙤𝙫𝙖𝙧𝙞𝙖𝙣𝙩 (akin to physics, cf. Villar et al. (2023)), i.e. they hold for all or none of the LLMs with the same next-token distribution
2/9
openreview.net/forum?id=XCm...