📍 Trento, Italy
🧵 #identifiability, #shortcuts, #interpretability
🔥 Under mild assumptions, relational linear properties are shared!
⚠️ Parallel vectors may not be shared (though they are, under a diversity condition)!
7/9
💡They arise when the LLM can predict next tokens for textual queries like "What is the written language?" across many context strings!
6/9
💡Parallel vectors arise from identical log-ratios of next-token probs
E.g. the same log-ratio for "easy"/"easiest" and "strong"/"strongest" in all contexts => parallel vectors
5/9
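A toy sketch of the 5/9 claim, assuming a softmax next-token model whose logits are f(x)·g(y). The names f, g, the 2-d vectors, and the token set are all hypothetical. Because the normalizer cancels, log p(y1|x) - log p(y2|x) = f(x)·(g(y1) - g(y2)), so log-ratios measured in two contexts with linearly independent f(x) pin down the difference vector; equal ratios then force equal (hence parallel) difference vectors.

```python
import math

# Hypothetical 2-d unembeddings g(y); the superlative offset is shared by construction.
OFFSET = (0.5, -1.0)
G = {"easy": (1.0, 0.2), "strong": (-0.3, 0.8), "other": (0.1, -0.4)}
G["easiest"] = (G["easy"][0] + OFFSET[0], G["easy"][1] + OFFSET[1])
G["strongest"] = (G["strong"][0] + OFFSET[0], G["strong"][1] + OFFSET[1])


def log_probs(f):
    """Log-softmax over the toy vocabulary, with logits f . g(y)."""
    logits = {y: f[0] * g[0] + f[1] * g[1] for y, g in G.items()}
    m = max(logits.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {y: v - log_z for y, v in logits.items()}


def log_ratio(f, num, den):
    """log p(num | x) - log p(den | x); the normalizer log_z cancels."""
    lp = log_probs(f)
    return lp[num] - lp[den]


def recover_difference(contexts, num, den):
    """Solve f_i . u = log-ratio_i for u from two contexts (Cramer's rule)."""
    (a, b), (c, d) = contexts
    r1, r2 = (log_ratio(f, num, den) for f in contexts)
    det = a * d - b * c
    return ((r1 * d - b * r2) / det, (a * r2 - c * r1) / det)


# Two hypothetical context embeddings f(x) that span R^2.
contexts = [(1.0, 0.0), (0.3, 2.0)]
u = recover_difference(contexts, "easiest", "easy")
v = recover_difference(contexts, "strongest", "strong")
print(u, v)  # both approximately equal to OFFSET
```

Only the log-ratios enter `recover_difference`, so whenever they agree across contexts spanning the representation space, the recovered difference vectors must agree too.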
‼️Outside that subspace, representations can differ a lot!
4/9
For the first time, we relate models with different representation dimensions & find that representations of LLMs with the same distribution are related by an “extended linear equivalence”!
3/9
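A simplified illustration of how two models with different representation dimensions can still produce identical logits, and therefore identical next-token distributions. This is not the paper's exact definition of extended linear equivalence; the 3x2 matrices M and N and all vectors below are hypothetical, chosen so that M^T N = I. Mapping f(x) -> M f(x) and g(y) -> N g(y) gives (M f)·(N g) = f^T M^T N g = f·g.

```python
# Hypothetical 2-d model: context embeddings f(x) and unembeddings g(y).
f = {"ctx_a": [0.4, -1.2], "ctx_b": [2.0, 0.5]}
g = {"tok_1": [1.0, 0.3], "tok_2": [-0.7, 1.1], "tok_3": [0.2, -0.5]}

# Illustrative 3x2 maps into a 3-d space, chosen so that M^T N = I_2.
M = [[1, 0], [0, 1], [1, 1]]
N = [[0, -1], [-1, 0], [1, 1]]


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Check the condition M^T N = I_2 that makes the logits match.
MtN = [[sum(M[k][i] * N[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
print(MtN)  # [[1, 0], [0, 1]]

# The 3-d model obtained by mapping both sides.
f3 = {x: matvec(M, v) for x, v in f.items()}
g3 = {y: matvec(N, v) for y, v in g.items()}

for x in f:
    for y in g:
        # Same logit in 2-d and 3-d, so the softmax distributions coincide.
        print(x, y, dot(f[x], g[y]), dot(f3[x], g3[y]))
```

Many other (M, N) pairs satisfy M^T N = I, which is one way to see how the representations themselves can differ while the predicted distribution stays the same.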
We explore this question through the lens of 𝗶𝗱𝗲𝗻𝘁𝗶𝗳𝗶𝗮𝗯𝗶𝗹𝗶𝘁𝘆:
“All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling”
Published at #AISTATS2025🌴
1/9
Time to prepare for Thailand 🪷🏖️🌴🐒
Huge thanks to my coauthors
Luigi Gresele, Sebastian Weichwald, and @seblachap.bsky.social for all the joint effort!
More details soon 👇
arxiv.org/abs/2410.235...