Emanuele Marconato
@ema-ridopoco.bsky.social
Post-doc @ University of Trento. I did my PhD @ University of Trento and the University of Pisa. I like #concepts, #symbols, and #representations, but I still don't know what they are.

📍 Trento, Italy
🧵 #identifiability, #shortcuts, #interpretability
Joint work with Sebastian Weichwald, Sebastien Lachapelle, and Luigi Gresele 🙏

For more info, check the full paper 👇

https://arxiv.org/abs/2410.23501
June 17, 2025 at 3:12 PM
🧵Summary

A mathematical proof that, under suitable conditions, linear properties hold for either all or none of the equivalent models with the same next-token distribution 😎

Exciting open questions on empirical findings remain🤔 - check Section 6 (Discussion) in the paper!

8/9
June 17, 2025 at 3:12 PM
3⃣ We demonstrate which linear properties are shared by all or none of the LLMs with the same next-token distribution.

🔥 Under mild assumptions, relational linear properties are shared!

⚠️ Parallel vectors may not be shared (but they are under the diversity assumption)!

7/9
June 17, 2025 at 3:12 PM
We also describe other linear properties (linear subspaces, probing, and steering) based on relational strings (Paccanaro and Hinton, 2001).

💡They arise when the LLM can predict next tokens for textual queries like "What is the written language?" across many context strings!

6/9
June 17, 2025 at 3:12 PM
2⃣ We reformulate linear properties of LLMs based on textual strings, depending on how LLMs predict next tokens

💡Parallel vectors arise from equal log-ratios of next-token probabilities

E.g. same ratio for "easy"/"easiest" and "strong"/"strongest" in all contexts => parallel vectors
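
The link between log-ratios and difference vectors can be illustrated with a minimal sketch. It assumes a softmax next-token model of the form p(y | x) ∝ exp(f(x) · g(y)), which is an assumption of this example and not spelled out in the post. The toy vocabulary, the dimensions, and the names `F`, `G`, `log_probs`, and `log_ratio` are all hypothetical. Under that assumption, the log-ratio of two tokens equals f(x) · (g(y1) − g(y2)), so matching log-ratios across sufficiently diverse contexts pin down matching difference vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx = 4, 50  # hypothetical representation size and number of contexts
vocab = ["easy", "easiest", "strong", "strongest", "other"]
idx = {tok: i for i, tok in enumerate(vocab)}

# Hypothetical unembeddings g(y). We force the two difference vectors to be
# equal (a special case of parallel) to mimic the property in the post.
G = rng.normal(size=(len(vocab), d))
G[idx["strongest"]] = G[idx["strong"]] + (G[idx["easiest"]] - G[idx["easy"]])

# Hypothetical context representations f(x), one row per context string.
F = rng.normal(size=(n_ctx, d))

def log_probs(F, G):
    """Log-softmax over the vocabulary for every context (assumed model)."""
    logits = F @ G.T
    m = logits.max(axis=1, keepdims=True)
    return logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))

def log_ratio(lp, a, b):
    """log p(a | x) - log p(b | x) for every context x."""
    return lp[:, idx[a]] - lp[:, idx[b]]

lp = log_probs(F, G)
r_easy = log_ratio(lp, "easy", "easiest")
r_strong = log_ratio(lp, "strong", "strongest")

# Equal difference vectors give equal log-ratios in every context.
ratios_match = bool(np.allclose(r_easy, r_strong))

# Conversely, when the contexts span the whole space, least squares recovers
# each difference vector from its log-ratios, so equal ratios force equality.
d_easy, *_ = np.linalg.lstsq(F, r_easy, rcond=None)
d_strong, *_ = np.linalg.lstsq(F, r_strong, rcond=None)
vectors_match = bool(np.allclose(d_easy, d_strong))
```

The recovery step relies on `F` having full column rank, which plays the role of a diversity condition in this toy setting. If the contexts only span a subspace, the difference vectors are determined only within that subspace.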

5/9
June 17, 2025 at 3:12 PM
💡The extended linear equivalence means that two models' representations are linearly related, but only within a subspace

‼️Outside that subspace, representations can differ a lot!
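
A toy construction may help picture this. The sketch below is a simplified illustration, not the paper's definition: it again assumes a softmax model p(y | x) ∝ exp(f(x) · g(y)), and every name and number in it (`Z`, `W`, `M`, the dimensions 3 and 5) is hypothetical. Two models share a 2-dimensional core on which their representations are related by an invertible matrix `M`, while their extra coordinates are unrelated noise that the unembeddings ignore.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ctx, vocab_size, k = 20, 6, 2  # hypothetical sizes; k is the shared subspace dim

Z = rng.normal(size=(n_ctx, k))       # shared core of the context representations
W = rng.normal(size=(vocab_size, k))  # shared core of the unembeddings
M = np.array([[2.0, 1.0],
              [0.0, 1.0]])            # fixed invertible map (determinant 2)

# Model A: 3-dimensional representations, one extra coordinate of noise.
F_A = np.hstack([Z, 10 * rng.normal(size=(n_ctx, 1))])
G_A = np.hstack([W, np.zeros((vocab_size, 1))])

# Model B: 5-dimensional representations. The core is transformed by M and the
# unembeddings by its inverse, so all dot products f(x) . g(y) are unchanged.
F_B = np.hstack([Z @ M.T, 10 * rng.normal(size=(n_ctx, 3))])
G_B = np.hstack([W @ np.linalg.inv(M), np.zeros((vocab_size, 3))])

def softmax(logits):
    """Row-wise softmax with the usual max subtraction for stability."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

P_A = softmax(F_A @ G_A.T)
P_B = softmax(F_B @ G_B.T)

# Both models define the same next-token distribution for every context.
same_distribution = bool(np.allclose(P_A, P_B))

# Inside the shared subspace, the representations are linearly related by M.
related_in_subspace = bool(np.allclose(F_B[:, :k], F_A[:, :k] @ M.T))
```

The remaining coordinates, `F_A[:, k:]` and `F_B[:, k:]`, are independent noise of different widths, so no linear map needs to relate them even though the two distributions coincide.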

4/9
June 17, 2025 at 3:12 PM
1⃣ We extend the results by Khemakhem et al. (2020) and Roeder et al. (2021), removing a diversity assumption.

For the first time, we relate models with different representation dimensions & find that representations of LLMs with the same distribution are related by an “extended linear equivalence”!

3/9
June 17, 2025 at 3:12 PM
Contributions:

1⃣ An identifiability result for LLMs
2⃣ A 𝙧𝙚𝙡𝙖𝙩𝙞𝙤𝙣𝙖𝙡 reformulation of linear properties
3⃣ A proof of which properties are 𝙘𝙤𝙫𝙖𝙧𝙞𝙖𝙣𝙩 (as in physics, cf. Villar et al. (2023)): they hold for all or none of the LLMs with the same next-token distribution

2/9
June 17, 2025 at 3:12 PM
@yanai.bsky.social this is very interesting!! FYI, we studied the ubiquity, rather than emergence, of linear relational properties here:

openreview.net/forum?id=XCm...
All or None: Identifiable Linear Properties of Next-Token...
We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of “easy” and “easiest”...
May 3, 2025 at 2:39 AM
Could you please add me? :)
December 4, 2024 at 8:24 AM
Then I agree 😄
December 2, 2024 at 7:54 PM
What is the precise definition of feature?
December 2, 2024 at 7:33 PM
I would like to ask for some back stabs to reviewer 2 🤬
November 28, 2024 at 6:41 PM
I know @looselycorrect.bsky.social well enough eheh
November 21, 2024 at 6:41 PM