Lee Sharkey
leesharkey.bsky.social
Scruting matrices @ Apollo Research
New interpretability paper from Apollo Research!

🟢Attribution-based Parameter Decomposition 🟢

It's a new way to decompose neural network parameters directly into mechanistic components.

It overcomes many of the issues with SAEs! 🧵
January 27, 2025 at 7:29 PM
Reposted by Lee Sharkey
To my surprise, we find the opposite of what I thought when we started this project:

The approach to reasoning LLMs use looks unlike retrieval, and more like a generalisable strategy synthesising procedural knowledge from many documents doing a similar form of reasoning.
November 20, 2024 at 4:35 PM