This has been conceptually challenging for the SAE paradigm to overcome. (Crosscoder features aren't the computations themselves, but are more akin to the results of the computations).
This has been conceptually challenging for the SAE paradigm to overcome. (Crosscoder features aren't the computations themselves, but are more akin to the results of the computations).
Each parameter component learns to implement a different basic computation.
Each parameter component learns to implement a different basic computation.
How to identify which parts are needed?
Using attribution methods.
Hence the name Attribution-based Parameter Decomposition!
How to identify which parts are needed?
Using attribution methods.
Hence the name Attribution-based Parameter Decomposition!
🟢Attribution-based Parameter Decomposition 🟢
It's a new way to decompose neural network parameters directly into mechanistic components.
It overcomes many of the issues with SAEs! 🧵
🟢Attribution-based Parameter Decomposition 🟢
It's a new way to decompose neural network parameters directly into mechanistic components.
It overcomes many of the issues with SAEs! 🧵