Lightnews — Scholar-powered news

Lee Sharkey

@leesharkey.bsky.social

630 followers 110 following 12 posts

Scruting matrices @ Apollo Research

Lee Sharkey

@leesharkey.bsky.social

And the method lets us identify computations that are spread across multiple layers.

This has been conceptually challenging for the SAE paradigm to overcome. (Crosscoder features aren't the computations themselves; they are more akin to the results of those computations.)

January 27, 2025 at 7:29 PM

Lee Sharkey

@leesharkey.bsky.social

Our method lets us identify fundamental computations (or 'circuits') in a toy model of 'Compressed computation', which is a phenomenon similar to 'Computation in superposition'.

Each parameter component learns to implement a different basic computation.

January 27, 2025 at 7:29 PM

Lee Sharkey

@leesharkey.bsky.social

The key idea: Neural networks only need certain parts of their parameters on each forward pass. The rest can be thrown away (on that forward pass).

How to identify which parts are needed?

Using attribution methods.

Hence the name Attribution-based Parameter Decomposition!

January 27, 2025 at 7:29 PM
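
The key idea in the post above can be illustrated with a small sketch. This is not the paper's implementation: it assumes a single linear layer whose weight matrix is already written as a sum of hand-built components, and the names `attributions` and `forward_top_k` are hypothetical. It scores each component with a simple gradient-based attribution, keeps the top `k`, and checks that the forward pass on that input is unchanged when the other components are thrown away.

```python
import numpy as np

def attributions(components, x):
    """Score each parameter component for one input (illustrative only).

    For a linear layer y = W @ x with W = sum_c P_c, the gradient of y[o]
    with respect to W[i, j] is x[j] when i == o and 0 otherwise, so the
    gradient-times-component score for output o is (P_c @ x)[o].
    Scores are aggregated over outputs with a squared norm.
    """
    return np.array([np.sum((P @ x) ** 2) for P in components])

def forward_top_k(components, x, k):
    """Run the forward pass using only the k highest-scoring components."""
    scores = attributions(components, x)
    active = np.argsort(scores)[::-1][:k]
    W_active = sum(components[c] for c in active)
    return W_active @ x, sorted(active.tolist())

# Toy setup (assumed): three components, each reading one input feature.
rng = np.random.default_rng(0)
n_in, n_out = 3, 4
components = []
for c in range(n_in):
    P = np.zeros((n_out, n_in))
    P[:, c] = rng.normal(size=n_out)
    components.append(P)
W = sum(components)  # the full weight matrix

# An input that only activates feature 1.
x = np.array([0.0, 2.0, 0.0])
y_full = W @ x
y_sparse, active = forward_top_k(components, x, k=1)

print("active components:", active)
print("outputs match:", np.allclose(y_full, y_sparse))
```

In this toy case the components are given in advance, so the sketch only shows the selection step. Finding a good set of components for a trained network is the hard part that the thread describes.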

Lee Sharkey

@leesharkey.bsky.social

For example, with Anthropic's Toy Model of Superposition, we can decompose the parameters directly into mechanisms that are used by individual features.

January 27, 2025 at 7:29 PM

Lee Sharkey

@leesharkey.bsky.social

New interpretability paper from Apollo Research!

🟢Attribution-based Parameter Decomposition 🟢

It's a new way to decompose neural network parameters directly into mechanistic components.

It overcomes many of the issues with SAEs! 🧵

January 27, 2025 at 7:29 PM
