Lightnews — Scholar-powered news

Tal Haklay

@talhaklay.bsky.social

Our paper "Position-Aware Automatic Circuit Discovery" got accepted to ACL! 🎉

Huge thanks to my collaborators 🙏
@hadasorgad.bsky.social
@davidbau.bsky.social
@amuuueller.bsky.social
@boknilev.bsky.social

See you in Vienna! 🇦🇹 #ACL2025 @aclmeeting.bsky.social

May 22, 2025 at 8:11 AM

Tal Haklay

@talhaklay.bsky.social

We knew many of you wanted to submit to our Actionable Interpretability workshop, but we didn’t expect to crash Overleaf! 😏🍃

Only 5 days left ⏰!
Got a paper accepted to ICML that fits our theme?
Submit it to our conference track!
👉 @actinterp.bsky.social

May 14, 2025 at 1:04 PM

Tal Haklay

@talhaklay.bsky.social

April 7, 2025 at 1:53 PM

Tal Haklay

@talhaklay.bsky.social

April 7, 2025 at 1:52 PM

Tal Haklay

@talhaklay.bsky.social

🚨 Call for Papers is Out!

The First Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at ICML 2025 in Vancouver!

📅 Submission Deadline: May 9
Follow us >> @ActInterp

🧠 Topics of interest include: 👇

April 7, 2025 at 1:51 PM

Tal Haklay

@talhaklay.bsky.social

12/13 We evaluate our automatic pipeline across three datasets and two models, demonstrating that:

1️⃣ Our pipeline discovers circuits with a better tradeoff between size and faithfulness than EAP.
2️⃣ Our pipeline produces results comparable to those obtained when human experts define a schema.

March 6, 2025 at 10:15 PM

Tal Haklay

@talhaklay.bsky.social

10/13 After defining a schema, we construct an abstract computation graph where each span type corresponds to a single token position. We then map attribution scores from example-specific computation graphs to the abstract graph and identify circuits within it.

March 6, 2025 at 10:15 PM
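
The mapping described in 10/13 can be illustrated with a toy sketch. This is not the paper's implementation: the function name `map_to_schema`, the span names, the half-open `(start, end)` ranges, and the choice to sum scores within a span and average across examples are all assumptions made for illustration.

```python
from collections import defaultdict


def map_to_schema(example_scores, example_spans):
    """Aggregate per-token scores into per-span scores, averaged over examples.

    example_scores[i][t]: attribution score of one component at token
        position t of example i (hypothetical input format).
    example_spans[i][span_name]: half-open (start, end) token range of that
        span in example i (hypothetical input format).
    """
    totals = defaultdict(float)
    for scores, spans in zip(example_scores, example_spans):
        for name, (start, end) in spans.items():
            # Assumption: a span's score is the sum over the tokens it covers.
            totals[name] += sum(scores[start:end])
    n_examples = len(example_scores)
    # Assumption: scores are averaged across examples.
    return {name: total / n_examples for name, total in totals.items()}


# Two examples of different lengths that share the same two span types.
scores = [
    [0.0, 0.5, 0.25, 0.0, 1.0],
    [0.0, 0.25, 0.0, 1.0],
]
spans = [
    {"subject": (1, 3), "final": (4, 5)},
    {"subject": (1, 2), "final": (3, 4)},
]
span_scores = map_to_schema(scores, spans)
print(span_scores)
```

Because every example is reduced to the same set of span names, examples of different lengths end up with scores in one shared, abstract set of positions.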

Tal Haklay

@talhaklay.bsky.social

9/13 To address this problem, we introduce the concept of a 𝙙𝙖𝙩𝙖𝙨𝙚𝙩 𝙨𝙘𝙝𝙚𝙢𝙖, which defines token spans with similar semantics across examples in the dataset.

March 6, 2025 at 10:15 PM

Tal Haklay

@talhaklay.bsky.social

7/13 First improvement:
We introduce 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗘𝗱𝗴𝗲 𝗔𝘁𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴 (𝗣𝗘𝗔𝗣), an extension of EAP that lets us discover circuits that differentiate between token positions. The key advance: our approach uncovers "attention edges", revealing dependencies missed by previous methods.

March 6, 2025 at 10:15 PM

Tal Haklay

@talhaklay.bsky.social

6/13 The Problem:
Automatic circuit discovery methods like Edge Attribution Patching (EAP) and EAP-IG implicitly assume that circuits are position-invariant: they do not differentiate between components at different token positions.

As a result, the circuit may include irrelevant components.

March 6, 2025 at 10:15 PM
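
A toy sketch of the issue raised in 6/13, under stated assumptions: the scores, component names, threshold, and both selection functions are hypothetical and operate on components rather than edges, so this is not how EAP itself is implemented. It only shows how ignoring position can pull extra (component, position) pairs into a circuit.

```python
def position_invariant_circuit(scores, threshold):
    """Keep a component at every position if its summed score clears the threshold."""
    kept = set()
    for component, per_position in scores.items():
        if abs(sum(per_position)) >= threshold:
            kept.update((component, pos) for pos in range(len(per_position)))
    return kept


def position_aware_circuit(scores, threshold):
    """Keep only the (component, position) pairs whose own score clears the threshold."""
    return {
        (component, pos)
        for component, per_position in scores.items()
        for pos, score in enumerate(per_position)
        if abs(score) >= threshold
    }


# Hypothetical scores: each component matters at exactly one token position.
toy_scores = {
    "head_A": [0.0, 0.0, 1.0],
    "head_B": [0.0, 0.75, 0.0],
}
invariant = position_invariant_circuit(toy_scores, threshold=0.5)
aware = position_aware_circuit(toy_scores, threshold=0.5)
print(len(invariant), len(aware))
```

In this toy setting the position-invariant selection keeps both components at all three positions, while the position-aware selection keeps only the two pairs that carry a non-zero score.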

Tal Haklay

@talhaklay.bsky.social

4/13 Early circuit discovery techniques relied on manual causal analysis to identify circuits.

Here’s an example of a well-studied circuit in the IOI task by Wang et al. Notice how different components play crucial roles at different token positions—this is expected!

March 6, 2025 at 10:15 PM

Tal Haklay

@talhaklay.bsky.social

1/13 LLM circuits tell us where the computation happens inside the model—but the computation varies by token position, a key detail often ignored!
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇

March 6, 2025 at 10:15 PM
