Huge thanks to my collaborators🙏
@hadasorgad.bsky.social
@davidbau.bsky.social
@amuuueller.bsky.social
@boknilev.bsky.social
See you in Vienna! 🇦🇹 #ACL2025 @aclmeeting.bsky.social
Only 5 days left ⏰!
Got a paper accepted to ICML that fits our theme?
Submit it to our conference track!
👉 @actinterp.bsky.social
The First Workshop on 𝐀𝐜𝐭𝐢𝐨𝐧𝐚𝐛𝐥𝐞 𝐈𝐧𝐭𝐞𝐫𝐩𝐫𝐞𝐭𝐚𝐛𝐢𝐥𝐢𝐭𝐲 will be held at ICML 2025 in Vancouver!
📅 Submission Deadline: May 9
Follow us >> @ActInterp
🧠Topics of interest include: 👇
1️⃣ Our pipeline discovers circuits with a better tradeoff between size and faithfulness compared to EAP.
2️⃣ Our pipeline produces results comparable to those obtained when human experts define a schema.
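To make the size/faithfulness tradeoff concrete, here is a hypothetical sketch (not the paper's implementation) of how such a curve can be traced. It ranks edges by the absolute value of their attribution score, keeps the top k, and measures normalized faithfulness, (m(C) - m(∅)) / (m(M) - m(∅)). That formula is one common definition and is assumed here. The `evaluate` callback and the toy edge contributions are made up. In practice `evaluate` would run the model with every non-circuit edge patched to its corrupted activation.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def normalized_faithfulness(m_circuit: float, m_full: float, m_empty: float) -> float:
    """(m(C) - m(empty)) / (m(M) - m(empty)): 1.0 means the circuit matches the full model."""
    return (m_circuit - m_empty) / (m_full - m_empty)


def size_faithfulness_curve(
    edge_scores: Dict[str, float],
    evaluate: Callable[[Set[str]], float],  # hypothetical: task metric when only these edges are kept
    m_full: float,
    m_empty: float,
    sizes: Iterable[int],
) -> List[Tuple[int, float]]:
    """For each circuit size k, keep the k highest-|score| edges and report faithfulness."""
    ranked = sorted(edge_scores, key=lambda e: abs(edge_scores[e]), reverse=True)
    curve = []
    for k in sizes:
        circuit = set(ranked[:k])
        curve.append((k, normalized_faithfulness(evaluate(circuit), m_full, m_empty)))
    return curve


# Toy example (made-up numbers): each edge adds a fixed amount to the metric.
toy_contrib = {"a->b": 0.5, "b->c": 0.25, "a->c": 0.25}
toy_scores = dict(toy_contrib)  # pretend the attribution scores are exact


def toy_evaluate(circuit: Set[str]) -> float:
    """Additive toy metric: the sum of the contributions of the kept edges."""
    return sum(toy_contrib[e] for e in circuit)


print(size_faithfulness_curve(toy_scores, toy_evaluate, m_full=1.0, m_empty=0.0, sizes=[1, 2, 3]))
# [(1, 0.5), (2, 0.75), (3, 1.0)]
```

A better tradeoff means a higher curve: more faithfulness at the same number of edges when comparing the curves produced by two scoring methods.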
We introduce 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗘𝗱𝗴𝗲 𝗔𝘁𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 𝗣𝗮𝘁𝗰𝗵𝗶𝗻𝗴 (𝗣𝗘𝗔𝗣), an extension of EAP that allows us to discover circuits that differentiate between token positions. The key advancement? Our approach uncovers "attention edges", revealing dependencies missed by previous methods.
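One way to picture what "differentiating between token positions" means for the circuit graph is to make the nodes (component, position) pairs. Below is a minimal sketch under simplifying assumptions; it is not PEAP's actual code. In a causal transformer, a non-attention component at position t only reads the residual stream at t. An attention head at t can also read earlier positions s ≤ t through its keys and values. Treating those cross-position edges as "attention edges" is our reading, not a definition from the thread. A position-invariant graph collapses them into a single edge. Component names like `mlp0` and `head1.0` are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple[str, int]  # (component name, token position)
Edge = Tuple[Node, Node]  # (upstream node, downstream node)


def candidate_edges(
    upstream: List[str],
    downstream: List[str],
    is_attention: Dict[str, bool],
    n_positions: int,
) -> List[Edge]:
    """Enumerate position-aware edges from upstream to downstream components."""
    edges: List[Edge] = []
    for u, v in product(upstream, downstream):
        for t in range(n_positions):
            # Causal attention at position t reads positions 0..t; other components read only t.
            sources = range(t + 1) if is_attention[v] else [t]
            for s in sources:
                edges.append(((u, s), (v, t)))
    return edges


edges = candidate_edges(["mlp0"], ["head1.0", "mlp1"], {"head1.0": True, "mlp1": False}, n_positions=3)
cross_position = [e for e in edges if e[0][1] != e[1][1]]
print(len(edges), len(cross_position))  # 9 3
```

A position-invariant graph would have just two edges here (mlp0 → head1.0 and mlp0 → mlp1). The position-aware version has nine, three of which cross positions through the attention head.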
Automatic circuit discovery methods like Edge Attribution Patching (EAP) and EAP-IG implicitly assume that circuits are position-invariant: they do not differentiate between components at different token positions.
As a result, a component that matters at only one position is included at every position, so the circuit may contain irrelevant components.
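Here is a minimal sketch of why this happens. It uses the standard attribution-patching estimate for a single edge u → v. At each position t, score[t] = (corrupt_u[t] - clean_u[t]) · grad_v[t], where grad_v[t] is the gradient of the task metric with respect to v's input at t on the clean run. This is a simplified illustration with made-up toy vectors, not the exact scoring of any specific method. Summing over positions gives one score per edge, so an edge that is kept is kept at every position.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def per_position_scores(clean_u: List[Vector], corrupt_u: List[Vector], grad_v: List[Vector]) -> List[float]:
    """Attribution-patching estimate for edge u -> v, one score per token position."""
    scores = []
    for clean, corrupt, grad in zip(clean_u, corrupt_u, grad_v):
        delta = [c - k for c, k in zip(corrupt, clean)]  # corrupted minus clean output of u
        scores.append(dot(delta, grad))
    return scores


def position_invariant_score(scores: List[float]) -> float:
    """Position-invariant view: collapse all positions into one score per edge."""
    return sum(scores)


def relevant_positions(scores: List[float], threshold: float) -> List[int]:
    """Position-aware view: positions where the edge's estimated effect is large."""
    return [t for t, s in enumerate(scores) if abs(s) > threshold]


# Toy 3-token example (made-up numbers): u's output only changes at position 1.
clean_u = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
corrupt_u = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
grad_v = [[0.5, 0.5], [2.0, -1.0], [0.3, 0.3]]

scores = per_position_scores(clean_u, corrupt_u, grad_v)
print(scores)                            # [0.0, -3.0, 0.0]
print(position_invariant_score(scores))  # -3.0
print(relevant_positions(scores, 0.1))   # [1]
```

With the summed score, a position-invariant circuit that keeps u → v keeps it at all three positions, even though only position 1 carries any effect.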
Here’s an example of a well-studied circuit: the indirect object identification (IOI) circuit from Wang et al. Notice how different components play crucial roles at different token positions. This is expected!
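For readers who haven't seen IOI, the canonical prompt is "When Mary and John went to the store, John gave a drink to", and the model should predict "Mary". Below is a small sketch that labels the key positions, with a simplified summary of where the main head classes from Wang et al. act. The whitespace tokenizer and the function name are assumptions for illustration only; real model tokenizers split text differently.

```python
from typing import Dict, List


def ioi_positions(tokens: List[str], io: str, subject: str) -> Dict[str, int]:
    """Label the key IOI positions: IO, S1 (first subject), S1+1, S2 (repeated subject), END."""
    s1 = tokens.index(subject)
    return {
        "IO": tokens.index(io),
        "S1": s1,
        "S1+1": s1 + 1,
        "S2": tokens.index(subject, s1 + 1),
        "END": len(tokens) - 1,
    }


# Simplified summary of the position at which each head class acts (Wang et al.).
HEAD_CLASS_POSITION = {
    "previous token heads": "S1+1",
    "duplicate token heads": "S2",
    "induction heads": "S2",
    "S-inhibition heads": "END",
    "name mover heads": "END",
}

tokens = "When Mary and John went to the store, John gave a drink to".split()
positions = ioi_positions(tokens, io="Mary", subject="John")
print(positions)  # {'IO': 1, 'S1': 3, 'S1+1': 4, 'S2': 8, 'END': 12}
for head_class, label in HEAD_CLASS_POSITION.items():
    print(f"{head_class}: {label} (token {positions[label]}: {tokens[positions[label]]!r})")
```

A position-invariant circuit would instead include, for example, the name mover heads at every position of the prompt, not just at END.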
We propose a method to automatically find position-aware circuits, improving faithfulness while keeping circuits compact. 🧵👇