Lightnews — Scholar-powered news

Michael Hanna

@michaelwhanna.bsky.social

I'll be presenting this in person at NAACL, tomorrow at 11am in Ballroom C! Come on by - I'd love to chat with folks about this and all things interp / cog sci!

Michael Hanna @michaelwhanna.bsky.social · Dec 19

Sentences are partially understood before they're fully read. How do LMs incrementally interpret their inputs?

In a new paper, @amuuueller.bsky.social and I use mech interp tools to study how LMs process structurally ambiguous sentences. We show LMs rely on both syntactic & spurious features! 1/10

April 30, 2025 at 1:46 PM

Reposted by Michael Hanna

Aaron Mueller

@amuuueller.bsky.social

Lots of progress in mech interp (MI) lately! But how can we measure when new mech interp methods yield real improvements over prior work?

We propose 😎 𝗠𝗜𝗕: a 𝗠echanistic 𝗜nterpretability 𝗕enchmark!

April 23, 2025 at 6:15 PM

Reposted by Michael Hanna

Aaron Mueller

@amuuueller.bsky.social

(NAACL) When reading a sentence, humans predict what's likely to come next. When the ending is unexpected, this leads to garden-path effects: e.g., "The child bought an ice cream smiled."

Do LLMs show similar mechanisms? @michaelwhanna.bsky.social and I investigate: arxiv.org/abs/2412.05353

Incremental Sentence Processing Mechanisms in Autoregressive Transformer Language Models

Autoregressive transformer language models (LMs) possess strong syntactic abilities, often successfully handling phenomena from agreement to NPI licensing. However, the features they use to incrementa...

arxiv.org

March 11, 2025 at 2:30 PM

Reposted by Michael Hanna

Adi Simhi

@adisimhi.bsky.social

🚨New arXiv preprint!🚨
LLMs can hallucinate - but did you know they can do so with high certainty even when they know the correct answer? 🤯
We find those hallucinations in our latest work with @itay-itzhak.bsky.social, @fbarez.bsky.social, @gabistanovsky.bsky.social and Yonatan Belinkov

February 19, 2025 at 3:50 PM

Michael Hanna

@michaelwhanna.bsky.social

Excited to say that this was accepted to NAACL—looking forward to presenting it in Albuquerque!

Michael Hanna @michaelwhanna.bsky.social · Dec 19

Sentences are partially understood before they're fully read. How do LMs incrementally interpret their inputs?

In a new paper, @amuuueller.bsky.social and I use mech interp tools to study how LMs process structurally ambiguous sentences. We show LMs rely on both syntactic & spurious features! 1/10

January 24, 2025 at 5:03 PM

Reposted by Michael Hanna

Dota Tianai Dong

@dotadotadota.bsky.social

🚨Call for Papers🚨
The Re-Align Workshop is coming back to #ICLR2025

Our CfP is up! Come share your representational alignment work at our interdisciplinary workshop at
@iclr-conf.bsky.social

Deadline is 11:59 pm AOE on Feb 3rd

representational-alignment.github.io

January 16, 2025 at 7:04 PM

Michael Hanna

@michaelwhanna.bsky.social

Sentences are partially understood before they're fully read. How do LMs incrementally interpret their inputs?

In a new paper, @amuuueller.bsky.social and I use mech interp tools to study how LMs process structurally ambiguous sentences. We show LMs rely on both syntactic & spurious features! 1/10

December 19, 2024 at 1:40 PM

Reposted by Michael Hanna

Ana Lučić

@a-lucic.bsky.social

🚨 PhD position alert! 🚨

I'm hiring a fully funded PhD student to work on mechanistic interpretability at @uva-amsterdam.bsky.social. If you're interested in reverse engineering modern deep learning architectures, please apply: vacatures.uva.nl/UvA/job/PhD-...

PhD Position in Mechanistic Interpretability

vacatures.uva.nl

December 2, 2024 at 7:36 PM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news