neelnanda.bsky.social
@neelnanda.bsky.social
The call for papers for the NeurIPS Mechanistic Interpretability Workshop is open!

Max 4 or 9 pages, due 22 Aug, NeurIPS submissions welcome

We welcome any works that further our ability to use the internals of a model to better understand it

Details: mechinterpworkshop.com
July 13, 2025 at 1:00 PM
I've been really feeling how much the general public is concerned about AI risk...

In a *weird* amount of recent interactions with normal people (eg my hairdresser) when I say I do AI research (*not* safety), they ask if AI will take over

Alas, I have no reassurances to offer
June 1, 2025 at 6:48 PM
Blog post: I often give advice, to mentees, friends, etc. This is hard! I'm often missing context

My favourite approach is Socratic persuasion: guiding them through my case via questions. If I'm wrong there's soon a surprising answer!

I can be opinionated *and* truth seeking
May 26, 2025 at 6:37 PM
There are many moving pieces when turning a project into a machine learning conference paper, and best practices/nuances no one writes up. I made a comprehensive paper writing checklist for my mentees and am sharing a public version below - hopefully it's useful, esp for NeurIPS!
May 11, 2025 at 10:47 PM
Post 3: What is research taste?

This mystical notion separates new and experienced researchers. It's real and important. But what is it and how to learn it?

I break down taste as the mix of intuition/models behind good open-ended decisions and share tricks to speed up learning
x.com/NeelNanda5/...
May 2, 2025 at 1:00 PM
I'm very impressed with the Sentinel newsletter: by far the best aggregator of global news I've found

Expert forecasters filter for the events that actually matter (not just noise), and forecast how likely these are to affect eg war, pandemics, frontier AI etc

Highly recommended!
April 29, 2025 at 1:00 PM
This was great to supervise - the kind of basic science of SAEs work that's most promising IMO! Finding a fundamental issue with SAEs, fixing it with an adjustment (here a different loss), and rigorously measuring how much it's fixed. I recommend using Matryoshka where possible.
x.com/BartBussman...
April 2, 2025 at 1:01 PM
New post: A weird phenomenon: because I have evidence of doing good interpretability research, people assume I have good big picture takes about how it matters for AGI X-risk. I try, but these are different skillsets! Before deferring, check for *relevant* evidence of skill.
March 22, 2025 at 12:00 PM
DeepMind AGI Safety is hiring! I think we're among the best places in the world to do technical work to make AGI safer. I'm keen to meet some great future colleagues! Due Feb 28

One role is applied interpretability: a new subteam of my team using interp for safety in production
x.com/rohinmshah/...
February 18, 2025 at 2:00 PM
I'm really excited about our ICLR paper!

It's clarified my thinking on the flaws with current SAEs - the fact that we must choose the SAE size means we can't be finding 'true' concepts.

SAEs are an imperfect lens that represents concepts at varying levels of granularity
x.com/BartBussman...
February 11, 2025 at 6:22 PM
As part of opening my new round of MATS applications, I took this as an excuse to write up which research directions I'm currently most excited about, and recent updates I've made - I thought this might be of more general interest!
x.com/NeelNanda5/...
February 8, 2025 at 1:57 PM
I'll be speaking at the IASEAI conference in Paris today and tomorrow - let me know if you're around and want to chat!
February 6, 2025 at 8:28 AM
LLM agents will be a very big deal, bringing many weird new forms of reward hacking and other subtle failures

This great GDM safety paper shows that myopically optimising for plans that an overseer approves of, rather than outcomes, reduces these issues while performing well!
x.com/davlindner/...
January 23, 2025 at 4:36 PM
Reposted
Starting to prepare yourself to submit to ICML? Here are my tips on how to write well for an ML research audience. sebastianfarquhar.com/on-research/...
How to Write ML Papers
This doc is aimed at students learning to write ML papers as well as more experienced writers. It isn’t about how to do the research itself, but about how to present it in a way that makes it impactfu...
sebastianfarquhar.com
November 18, 2024 at 8:06 PM