Lightnews — Scholar-powered news

Gabriele Sarti

@gsarti.com

It was an honor to be part of this awesome project! Interpreto is a great up-and-coming tool for concept-based interpretability analyses of NLP models, check it out!

Antonin Poché @antoninpoche.bsky.social · 1d

🔥I am super excited for the official release of an open-source library we've been working on for about a year!

🪄interpreto is an interpretability toolbox for HF language models🤗. In both generation and classification!

Why do you need it, and for what?

1/8 (links at the end)

January 21, 2026 at 4:20 AM

Reposted by Gabriele Sarti

NDIF Team

@ndif-team.bsky.social

New year, new YouTube videos! We are resuming our regular interpretability seminar posts, with a fantastic talk by Deepti Ghadiyaram on interpreting diffusion models.

Watch the video: youtu.be/4eqvABPX5rA

Interpreting and Leveraging Diffusion Representations with Deepti Ghadiyaram

Deepti Ghadiyaram is an Assistant Professor at Boston University in the Department of Computer Science, with affiliated appointments in Electrical and Comput...

www.youtube.com

January 15, 2026 at 9:20 PM

Reposted by Gabriele Sarti

Naomi Saphra

@nsaphra.bsky.social

All interpretability research is either philosophy (affectionate) or stamp collecting (derogatory)

January 11, 2026 at 8:47 PM

Gabriele Sarti

@gsarti.com

The NDIF ecosystem is growing! 🚀 nnterp will bridge the gap between fine-grained fiddling with model internals (nnsight) and low-code access to bespoke viz (workbench). Excited to work with @butanium.bsky.social and the @ndif-team.bsky.social to make it a standard in interp research!

NDIF Team @ndif-team.bsky.social · 12d

nnterp by @butanium.bsky.social is now part of the NDIF ecosystem! nnterp standardizes transformer naming conventions, includes built-in best practices for common interventions, and is perfectly compatible with original HF model implementations.

Learn more: ndif-team.github.io/nnterp/

January 9, 2026 at 11:14 PM

Gabriele Sarti

@gsarti.com

Happy to announce I will be mentoring a SPAR project this Spring! ✨Check out the programme and apply by Jan 14th to work with me on understanding and mitigating implicit personalization in LLMs, i.e. how models form hidden beliefs about users that shape their responses.

January 9, 2026 at 2:09 PM

Gabriele Sarti

@gsarti.com

This reads like a modern-day satirical adaptation of "The Lifecycle of Software Object" by Ted Chiang!

David Bau @davidbau.bsky.social · 16d

My vibe-coded Mandelbrot viewer is 40x faster now! New GPU synchronization tricks go outside the design intent of WebGPU specs. But the real story: Claude tells me what happens in the AGI break room.

What superhuman AGIs say when the boss is not around:
davidbau.com/archives/202...

January 6, 2026 at 1:14 AM

Gabriele Sarti

@gsarti.com

📣 I'm starting a postdoc at Northeastern University, where I will work on open-source NN interpretability with @davidbau.bsky.social and the @ndif-team.bsky.social.

In 2026, we'll grow the NDIF ecosystem and democratize access to interpretability methods for academics and domain experts! 🚀

January 4, 2026 at 6:42 PM

Gabriele Sarti

@gsarti.com

Our work on contrastive SAE steering for personalizing literary machine translation was accepted to EACL main! 🎉 Check it out! ⬇️

Daniel Scalena @danielsc4.it · May 23

📢 New paper: Applied interpretability 🤝 MT personalization!

We steer LLM generations to mimic human translator styles on literary novels in 7 languages. 📚

SAE steering can beat few-shot prompting, leading to better personalization while maintaining quality.

🧵1/

January 4, 2026 at 3:18 PM

Reposted by Gabriele Sarti

Simon Willison

@simonwillison.net

Here's my enormous round-up of everything we learned about LLMs in 2025 - the third in my annual series of reviews of the past twelve months
simonwillison.net/2025/Dec/31/...
This year it's divided into 26 sections! This is the table of contents:

The year of “reasoning”
The year of agents
The year of coding agents and Claude Code
The year of LLMs on the command-line
The year of YOLO and the Normalization of Deviance
The year of $200/month subscriptions
The year of top-ranked Chinese open weight models
The year of long tasks
The year of prompt-driven image editing
The year models won gold in academic competitions
The year that Llama lost its way
The year that OpenAI lost their lead
The year of Gemini
The year of pelicans riding bicycles
The year I built 110 tools
The year of the snitch!
The year of vibe coding
The (only?) year of MCP
The year of alarmingly AI-enabled browsers
The year of the lethal trifecta
The year of programming on my phone
The year of conformance suites
The year local models got good, but cloud models got even better
The year of slop
The year that data centers got extremely unpopular
My own words of the year
That’s a wrap for 2025

December 31, 2025 at 11:54 PM

Reposted by Gabriele Sarti

NDIF Team

@ndif-team.bsky.social

Happy Holidays from NDIF! Our new NNsight version improves performance and enhances vLLM integration, including support for tensor parallelism.

December 19, 2025 at 10:51 PM

Reposted by Gabriele Sarti

David Bau

@davidbau.bsky.social

I have been teaching myself to vibe code.

Watch Claude Code grow my 780 lines to 13,600 - mandelbrot.page/coverage/ca...

Two fundamental rules for staying in control:
davidbau.com/archives/20...

December 18, 2025 at 8:01 PM

Gabriele Sarti

@gsarti.com

Big news! 🗞️ I defended my PhD thesis "From Insights to Impact: Actionable Interpretability for Neural Machine Translation" @rug.nl @gronlp.bsky.social

I'm grateful to my advisors @arianna-bis.bsky.social @malvinanissim.bsky.social and to everyone who played a role in this journey! 🎉 #PhDone

December 16, 2025 at 12:21 PM

Reposted by Gabriele Sarti

AILC-NLP

@ailc-nlp.bsky.social

The CALAMITA (Challenging the Abilities of LAnguage Models in ITAlian) paper is now available on arXiv:
arxiv.org/abs/2512.04759
We warmly thank all the individuals involved for their extraordinary work, dedication, and collaborative spirit that made this project possible!

December 9, 2025 at 6:19 PM

Gabriele Sarti

@gsarti.com

Kinda crazy the improvement from Nano banana (left) to NB Pro (right): "Create an infographic explaining how model components contribute to the prediction process of a decoder-only Transformer LLM. Use the residual stream view of the Transformer by Elhage et al. (2021) in your presentation."

November 21, 2025 at 8:14 AM

Gabriele Sarti

@gsarti.com

impactrank.org is an interesting take on how to rethink uni rankings to upweight quality rather than quantity. They use LLMs to extract "high impact" dependencies from papers and identify foundational work, tracing them back to PIs/unis
by matching their DBLP entries. Have a look!

Research Impact Rankings

impactrank.org

November 16, 2025 at 9:30 AM

Reposted by Gabriele Sarti

Can

@canrager.bsky.social

Humans and LLMs think fast and slow. Do SAEs recover slow concepts in LLMs? Not really.

Our Temporal Feature Analyzer discovers contextual features in LLMs, that detect event boundaries, parse complex grammar, and represent ICL patterns.

November 13, 2025 at 10:32 PM

Gabriele Sarti

@gsarti.com

New promising model for interpretability research just dropped!

Alexander Doria @dorialexander.bsky.social · Nov 10

Through this release, we aim both to support the emerging ecosystem for pretraining research (NanoGPT, NanoChat), explainability (you can literally look at Monad under a microscope) and the tooling orchestration around frontier models.

November 10, 2025 at 9:09 PM

Gabriele Sarti

@gsarti.com

Check out our awesome live-skeeted panel!

BlackboxNLP @blackboxnlp.bsky.social · Nov 9

Our panel moderated by @danaarad.bsky.social
"Evaluating Interpretability Methods: Challenges and Future Directions" just started! 🎉 Come to learn more about the MIB benchmark and hear the takes of @michaelwhanna.bsky.social, Michal Golovanevsky, Nicolò Brunello and Mingyang Wang!

November 9, 2025 at 7:18 AM

Gabriele Sarti

@gsarti.com

Follow @blackboxnlp.bsky.social for a live skeeting of the event!

BlackboxNLP @blackboxnlp.bsky.social · Nov 9

BlackboxNLP is up and running! Here's the topics covered by this year's edition at a glance. Excited to see so many interesting topics, and the growing interest in reasoning!

November 9, 2025 at 2:20 AM

Gabriele Sarti

@gsarti.com

Wrapping up my oral presentations today with our TACL paper "QE4PE: Quality Estimation for Human Post-editing" at the Interpretability morning session #EMNLP2025 (Room A104, 11:45 China time)!

Paper: arxiv.org/abs/2503.03044
Slides/video/poster: underline.io/lecture/1315...

November 7, 2025 at 2:50 AM

Gabriele Sarti

@gsarti.com

Presenting today our work "Unsupervised Word-level Quality Estimation Through the Lens of Annotator (Dis)agreement" at the Machine Translation morning session (Room A301, 11:45 China time). See you there! 🤗

Paper: aclanthology.org/2025.emnlp-m...
Slides/video/poster: underline.io/events/502/s...

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

Gabriele Sarti, Vilém Zouhar, Malvina Nissim, Arianna Bisazza. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.

aclanthology.org

November 6, 2025 at 1:19 AM

Reposted by Gabriele Sarti

Arnab Sen Sharma

@arnabsensharma.bsky.social

How can a language model find the veggies in a menu?

New pre-print where we investigate the internal mechanisms of LLMs when filtering on a list of options.

Spoiler: turns out LLMs use strategies surprisingly similar to functional programming (think "filter" from python)! 🧵

November 4, 2025 at 5:48 PM

Reposted by Gabriele Sarti

John David Pressman

@jdp.extropian.net

Language models can correctly answer questions about their previous intentions.
www.anthropic.com/research/int...

Emergent introspective awareness in large language models

Research from Anthropic on the ability of large language models to introspect

www.anthropic.com

October 29, 2025 at 6:21 PM

Reposted by Gabriele Sarti

Tiancheng Hu

@tiancheng.bsky.social

Can AI simulate human behavior? 🧠
The promise is revolutionary for science & policy. But there’s a huge "IF": Do these simulations actually reflect reality?
To find out, we introduce SimBench: The first large-scale benchmark for group-level social simulation. (1/9)

October 28, 2025 at 4:54 PM

Gabriele Sarti

@gsarti.com

Our group @gronlp.bsky.social is coming in strong for #EMNLP2025! See you soon in Suzhou! 👋 🇨🇳

GroNLP @gronlp.bsky.social · Oct 27

With only a week left for #EMNLP2025, we are happy to announce all the works we 🐮 will present 🥳 - come and say "hi" to our posters and presentations during the Main and the co-located events (*SEM and workshops) See you in Suzhou ✈️

accepted papers at main conference and findings

October 28, 2025 at 7:41 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news