And possible applications of the filter mechanism, such as a zero-shot "lie detector" that can flag incorrect statements in ordinary text.
bsky.app/profile/arn...
If we pick up the representation for a question in French, it will accurately match items expressed in the Thai language.
The generic structure resembles functional programming's "filter" function, with a common mechanism handling a wide range of predicates.
bsky.app/profile/arn...
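As a toy illustration of that analogy (ordinary Python, nothing to do with the model's internals): one generic filter mechanism, and only the predicate changes between tasks.

```python
# One generic mechanism: Python's built-in filter().
# Only the predicate changes from task to task.
items = ["Paris", "42", "love", "-7", "Bangkok"]

is_number = lambda s: s.lstrip("-").isdigit()
is_city = lambda s: s in {"Paris", "Bangkok", "Tokyo"}

print(list(filter(is_number, items)))  # ['42', '-7']
print(list(filter(is_city, items)))    # ['Paris', 'Bangkok']
```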
OK, here it is fixed. Nice thing about the workbench is that it takes just a second to edit the prompt, and you can see how the LLM responds, now deciding very early that it should be ':'
Share the tool! Share what you find!
And send the team feedback -
bsky.app/profile/ndi...
Try it out yourself on workbench.ndif.us/.
Does it work with other words? Can you find interesting exceptions? How about prompts beyond translation?
Instead it first "thinks" about the (English) word "love".
In other words: LLMs translate using *concepts*, not tokens.
The workbench doesn't just show you the model's output. It shows the grid of internal states that lead to the output. Researchers call this visualization the "logit lens".
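If you want to reproduce that kind of view in a script rather than the workbench, here's a minimal logit-lens sketch using HuggingFace transformers. The module names (model.transformer.ln_f, model.lm_head) assume a GPT-J-style model, other architectures name those pieces differently, and the prompt is just an illustrative translation prompt.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-j-6b"  # any causal LM with a GPT-J-style layout
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

prompt = "English: love\nFrench: amour\nThai:"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Logit lens: push each layer's hidden state through the final layer norm
# and the unembedding matrix, then read off the top token at the last position.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h))
    top_id = logits[0, -1].argmax()
    print(f"layer {layer:2d}: {tok.decode(top_id)!r}")
```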
Visit the NDIF workbench here: workbench.ndif.us/, and pull up any LLM that can translate, like GPT-J-6b. If you register an account you can access larger models.
bsky.app/profile/ndi...
www.youtube.com/@NDIFTeam
Cool ideas about representations in LLMs with linguistic relevance!
I want to draw your attention to a COLM paper by my student @sfeucht.bsky.social that has totally changed the way I think and teach about LLM representations. The work is worth knowing.
And you can meet Sheridan at COLM, Oct 7!
bsky.app/profile/sfe...
footprints.baulab.info <- token context erasure
arithmetic.baulab.info <- concept parallelograms
dualroute.baulab.info <- the second induction route,
with a neat Colab notebook.
@ericwtodd.bsky.social @byron.bsky.social @diatkinson.bsky.social
We need to be aware of when an LM is thinking about tokens and when it's thinking about concepts.
They do both, and it makes a difference which way it's thinking.
@keremsahin22.bsky.social + Sheridan are finding cool ways to look into Olah's induction hypothesis too!
Sheridan discovered (NeurIPS mechint 2025) that semantic vector arithmetic works better in this space. (Token semantics work in token space.)
arithmetic.baulab.info/
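Here's a rough sketch of what vector arithmetic over these representations can look like; this is my own illustrative version with an arbitrary layer choice and toy word set, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-j-6b"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

LAYER = 14  # a mid layer, chosen arbitrarily; the best layer is an empirical question

def concept_vec(word):
    """Hidden state of the word's last token at LAYER."""
    ids = tok(word, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1]

# Parallelogram test: does Paris - France + Japan land nearest to Tokyo?
v = concept_vec("Paris") - concept_vec("France") + concept_vec("Japan")
candidates = ["Tokyo", "Berlin", "Rome", "Bangkok"]
sims = {c: torch.cosine_similarity(v, concept_vec(c), dim=0).item() for c in candidates}
print(sorted(sims.items(), key=lambda kv: -kv[1]))
```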
That happens even for computer code. They copy the BEHAVIOR of the code, but write it in a totally different way!