Thomas Fel
@thomasfel.bsky.social
1.4K followers 340 following 34 posts
Explainability, Computer Vision, Neuro-AI.🪴 Kempner Fellow @Harvard. Prev. PhD @Brown, @Google, @GoPro. Crêpe lover. 📍 Boston | 🔗 thomasfel.me
Pinned
thomasfel.bsky.social
I’ll be at @neuripsconf.bsky.social
this year, sharing some work on explainability and representations. If you’re attending and want to chat, feel free to reach out! 👋
thomasfel.bsky.social
That concludes this two-part descent into the Rabbit Hull.
Huge thanks to all collaborators who made this work possible — and especially to @binxuwang.bsky.social , with whom this project was built, experiment after experiment.
🎮 kempnerinstitute.github.io/dinovision/
📄 arxiv.org/pdf/2510.08638
thomasfel.bsky.social
If this holds, three implications:
(i) Concepts = points (or regions), not directions
(ii) Probing is bounded: toward archetypes, not vectors
(iii) Can't recover the generating hulls from their sum: we should look deeper than single-layer activations to recover the true latents
thomasfel.bsky.social
Synthesizing these observations, we propose a refined view, motivated by Gärdenfors' theory and attention geometry.
Activations = multiple convex hulls simultaneously: a rabbit among animals, brown among colors, fluffy among textures.

The Minkowski Representation Hypothesis.
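A minimal numeric sketch of the idea (hulls, archetypes, and dimensions below are invented for illustration, not taken from the paper): a token embedding written as a Minkowski sum, i.e. one convex combination per attribute hull, added together.

```python
# Minimal sketch of the Minkowski view: a token embedding as a SUM of
# convex combinations, one per attribute hull (all names/shapes hypothetical).
import numpy as np

rng = np.random.default_rng(0)
d = 768  # embedding dim (illustrative)

# Hypothetical archetypes spanning three hulls: animals, colors, textures.
hulls = {
    "animal":  rng.normal(size=(5, d)),   # e.g. rabbit, cat, dog, ...
    "color":   rng.normal(size=(4, d)),   # e.g. brown, white, ...
    "texture": rng.normal(size=(3, d)),   # e.g. fluffy, smooth, ...
}

def convex_combination(archetypes, rng):
    """Sample a point inside one convex hull (random barycentric weights)."""
    w = rng.dirichlet(np.ones(len(archetypes)))   # weights >= 0, sum to 1
    return w @ archetypes

# A "rabbit, brown, fluffy" token = Minkowski sum of one point per hull.
token = sum(convex_combination(A, rng) for A in hulls.values())
print(token.shape)  # (768,)
```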
thomasfel.bsky.social
Taken together, the signs of partial density, local connectedness, and coherent dictionary atoms indicate that DINO’s representations are organized beyond linear sparsity alone.
thomasfel.bsky.social
Can position explain this?

We found that positional information collapses: from high-rank to a near 2-dim sheet. Early layers encode precise location; later ones retain abstract axes.

This compression frees dimensions for features, and *position doesn't explain PCA map smoothness*
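A hedged sketch of how one could probe this (tensor names and data below are stand-ins, not the paper's pipeline): average patch embeddings over images to isolate the position-dependent component, then check its effective rank.

```python
# Hedged sketch: estimate the effective rank of the position-dependent part
# of patch embeddings at one layer (shapes and data are assumptions).
import numpy as np

def effective_rank(X):
    """Entropy-based effective rank of the singular value spectrum."""
    s = np.linalg.svd(X - X.mean(0), compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

# acts: (n_images, n_patches, d) patch embeddings from some layer.
acts = np.random.randn(256, 196, 64)   # stand-in data for illustration

# Averaging over images keeps what depends on position, washes out content.
pos_component = acts.mean(axis=0)      # (n_patches, d)
print(effective_rank(pos_component))   # a value near 2 would match a 2-dim sheet
```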
thomasfel.bsky.social
Patch embeddings form smooth, connected surfaces tracing objects and boundaries.

This may suggest an interpolative geometry: tokens as mixtures of landmarks, shaped by clustering and spreading forces in the training objectives.
thomasfel.bsky.social
We found antipodal feature pairs (dᵢ ≈ − dⱼ): vertical vs horizontal lines, white vs black shirts, left vs right…

Also, co-activation statistics only moderately shape geometry: concepts that fire together aren't necessarily nearby—nor orthogonal when they don't.
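A small sketch of how antipodal pairs can be flagged in any unit-norm dictionary (the dictionary here is random, standing in for SAE decoder atoms):

```python
# Sketch: scan a (unit-norm) dictionary D for antipodal atom pairs d_i ≈ -d_j.
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(512, 64))                  # (n_atoms, d), hypothetical
D /= np.linalg.norm(D, axis=1, keepdims=True)   # unit-normalize atoms

C = D @ D.T                                     # pairwise cosine similarities
i, j = np.unravel_index(np.argmin(C), C.shape)  # most antipodal pair
print(i, j, C[i, j])                            # cos close to -1 => d_i ≈ -d_j
```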
thomasfel.bsky.social
Under the Linear Rep. Hypothesis, we'd expect the dictionary to be quasi-orthogonal.
Instead, training drives atoms from a near-Grassmannian initialization to higher coherence.
Several concepts fire almost always: the embedding is partly dense (!), contradicting pure sparse coding.
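For the coherence claim, a hedged sketch (random dictionary as a stand-in): mutual coherence compared with the Welch bound that a Grassmannian, maximally spread frame approaches; trained atoms sitting well above that level is what "higher coherence" means here.

```python
# Sketch: mutual coherence of a dictionary vs. the Welch lower bound
# attained by Grassmannian (maximally spread) frames.
import numpy as np

def mutual_coherence(D):
    """Largest |cosine| between distinct unit-norm atoms (rows of D)."""
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    G = np.abs(D @ D.T)
    np.fill_diagonal(G, 0.0)
    return G.max()

n, d = 512, 64
D_init = np.random.default_rng(0).normal(size=(n, d))   # ~ random init
welch = np.sqrt((n - d) / (d * (n - 1)))                 # Grassmannian limit
print(mutual_coherence(D_init), welch)   # atoms drifting above this level
                                         # during training = higher coherence
```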
thomasfel.bsky.social
🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the *geometry of concepts* and the synthesis of our findings toward a new representational *phenomenology*:

the Minkowski Representation Hypothesis
thomasfel.bsky.social
Huge thanks to all collaborators who made this work possible, and especially to @binxuwang.bsky.social. This work grew from a year of collaboration!
Tomorrow, Part II: geometry of concepts and Minkowski Representation Hypothesis.
🕹️ kempnerinstitute.github.io/dinovision
📄 arxiv.org/pdf/2510.08638
thomasfel.bsky.social
Curious tokens, the registers.
DINO seems to use them to encode global invariants: we find concepts (directions) that fire exclusively (!) on registers.

Examples of such concepts include a motion-blur detector and style detectors (game screenshots, drawings, paintings, warped images...)
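A toy sketch of how register-exclusive concepts could be screened for (shapes, data, and the 0.95 threshold are arbitrary assumptions):

```python
# Sketch: flag concepts whose activation mass falls (almost) entirely on the
# register tokens rather than on patch tokens. Arrays are stand-ins.
import numpy as np

acts = np.abs(np.random.randn(1000, 4 + 196, 32))  # (images, tokens, concepts)
is_register = np.zeros(4 + 196, bool)
is_register[:4] = True                              # first 4 tokens = registers

mass_on_registers = acts[:, is_register].sum((0, 1)) / acts.sum((0, 1))
register_concepts = np.where(mass_on_registers > 0.95)[0]
print(register_concepts)   # concepts firing (almost) exclusively on registers
```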
thomasfel.bsky.social
Now for depth estimation. How does DINO know depth?

It turns out it has discovered several human-like monocular depth cues: texture gradients resembling blurring or bokeh, shadow detectors, and projective cues.

Most units mix cues, but a few remain remarkably pure.
thomasfel.bsky.social
Another surprise here: the most important concepts are not object-centric at all, but boundary detectors. Remarkably, these concepts coalesce into a low-dimensional subspace (see paper).
thomasfel.bsky.social
This kind of concept breaks a key assumption in interpretability: that a concept is about the tokens where it fires. Here it is the opposite—the concept is defined by where it does not fire. An open question is how models form such concepts.
thomasfel.bsky.social
Let's zoom in on classification.
For every class, we find two concepts: one fires on the object (e.g., "rabbit"), and another fires everywhere *except* the object -- but only when it's present!

We call them Elsewhere Concepts (credit: @davidbau.bsky.social).
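A hedged sketch of the signature one would look for (arrays, masks, and the scoring below are illustrative, not the paper's protocol): high off-object activation when the class is present, low activation when it is absent.

```python
# Hedged sketch of the "Elsewhere" signature: a concept that fires off-object,
# but only on images where the object is present. All arrays are stand-ins.
import numpy as np

def elsewhere_score(acts, masks, present):
    """
    acts:    (n_images, n_patches) concept activations
    masks:   (n_images, n_patches) boolean object masks (True on the object)
    present: (n_images,) boolean, does the image contain the class at all
    Returns mean off-object activation when present, mean activation when absent.
    """
    off_obj = (acts * ~masks)[present].sum() / np.maximum((~masks)[present].sum(), 1)
    absent = acts[~present].mean() if (~present).any() else 0.0
    return off_obj, absent

acts = np.random.rand(100, 196)
masks = np.random.rand(100, 196) > 0.8
present = np.arange(100) < 50
print(elsewhere_score(acts, masks, present))  # high first value, low second
                                              # would match an Elsewhere concept
```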
thomasfel.bsky.social
Assuming the Linear Rep. Hypothesis, SAEs arise naturally as instruments for concept extraction; they will be our companions in this descent.
Archetypal SAE uncovered 32k concepts.

Our first observation: different tasks recruit distinct regions of this conceptual space.
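For readers unfamiliar with SAEs, here is a minimal vanilla sparse autoencoder sketch (ReLU encoder + L1 penalty); note this is the generic recipe, not the Archetypal SAE used in the paper.

```python
# Minimal sketch of a vanilla sparse autoencoder on frozen activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, n_concepts=32_000):
        super().__init__()
        self.enc = nn.Linear(d_model, n_concepts)
        self.dec = nn.Linear(n_concepts, d_model, bias=False)

    def forward(self, x):
        z = torch.relu(self.enc(x))       # sparse concept codes
        return self.dec(z), z             # reconstruction, codes

sae = SparseAutoencoder()
x = torch.randn(32, 768)                  # stand-in patch activations
x_hat, z = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * z.abs().mean()   # recon + sparsity
loss.backward()
```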
thomasfel.bsky.social
🕳️🐇 𝙄𝙣𝙩𝙤 𝙩𝙝𝙚 𝙍𝙖𝙗𝙗𝙞𝙩 𝙃𝙪𝙡𝙡 – 𝙋𝙖𝙧𝙩 𝙄 (𝑃𝑎𝑟𝑡 𝐼𝐼 𝑡𝑜𝑚𝑜𝑟𝑟𝑜𝑤)

𝗔𝗻 𝗶𝗻𝘁𝗲𝗿𝗽𝗿𝗲𝘁𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗱𝗲𝗲𝗽 𝗱𝗶𝘃𝗲 𝗶𝗻𝘁𝗼 𝗗𝗜𝗡𝗢𝘃𝟮, one of vision’s most important foundation models.

And today is Part I. Buckle up, we're exploring some of its most charming features. :)
thomasfel.bsky.social
Really neat, congrats!
Reposted by Thomas Fel
jessicahullman.bsky.social
For XAI it’s often thought explanations help a (boundedly rational) user “unlock” info in features for some decision. But no one says this; they say vaguer things like “supporting trust”. We lay out some implicit assumptions that become clearer when you take a formal view here arxiv.org/abs/2506.22740
Explanations are a means to an end
Modern methods for explainable machine learning are designed to describe how models map inputs to outputs--without deep consideration of how these explanations will be used in practice. This paper arg...
arxiv.org
Reposted by Thomas Fel
davidpicard.bsky.social
🚨Updated: "How far can we go with ImageNet for Text-to-Image generation?"

TL;DR: train a text2image model from scratch on ImageNet only and beat SDXL.

Paper, code, data available! Reproducible science FTW!
🧵👇

📜 arxiv.org/abs/2502.21318
💻 github.com/lucasdegeorg...
💽 huggingface.co/arijitghosh/...
Reposted by Thomas Fel
gretatuckute.bsky.social
Check out @mryskina.bsky.social's talk and poster at COLM on Tuesday—we present a method to identify 'semantically consistent' brain regions (responding to concepts across modalities) and show that more semantically consistent brain regions are better predicted by LLMs.
mryskina.bsky.social
Interested in language models, brains, and concepts? Check out our COLM 2025 🔦 Spotlight paper!

(And if you’re at COLM, come hear about it on Tuesday – sessions Spotlight 2 & Poster 2)!
Paper title: Language models align with brain regions that represent concepts across modalities.
Authors:  Maria Ryskina, Greta Tuckute, Alexander Fung, Ashley Malkin, Evelina Fedorenko. 
Affiliations: Maria is affiliated with the Vector Institute for AI, but the work was done at MIT. All other authors are affiliated with MIT. 
Email address: maria.ryskina@vectorinstitute.ai.
Reposted by Thomas Fel
bayazitdeniz.bsky.social
1/🚨 New preprint

How do #LLMs’ inner features change as they train? Using #crosscoders + a new causal metric, we map when features appear, strengthen, or fade across checkpoints—opening a new lens on training dynamics beyond loss curves & benchmarks.

#interpretability
Reposted by Thomas Fel
lchoshen.bsky.social
Employing mechanistic interpretability to study how models learn, not just where they end up
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291