Meenakshi Khosla
@meenakshikhosla.bsky.social
Assistant Professor at UCSD Cognitive Science and CSE (affiliate) | Past: Postdoc @MIT, PhD @Cornell, B. Tech @IITKanpur | Interested in Biological and Artificial Intelligence
meenakshikhosla.bsky.social
@andre-longon.bsky.social led/executed this project beautifully—he's applying to PhD programs this fall and would be an incredible addition to any lab!
meenakshikhosla.bsky.social
Also thanks to @david-klindt.bsky.social for an incredible collaboration.
meenakshikhosla.bsky.social
The takeaway: superposition isn’t just an interpretability issue—it warps alignment metrics too. Disentangling reveals the true representational overlap between models and between models and brains.
meenakshikhosla.bsky.social
Across toy models, ImageNet DNNs (ResNet, ViT), and even brain data (NSD), alignment scores jump once we replace base neurons with their disentangled SAE latents—showing that superposition can mask shared structure.
meenakshikhosla.bsky.social
We develop a theory showing how superposition arrangements deflate predictive-mapping metrics. Then we test it: disentangling with sparse autoencoders (SAEs) reveals hidden correspondences.
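The effect described in the thread can be illustrated with a toy sketch (not the paper's actual code). Two "models" share the same sparse features but superpose them into fewer neurons through different random mixing matrices; a linear map between their neuron spaces then recovers only part of the shared variance, while a linear map between (idealized) disentangled latents is near perfect. The stand-in assumption, in place of training real SAEs, is that each model's SAE recovers the true features up to permutation and scale:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 2000, 32, 8  # samples, true features, neurons (k > d => superposition)

# Sparse nonnegative feature activations shared by both toy "models"
z = rng.exponential(1.0, (n, k)) * (rng.random((n, k)) < 0.1)

# Each model superposes the same k features into d neurons via its own mixing
W_a = rng.standard_normal((k, d))
W_b = rng.standard_normal((k, d))
x_a, x_b = z @ W_a, z @ W_b

# Hypothetical SAE outputs: true features up to permutation and scale
z_a = z[:, rng.permutation(k)] * rng.uniform(0.5, 2.0, k)
z_b = z[:, rng.permutation(k)] * rng.uniform(0.5, 2.0, k)

def linear_r2(src, tgt):
    """R^2 of the best least-squares linear map src -> tgt."""
    M, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return 1 - (tgt - src @ M).var() / tgt.var()

r2_neurons = linear_r2(x_a, x_b)  # alignment measured on superposed neurons
r2_latents = linear_r2(z_a, z_b)  # alignment measured on disentangled latents

print(f"neuron-space R^2: {r2_neurons:.3f}")  # substantially below 1
print(f"latent-space R^2: {r2_latents:.3f}")  # near 1
```

With random mixing, the neuron-space score hovers around d/k of the shared variance, while the latent-space map is exact because a permutation-plus-scaling is itself linear; this is the sense in which superposition alone, with no difference in underlying features, deflates a predictive-mapping metric.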