Shahab Bakhtiari
@shahabbakht.bsky.social
6.2K followers 1.1K following 1.2K posts
|| assistant prof at University of Montreal || leading the systems neuroscience and AI lab (SNAIL: https://www.snailab.ca/) 🐌 || associate academic member of Mila (Quebec AI Institute) || #NeuroAI || vision and learning in brains and machines
Pinned
shahabbakht.bsky.social
So excited to see this preprint released from the lab into the wild.

Charlotte has developed a theory of how the learning curriculum influences generalization of learning.
Our theory makes straightforward neural predictions that can be tested in future experiments. (1/4)

🧠🤖 🧠📈 #MLSky
charlottevolk.bsky.social
🚨 New preprint alert!

🧠🤖
We propose a theory of how learning curriculum affects generalization through neural population dimensionality. Learning curriculum is a determining factor of neural dimensionality - where you start from determines where you end up.
🧠📈

A 🧵:

tinyurl.com/yr8tawj3
The curriculum effect in visual learning: the role of readout dimensionality
Generalization of visual perceptual learning (VPL) to unseen conditions varies across tasks. Previous work suggests that training curriculum may be integral to generalization, yet a theoretical explan...
tinyurl.com
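The preprint above frames the curriculum effect in terms of neural population (readout) dimensionality. For readers unfamiliar with the term, here is a minimal numpy sketch of one standard way dimensionality is quantified: the participation ratio of the population covariance spectrum. This is an illustrative example only, not necessarily the measure used in the paper.

```python
import numpy as np

def participation_ratio(activity):
    """Participation ratio of a (trials x neurons) activity matrix.

    PR = (sum_i lambda_i)^2 / sum_i lambda_i^2, where lambda_i are the
    eigenvalues of the neuron-by-neuron covariance matrix. PR ranges from
    1 (activity confined to a single dimension) up to n_neurons (isotropic).
    """
    centered = activity - activity.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (activity.shape[0] - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0, None)
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

# Toy example: 200 trials of 50 neurons driven by a 3-dimensional latent.
rng = np.random.default_rng(0)
latents = rng.normal(size=(200, 3))
activity = latents @ rng.normal(size=(3, 50)) + 0.1 * rng.normal(size=(200, 50))
print(participation_ratio(activity))  # close to 3, far below 50
```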
Reposted by Shahab Bakhtiari
sungkim.bsky.social
Diffusion models are not truly serial models

Diffusion models:
- look serial methodologically (step by step),
- but behave less like a truly serial model (autoregression).

They find that the diffusion model solves each problem at the same convergence rate; it will never be a serial model.
Reposted by Shahab Bakhtiari
sungkim.bsky.social
Meta released a paper on Hybrid RL

It offers a promising way to go beyond purely verifiable rewards - combining the reliability of verifier signals with the richness of learned feedback. The results are: +11.7 pts vs RM-only and +9.2 pts vs verifier-only on hard-to-verify reasoning tasks.
shahabbakht.bsky.social
Haha … I can reassure you that wasn’t my intended message at all :)
Reposted by Shahab Bakhtiari
dlevenstein.bsky.social
So I get that a Neuroscientist Couldn’t Understand a Microprocessor, and TBH I’m ok with that. But could a neuroscientist understand a deep RNN? Because that seems like a more pressing issue.

*assuming you think the brain operates through the parallel activity of many connected input/output units
shahabbakht.bsky.social
Most models within the neuroAI framework have been unimodal so far. I think moving towards multimodal models that are scalable and can satisfy some level of behavioural and neural alignment will force us to deal with the "readout" problem in a more serious way.
shahabbakht.bsky.social
Yeah there wasn’t enough space to expand on that.

I see neuroAI as a framework that gives us scalable, behaviorally relevant computational hypotheses. When it comes to readout, we haven’t even mapped out the space of possible mechanisms and algorithms yet.
shahabbakht.bsky.social
What do we talk about when we talk about "readout"?

I argued that our overly specialized, modular approach to studying the brain has given us a simplistic view of readout.

🧠📈
shahabbakht.bsky.social
Good article on AI boom by Noah Smith: open.substack.com/pub/noahpini...

"the great railroad bust did not happen because America built too many railroads. America didn’t build too many railroads! What happened was that America financed its railroads faster than they could capture value"
America's future could hinge on whether AI slightly disappoints
If the economy's single pillar goes down, Trump's presidency will be seen as a disaster.
open.substack.com
shahabbakht.bsky.social
Wait … how do you know that? :)
Reposted by Shahab Bakhtiari
eugenevinitsky.bsky.social
One strategic thing about making this an appealing scientific community is to overshare and boost work from graduate students here
shahabbakht.bsky.social
It's definitely not 50/50 for me. More like 10/90 ;)
shahabbakht.bsky.social
This feels a lot like systems neuro, honestly. You could hear similar advice there, especially from the more experimentally-oriented minds.
shahabbakht.bsky.social
I guess the whole predictive circuit finding approach can be seen as a convergent evolution, which probably doesn’t scale or generalize outside of the experimental setting?
shahabbakht.bsky.social
Having full observation and control over the studied system is definitely the main advantage of MI. But the unintuitive mess of high-d computation is their shared problem, which seems to need more theories than experiments.
shahabbakht.bsky.social
A systems neuroscientist turned mech interp researcher should write a paper on what the field should absolutely avoid, then observe how thoroughly they’ll be ignored :)

Though what I find intriguing in this domain (watching from afar): its much slower rate of progress compared to the rest of AI.
shahabbakht.bsky.social
Regardless of what explainability/mech interp in AI is actually after, and whether or not they know what they’re searching for, we can confidently say they’re pursuing what systems neuroscience has pursued for decades, with very similar puzzles and confusions.
bayesianboy.bsky.social
What problem is explainability/interpretability research trying to solve in ML, and do you have a favorite paper articulating what that problem is?
Reposted by Shahab Bakhtiari
bayesianboy.bsky.social
What problem is explainability/interpretability research trying to solve in ML, and do you have a favorite paper articulating what that problem is?
shahabbakht.bsky.social
I don't see a direct causal path, but pessimistically speaking, when bubbles burst they often leave subconscious biases against the bubbled topic, e.g., in evaluation committees. In other words, the current abundance of AI funding (relative to other fields) might not last.
shahabbakht.bsky.social
What if the bubble collapse also takes down our funding so we can't even afford H100s at half price?! :)
Reposted by Shahab Bakhtiari
drlaschowski.bsky.social
Imagine a brain decoding algorithm that could generalize across different subjects and tasks. Today, we’re one step closer to achieving that vision.

Introducing the flagship paper of our brain decoding program: www.biorxiv.org/content/10.1...
#neuroAI #compneuro @utoronto.ca @uhn.ca
Reposted by Shahab Bakhtiari
sushrutthorat.bsky.social
and the low-D part has been on the horizon for a while now - proceedings.neurips.cc/paper/2019/h... - given complex numbers you can go loooowwww haha (O(1)). Also this is linked to top-down attention: arxiv.org/abs/1907.12309 , arxiv.org/abs/2502.15634 - which is a low-D modulation (O(N) vs O(N^2)).
Superposition of many models into one
proceedings.neurips.cc
shahabbakht.bsky.social
Yeah, it all makes sense in hindsight. I think the low-d structure of weights was actually the rationale behind LoRA when it was proposed.
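To make the low-rank connection concrete, here is a minimal numpy sketch of the idea LoRA builds on: the pretrained weight matrix W is frozen and adapted through a rank-r update B @ A with r much smaller than the layer width, so the fine-tuning update lives in a low-dimensional subspace of weight space. The shapes and rank below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 1024, 1024, 8                      # rank r << d_out, d_in

W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weights
A = 0.01 * rng.normal(size=(r, d_in))               # trainable low-rank factor
B = np.zeros((d_out, r))                            # zero-init, so the adapted
                                                    # layer starts out equal to W

x = rng.normal(size=d_in)
y = W @ x + B @ (A @ x)                             # frozen path + low-rank update

# The update B @ A has d_out*r + r*d_in trainable parameters instead of
# d_out*d_in (here 16,384 vs ~1M), exploiting low-d structure in the weight
# updates needed for adaptation.
print(B.size + A.size, W.size)
```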