Jascha Achterberg
@achterbrain.bsky.social
Neuroscience & AI at University of Oxford and University of Cambridge | Principles of efficient computations + learning in brains, AI, and silicon 🧠 NeuroAI | Gates Cambridge Scholar

www.jachterberg.com
This new model opens a whole new world of analysing multi-region interactions across trials and tasks! More analysis and findings can be found in our paper linked below. Work led by Jack Cook, with great help from @danakarca.bsky.social and @somnirons.bsky.social!

arxiv.org/abs/2506.02813
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts
Examples of such pathways can be found in the interactions between cortical and subcortical networks during learning, or in sub-networks specializing for task characteristics such as difficulty or mod...
November 21, 2025 at 12:01 PM
We also find that while complex regions are needed to learn complex tasks, processing of these tasks eventually shifts toward simpler regions, similar to how you may struggle when first learning a new skill but slowly get better with practice.
November 21, 2025 at 12:01 PM
Furthermore, we find that these pathways mirror the behavior we expect of pathways in the brain! Difficult tasks need to be learned in more complex regions, similar to how you need to think “harder” when learning to solve a difficult math problem.
November 21, 2025 at 12:01 PM
With these three features in place, we find that our third criterion of distinct pathways is also met. While baseline models exhibit largely random expert usage patterns, our models exhibit highly structured pathways between regions that reliably emerge during learning.
November 21, 2025 at 12:01 PM
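For readers who want to make “distinct pathways” concrete: below is a minimal sketch of one way such structure could be quantified, by clustering per-task routing patterns and counting well-separated clusters. The clustering method, distance metric, and threshold are all illustrative assumptions, not the paper’s analysis.

```python
# Hypothetical sketch of a distinctness check: cluster per-task
# routing patterns and count how many well-separated pathways emerge.
# Clustering choice, metric, and threshold are assumptions.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def count_distinct_pathways(routing_map, distance_threshold=0.5):
    # routing_map: (n_tasks, n_experts) mean routing weights per task.
    Z = linkage(routing_map, method="average", metric="cosine")
    labels = fcluster(Z, t=distance_threshold, criterion="distance")
    return len(np.unique(labels))
```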
Our third contribution is expert dropout. Without this feature, we find models suffer large performance deficits when experts outside the active pathway are disabled. We instead want models to depend primarily on the experts they actually use most.
November 21, 2025 at 12:01 PM
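A minimal sketch of what expert dropout could look like in practice: during training, a random subset of experts is disabled by masking their router logits. The dropout rate and masking scheme are assumptions for illustration, not the paper’s exact implementation.

```python
# Hypothetical sketch of expert dropout: randomly disable experts
# during training so pathways learn to rely on the experts they
# actually use. Rate and masking scheme are assumptions.
import torch

def drop_experts(router_logits, p_drop=0.2, training=True):
    # router_logits: (batch, n_experts) pre-softmax router scores.
    if not training:
        return router_logits
    n_experts = router_logits.shape[-1]
    keep = torch.rand(n_experts, device=router_logits.device) > p_drop
    if not keep.any():  # always keep at least one expert alive
        keep[torch.randint(n_experts, (1,))] = True
    # Dropped experts get -inf so softmax assigns them zero weight.
    return router_logits.masked_fill(~keep, float("-inf"))
```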
When put together, these two contributions resulted in remarkable pathway consistency in our model, which we measured by correlating the routing patterns across 10 different models trained on the same tasks.
November 21, 2025 at 12:01 PM
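To illustrate the consistency measure described above (correlating routing patterns across independently trained models), here is a minimal sketch; the shapes and the use of Pearson correlation over flattened task-by-expert maps are assumptions.

```python
# Hypothetical sketch of the consistency measure: mean pairwise
# correlation of task-by-expert routing patterns across models.
import numpy as np

def pathway_consistency(routing_maps):
    """
    routing_maps: list of (n_tasks, n_experts) arrays, one per model,
    each row giving mean routing weights for a task.
    """
    flat = [m.flatten() for m in routing_maps]
    corrs = [np.corrcoef(a, b)[0, 1]
             for i, a in enumerate(flat)
             for b in flat[i + 1:]]
    return float(np.mean(corrs))

# e.g. consistency across 10 models trained on the same tasks:
# score = pathway_consistency([get_routing_map(m) for m in models])
```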
We then identify three inductive biases that yield pathways that meet each of these criteria.

The first of these is a routing loss that penalizes the use of more complex experts during training, and the second scales this loss by the model’s performance on the task being solved.
November 21, 2025 at 12:01 PM
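Here is a minimal sketch of how these first two inductive biases could be combined: a per-expert cost (e.g. proportional to expert width) weighted by the routing distribution, with the penalty scaled by current task performance so complex experts stay cheap while the task is still being learned. Cost assignment and scaling rule are illustrative assumptions.

```python
# Hypothetical sketch of a complexity-penalizing routing loss,
# scaled by task performance. Costs and scaling are assumptions.
import torch

def routing_loss(routing_weights, expert_costs, task_accuracy):
    """
    routing_weights: (batch, n_experts) softmax router output
    expert_costs:    (n_experts,) e.g. proportional to expert width
    task_accuracy:   scalar in [0, 1] on the task being solved
    """
    # Expected complexity cost of the chosen routing.
    expected_cost = (routing_weights * expert_costs).sum(dim=-1).mean()
    # While accuracy is low the penalty is weak, so complex experts
    # can be used to learn the task before routing is pushed simpler.
    return task_accuracy * expected_cost

# e.g. total = task_loss + lam * routing_loss(weights, costs, acc)
```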
We then set three criteria to determine whether pathways had formed:

(1) Consistency: Models trained on the same tasks should have similar pathways

(2) Self-sufficiency: Pathways should be primarily reliant on their own experts

(3) Distinctness: Many distinct pathways should be present
November 21, 2025 at 12:01 PM
We first needed to create a model in which we could study pathway formation. We chose a Heterogeneous Mixture-of-Experts architecture, in which information can be dynamically routed to computational experts, or regions, of varying sizes.

We train the model on 82 tasks of varying complexity (ModCog)!
November 21, 2025 at 12:01 PM
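To make the setup concrete, here is a minimal sketch of a heterogeneous Mixture-of-Experts layer, assuming feed-forward experts of varying width and top-k routing. All names, sizes, and the routing scheme are illustrative assumptions, not the paper’s implementation.

```python
# Hypothetical sketch of a heterogeneous MoE layer: experts
# ("regions") of varying width, with a learned router that
# dispatches each input to its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeterogeneousMoE(nn.Module):
    def __init__(self, d_model=128, expert_widths=(32, 64, 128, 256), top_k=2):
        super().__init__()
        # Wider experts are the more "complex" regions.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, w), nn.ReLU(), nn.Linear(w, d_model))
            for w in expert_widths
        )
        self.router = nn.Linear(d_model, len(expert_widths))
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, d_model). Router assigns each input to k experts.
        weights = F.softmax(self.router(x), dim=-1)    # (batch, n_experts)
        topw, topi = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, k] == e
                if mask.any():
                    out[mask] += topw[mask, k].unsqueeze(-1) * expert(x[mask])
        return out, weights  # weights = the routing pattern analysed below
```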
All good Dan!
November 14, 2025 at 1:41 PM
I find your point about a probabilistic definition interesting -- I've never seen such a definition of it, but it could neatly link to my 'usefulness' framing, since any sort of expected-value computation would need to take 'likelihood given context' into account.
August 20, 2025 at 5:08 PM
Now, usefulness in program generation might sometimes align with policy compression, but that depends a lot on the time horizon one assumes for the definition of 'usefulness'.
August 20, 2025 at 5:08 PM
It also doesn't 100% align with my reading of it, but I found it an interesting angle. I naturally find myself influenced by Allen Newell's take on it (the one John Duncan tends to reference), which is aimed at usefulness in program generation.
August 20, 2025 at 5:08 PM