Ann Huang
@annhuang42.bsky.social
Comp Neuro, ML, Dynamical Systems 🧠🤖. PhD student at Harvard & Kempner Institute. Prev at McGill, Mila, EPFL.
oh excellent pointer! That indeed matches our intuition
November 24, 2025 at 8:04 PM
And variance in behavior is not necessarily coupled with variance in the features; see this paper, for example, for a dissociation between the two: openreview.net/forum?id=Yuc...
Not all solutions are created equal: An analytical dissociation of...
November 24, 2025 at 7:56 PM
What we found is that more consistent features during training do not guarantee more similar OOD behavior. In fact, stronger feature learning can lead to more variable OOD behavior, which we hypothesize is due to overfitting.
November 24, 2025 at 7:47 PM
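To make "more variable OOD behavior" concrete: behavioral degeneracy here amounts to the across-seed dispersion of out-of-distribution performance. A minimal sketch, where the `behavioral_degeneracy` helper and the toy linear "models" are hypothetical stand-ins for illustration:

```python
import numpy as np

def behavioral_degeneracy(models, x_ood, y_ood, loss_fn):
    """Across-seed variability of OOD performance: evaluate each
    seed's trained model on the same out-of-distribution batch and
    take the dispersion of the resulting losses."""
    losses = np.array([loss_fn(m(x_ood), y_ood) for m in models])
    return losses.std()

# Toy demo with stand-in linear "models" (hypothetical, illustration only).
rng = np.random.default_rng(0)
x, y = rng.normal(size=(100, 8)), rng.normal(size=100)
models = [(lambda w: (lambda x: x @ w))(rng.normal(size=8)) for _ in range(10)]
mse = lambda pred, target: float(np.mean((pred - target) ** 2))
print(behavioral_degeneracy(models, x, y, mse))
```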
Just looked at your paper; we're basically motivated by the same question, applied to different architectures! Will try to visit your poster too
November 24, 2025 at 7:37 PM
yay thanks Dan!!
November 24, 2025 at 7:32 PM
Thanks to my amazing collaborators and my PI! @satpreetsingh.bsky.social @flavioh.bsky.social @kanakarajanphd.bsky.social

🔹Paper: arxiv.org/pdf/2410.03972
🔹Poster: Fri Dec 5, Poster #2001 at Exhibition Hall C, D, E

Happy to chat at NeurIPS or by email at [email protected]!
November 24, 2025 at 4:43 PM
Our results:
- support the contravariance principle (Cao & @dyamins.bsky.social)
- reveal when weight- & dynamic-level variability move together (or opposite)
- give "knobs" for controlling degeneracy, whether you're studying shared mechanisms or individual variability in task-trained RNNs.
November 24, 2025 at 4:43 PM
4️⃣ Regularization (L1, low-rank)
Both types of structural regularization reduce degeneracy across all levels. Regularization nudges networks toward more consistent, shared solutions.
November 24, 2025 at 4:43 PM
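For readers who want to try these knobs, here is a minimal sketch of adding structural penalties to a training loss. The coefficients and the nuclear-norm surrogate for low rank are illustrative assumptions; the paper's exact regularizers may differ (e.g., a factored low-rank parametrization of the recurrent matrix).

```python
import torch

def regularized_loss(task_loss, W_rec, l1_coef=1e-4, rank_coef=1e-4):
    """Structural penalties on the recurrent weights: L1 pushes
    toward sparsity; the nuclear norm (sum of singular values) is a
    standard convex surrogate for low rank. Coefficients are
    illustrative, not the paper's values."""
    l1 = W_rec.abs().sum()
    nuclear = torch.linalg.svdvals(W_rec).sum()
    return task_loss + l1_coef * l1 + rank_coef * nuclear

W = torch.randn(64, 64, requires_grad=True)
loss = regularized_loss(torch.tensor(0.0), W)
loss.backward()  # gradients flow through both penalties
```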
3️⃣ Network size
When we fix feature learning (using µP), larger RNNs converge to more consistent solutions at all levels — weights, dynamics, and behavior.
A clean convergence-with-scale effect, demonstrated in RNNs across all three levels.
November 24, 2025 at 4:43 PM
We then causally tested feature learning’s effect on degeneracy using µP scaling. Stronger feature learning reduces dynamical degeneracy & increases weight degeneracy (like harder tasks).
It also increases behavioral degeneracy under OOD inputs (likely due to overfitting).
November 24, 2025 at 4:43 PM
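As an illustration of the kind of knob involved: one standard way to move a network along the lazy-to-rich axis is an output-scale multiplier (Chizat & Bach's alpha-scaling). This is a hedged sketch of that general idea, not the paper's exact µP parametrization.

```python
import torch.nn as nn

class ScaledRNN(nn.Module):
    """Vanilla RNN whose readout is multiplied by a scale `alpha`.

    In alpha-scaling analyses, large alpha pushes training toward the
    lazy/kernel regime (weights barely move) and small alpha toward
    richer feature learning. Full recipes also rescale the loss and
    learning rate with alpha; omitted here for brevity.
    """

    def __init__(self, n_in, n_hid, n_out, alpha=1.0):
        super().__init__()
        self.rnn = nn.RNN(n_in, n_hid, batch_first=True)
        self.readout = nn.Linear(n_hid, n_out, bias=False)
        self.alpha = alpha

    def forward(self, x):         # x: (batch, time, n_in)
        h, _ = self.rnn(x)        # hidden states: (batch, time, n_hid)
        return self.alpha * self.readout(h)
```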
2️⃣ Feature learning
Complex tasks push RNNs into the feature-learning regime, where the network has to adapt its internal weights and features to solve the task. Weights travel much farther from initialization, leading to more dispersed solutions in weight space (higher degeneracy).
November 24, 2025 at 4:43 PM
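A quick diagnostic for this: the relative distance weights travel from initialization, near zero in the lazy regime and large under strong feature learning. A minimal sketch, assuming a PyTorch model snapshotted before training:

```python
import copy

def weight_travel(model_init, model_trained):
    """Relative distance traveled from initialization:
    ||theta_T - theta_0|| / ||theta_0||. Near zero in the lazy
    regime; large under strong feature learning."""
    num, denom = 0.0, 0.0
    for p0, p1 in zip(model_init.parameters(), model_trained.parameters()):
        num += (p1 - p0).pow(2).sum().item()
        denom += p0.pow(2).sum().item()
    return (num / denom) ** 0.5

# Usage: snapshot = copy.deepcopy(model) before training, then call
# weight_travel(snapshot, model) after training.
```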
1️⃣ Task complexity
As tasks get harder, we observe less degeneracy in dynamics/behavior, but more degeneracy in the weights.

When trained on harder tasks, RNNs converge to similar neural dynamics and OOD behavior, but their weight configurations diverge. Why?
November 24, 2025 at 4:43 PM
Using 3,400 RNNs across 4 neuroscience-relevant tasks (flip-flop memory, working memory, pattern generation, path integration), we systematically varied:
- task complexity
- learning regime
- network size
- regularization

Our findings:
November 24, 2025 at 4:43 PM
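To make the setup concrete, here is a minimal sketch of one such task, the N-bit flip-flop: each channel receives sparse ±1 pulses, and the target holds that channel's most recent pulse. The pulse probability and sequence length below are illustrative assumptions; the paper's task parameters may differ.

```python
import numpy as np

def flip_flop_batch(n_bits=3, T=100, batch=32, p_pulse=0.05, seed=0):
    """N-bit flip-flop task: sparse +/-1 pulses per channel; the
    target remembers each channel's most recent pulse. More bits
    means a harder task, one natural complexity knob."""
    rng = np.random.default_rng(seed)
    pulses = rng.choice([0.0, 1.0, -1.0], size=(batch, T, n_bits),
                        p=[1 - p_pulse, p_pulse / 2, p_pulse / 2])
    targets = np.zeros_like(pulses)
    state = np.ones((batch, n_bits))            # arbitrary initial memory
    for t in range(T):
        state = np.where(pulses[:, t] != 0, pulses[:, t], state)
        targets[:, t] = state
    return pulses, targets

x, y = flip_flop_batch()
print(x.shape, y.shape)   # (32, 100, 3) inputs and targets
```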
Our unified framework measures & controls degeneracy at 3 levels:
🎯 Behavior: variability in OOD performance
🧠 Dynamics: distance btwn neural trajectories, quantified by Dynamical Similarity Analysis
⚙️ Weights: permutation-invariant Frobenius distance btwn recurrent weights
November 24, 2025 at 4:43 PM
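For the weight-level metric, a minimal sketch of what a permutation-invariant Frobenius distance can look like: minimizing ||W1 - P W2 Pᵀ||_F over unit permutations P is a quadratic assignment problem, approximated here with SciPy's FAQ solver. The paper's exact metric may differ.

```python
import numpy as np
from scipy.optimize import quadratic_assignment

def perm_invariant_frobenius(W1, W2):
    """Approximate min over permutations P of ||W1 - P W2 P^T||_F.

    Minimizing this distance is equivalent to maximizing
    trace(W1^T P W2 P^T), a quadratic assignment problem; SciPy's
    approximate FAQ solver searches over unit relabelings."""
    res = quadratic_assignment(W1, W2, options={"maximize": True})
    ci = res.col_ind
    W2_aligned = W2[np.ix_(ci, ci)]   # relabel net-2 units to match net 1
    return np.linalg.norm(W1 - W2_aligned)

rng = np.random.default_rng(0)
W = rng.normal(size=(32, 32))
perm = rng.permutation(32)
W_shuf = W[np.ix_(perm, perm)]              # same network, units relabeled
print(np.linalg.norm(W - W_shuf))           # naive distance: large
print(perm_invariant_frobenius(W, W_shuf))  # ~0, up to solver approximation
```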
RNNs trained from different seeds on the same task can show strikingly different internal solutions, even when they perform equally well. We call this solution degeneracy.
November 24, 2025 at 4:43 PM
📍Excited to share that our paper was selected as a Spotlight at #NeurIPS2025!

arxiv.org/pdf/2410.03972

It started from a question I kept running into:

When do RNNs trained on the same task converge/diverge in their solutions?
🧵⬇️
November 24, 2025 at 4:43 PM
Reposted by Ann Huang
Our next paper on comparing dynamical systems (with special interest in artificial and biological neural networks) is out!! Joint work with @annhuang42.bsky.social, as well as @satpreetsingh.bsky.social, @leokoz8.bsky.social, Ila Fiete, and @kanakarajanphd.bsky.social: arxiv.org/pdf/2510.25943
November 10, 2025 at 4:16 PM
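For intuition about what comparing dynamical systems can mean operationally, in the spirit of Dynamical Similarity Analysis: fit a linear evolution operator to each system's trajectories (DMD-style least squares), then compare the operators up to an orthogonal change of basis. This is a simplified sketch of that general idea, not the algorithm in the paper.

```python
import numpy as np
import torch

def fit_linear_operator(X):
    """DMD-style least squares: fit x_{t+1} ~= A x_t from a
    trajectory X of shape (time, units)."""
    M, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)
    return M.T

def conjugacy_distance(A1, A2, steps=2000, lr=0.02):
    """min over orthogonal Q of ||Q A1 Q^T - A2||_F, by gradient
    descent on a skew-symmetric parametrization Q = exp(S - S^T)."""
    A1 = torch.tensor(A1, dtype=torch.float64)
    A2 = torch.tensor(A2, dtype=torch.float64)
    S = torch.zeros_like(A1, requires_grad=True)
    opt = torch.optim.Adam([S], lr=lr)
    for _ in range(steps):
        Q = torch.matrix_exp(S - S.T)   # exp of skew-symmetric is orthogonal
        loss = torch.linalg.norm(Q @ A1 @ Q.T - A2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        Q = torch.matrix_exp(S - S.T)
        return torch.linalg.norm(Q @ A1 @ Q.T - A2).item()

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
R, _ = np.linalg.qr(rng.normal(size=(5, 5)))
# Same dynamics in a rotated basis: distance should be near 0
# (nonconvex optimization, so only approximately).
print(conjugacy_distance(A, R @ A @ R.T))
```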