arxiv.org/pdf/2410.03972
It started from a question I kept running into:
When do RNNs trained on the same task converge/diverge in their solutions?
🧵⬇️
🔹Paper: arxiv.org/pdf/2410.03972
🔹Poster: Fri Dec 5, Poster #2001 at Exhibition Hall C, D, E
Happy to chat at NeurIPS or by email at [email protected]!
Overall, these results:
- support the contravariance principle (Cao & @dyamins.bsky.social)
- reveal when weight- and dynamics-level variability move together (or in opposite directions)
- give "knobs" for controlling degeneracy, whether you're studying shared mechanisms or individual variability in task-trained RNNs.
Both types of structural regularization reduce degeneracy across all levels. Regularization nudges networks toward more consistent, shared solutions.
When we fix feature learning (using µP), larger RNNs converge to more consistent solutions at all levels — weights, dynamics, and behavior.
A clean convergence-with-scale effect, demonstrated on RNNs across levels.
Feature learning also increases behavioral degeneracy under OOD inputs (likely due to overfitting).
Complex tasks push RNNs into the feature-learning regime, where the network must adapt its internal weights and features to solve the task. Weights travel much farther from initialization, producing more dispersed solutions in weight space (higher weight degeneracy).
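The "distance from initialization" signature of feature learning is easy to check directly. A minimal sketch (the function name and the relative-norm convention are mine, not from the paper):

```python
import numpy as np

def relative_weight_travel(W_init, W_trained):
    """How far the recurrent weights moved from initialization,
    relative to their initial scale. Near 0 in the lazy regime;
    large when the network has genuinely relearned its features."""
    return np.linalg.norm(W_trained - W_init) / np.linalg.norm(W_init)

# e.g. compare the same architecture trained on an easy vs. a hard task:
# relative_weight_travel(W0, W_easy) vs. relative_weight_travel(W0, W_hard)
```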
As tasks get harder, we observe less degeneracy in dynamics/behavior, but more degeneracy in the weights.
When trained on harder tasks, RNNs converge to similar neural dynamics and OOD behavior, but their weight configurations diverge. Why?
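One crude way to compare dynamics across networks in a basis-independent way (a toy stand-in I'm sketching here, not the paper's Dynamical Similarity Analysis): fit a linear model to each network's hidden-state trajectory and compare the eigenvalue spectra of the fitted dynamics, which don't change under a relabeling or rotation of the hidden units.

```python
import numpy as np

def fit_linear_dynamics(X):
    """Least-squares A with x_{t+1} ≈ A @ x_t, from a (T, n) trajectory X."""
    M, *_ = np.linalg.lstsq(X[:-1], X[1:], rcond=None)  # X[:-1] @ M ≈ X[1:]
    return M.T

def spectrum_distance(Xa, Xb):
    """Compare fitted dynamics via their eigenvalues (sorted — a crude matching)."""
    ea = np.sort_complex(np.linalg.eigvals(fit_linear_dynamics(Xa)))
    eb = np.sort_complex(np.linalg.eigvals(fit_linear_dynamics(Xb)))
    return float(np.linalg.norm(ea - eb))
```

Two networks whose hidden states are rotations of each other come out at distance ~0 here, even though their raw activations and weights look very different.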
We systematically varied four knobs:
- task complexity
- learning regime
- network size
- regularization
and measured degeneracy at three levels:
🎯 Behavior: variability in OOD performance
🧠 Dynamics: distance between neural trajectories, quantified by Dynamical Similarity Analysis
⚙️ Weights: permutation-invariant Frobenius distance between recurrent weight matrices
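For small networks, the permutation-invariant weight distance can be computed exactly by brute force over hidden-unit relabelings. A sketch (real experiments would use an assignment solver rather than enumerating all n! permutations):

```python
import itertools
import numpy as np

def perm_invariant_dist(W1, W2):
    """min_P ||W1 - P @ W2 @ P.T||_F over permutation matrices P,
    i.e. the Frobenius distance after optimally relabeling W2's units.
    Brute force: only feasible for small hidden sizes."""
    n = W1.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        idx = np.array(perm)
        # P @ W2 @ P.T is just W2 with rows and columns reindexed
        best = min(best, np.linalg.norm(W1 - W2[np.ix_(idx, idx)]))
    return best
```

Two networks that implement the same circuit with shuffled unit labels come out at distance ~0, where the plain Frobenius distance would call them far apart.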