Arna Ghosh
arnaghosh.bsky.social
PhD student at Mila & McGill University, Vanier scholar • 🧠+🤖 grad student • Ex-RealityLabs, Meta AI • Believer in Bio-inspired AI • Comedy+Cricket enthusiast
Why do these geometric phases arise?🤔

We show, both through theory and with simulations in a toy model, that these non-monotonic spectral changes occur due to gradient descent dynamics with cross-entropy loss under 2 conditions:

1. skewed token frequencies
2. representation bottlenecks

🧵6/9
October 31, 2025 at 4:19 PM
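The setting named in the 6/9 post (gradient descent, cross-entropy loss, skewed token frequencies, a representation bottleneck) can be illustrated with a rough sketch. This is not the authors' toy model: the factorized softmax model, the Zipf-like targets, the participation-ratio metric, and every name and hyperparameter below (`zipf_targets`, `train`, `dim`, `lr`) are assumptions made for illustration only, and no particular spectral behavior is claimed for it.

```python
import numpy as np

def zipf_targets(n_ctx, vocab, rng, noise=0.5):
    """Per-context next-token distributions with Zipf-like (skewed) frequencies."""
    base = -np.log(np.arange(1, vocab + 1))            # log of 1/rank weights
    logits = base + noise * rng.standard_normal((n_ctx, vocab))
    return softmax(logits)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)               # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def participation_ratio(singular_values):
    """Effective dimensionality of a spectrum: (sum lam)^2 / sum lam^2."""
    lam = singular_values ** 2
    return lam.sum() ** 2 / (lam ** 2).sum()

def train(n_ctx=32, vocab=64, dim=8, lr=0.5, steps=300, seed=0):
    """Full-batch gradient descent on cross-entropy with a dim-sized bottleneck."""
    rng = np.random.default_rng(seed)
    T = zipf_targets(n_ctx, vocab, rng)                # target distributions
    E = 0.1 * rng.standard_normal((n_ctx, dim))        # context representations
    U = 0.1 * rng.standard_normal((vocab, dim))        # unembedding (dim << vocab)
    losses, ranks = [], []
    for _ in range(steps):
        P = softmax(E @ U.T)
        losses.append(float(-(T * np.log(P + 1e-12)).sum(axis=1).mean()))
        s = np.linalg.svd(E, compute_uv=False)         # spectrum of representations
        ranks.append(float(participation_ratio(s)))
        G = (P - T) / n_ctx                            # dLoss/dLogits
        E, U = E - lr * G @ U, U - lr * G.T @ E        # simultaneous update
    return np.array(losses), np.array(ranks)

if __name__ == "__main__":
    losses, ranks = train()
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print(f"participation ratio: {ranks[0]:.2f} -> {ranks[-1]:.2f}")
```

Plotting `ranks` against the training step, and varying `dim` or the skew of the targets, is one way to probe how the spectrum of the representations evolves in a setup like this.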
🧵 4/8 First key finding: Gradient descent's implicit bias reveals a sweet spot in feature learning. Too little orthogonalization → feature collapse. Too much → unstable learning dynamics. We characterized this trade-off for harnessing the value of small projectors 🎯
December 13, 2024 at 3:44 AM