Alexandra Proca
@aproca.bsky.social
PhD student at Imperial College London, currently visiting researcher at ENS Paris. theoretical neuroscience, machine learning. aproca.github.io
Finally, although many of the results we present are based on the SVD, we also derive a form based on an eigendecomposition, which allows for rotational dynamics and to which our framework naturally extends. We use this to study learning in terms of polar coordinates in the complex plane.
June 20, 2025 at 5:29 PM
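As a toy illustration of the polar view (the matrix and its size here are hypothetical, not taken from the paper): the complex eigenvalues of a recurrent weight matrix can be written in polar form r·e^{iθ}, separating per-step growth or decay (the radius) from rotation (the angle).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical recurrent weight matrix of a linear RNN (illustrative only).
W = rng.standard_normal((4, 4)) / np.sqrt(4)

# Eigendecomposition: complex eigenvalues admit rotational dynamics,
# unlike the non-negative singular values produced by an SVD.
eigvals, eigvecs = np.linalg.eig(W)

# Polar coordinates in the complex plane: the radius controls growth/decay
# per timestep, the angle controls the rotation frequency.
radii = np.abs(eigvals)
angles = np.angle(eigvals)

for r, theta in zip(radii, angles):
    print(f"r = {r:.3f}, theta = {theta:+.3f} rad")
```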
To study how recurrence might impact feature learning, we derive the NTK for finite-width LRNNs and evaluate its movement during training. We find that recurrence appears to facilitate kernel movement across many settings, suggesting a bias towards rich learning.
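A minimal numerical sketch of measuring kernel movement, assuming a scalar-readout linear RNN and finite-difference gradients (all shapes, data, and hyperparameters here are made up for illustration, not the paper's setup): compute the empirical NTK before and after gradient descent and report its relative change.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, n_in, T, n_samp = 3, 2, 4, 5  # hypothetical sizes

def forward(params, X):
    # Linear RNN h_t = W h_{t-1} + U x_t with a linear readout of h_T.
    W, U, w = params
    h = np.zeros((X.shape[0], n_h))
    for t in range(X.shape[1]):
        h = h @ W.T + X[:, t] @ U.T
    return h @ w

def unflat(theta):
    s1, s2 = n_h * n_h, n_h * n_in
    return (theta[:s1].reshape(n_h, n_h),
            theta[s1:s1 + s2].reshape(n_h, n_in),
            theta[s1 + s2:])

def ntk(theta, X, eps=1e-5):
    # Empirical NTK K = J J^T, with the Jacobian of the network outputs
    # w.r.t. all parameters estimated by central finite differences.
    J = np.zeros((X.shape[0], theta.size))
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += eps
        tm[i] -= eps
        J[:, i] = (forward(unflat(tp), X) - forward(unflat(tm), X)) / (2 * eps)
    return J @ J.T

X = rng.standard_normal((n_samp, T, n_in))
y = rng.standard_normal(n_samp)
theta = 0.3 * rng.standard_normal(n_h * n_h + n_h * n_in + n_h)

K0 = ntk(theta, X)
lr = 1e-2
for _ in range(200):
    # Gradient descent on the squared loss, gradient via finite differences.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        tp, tm = theta.copy(), theta.copy()
        tp[i] += 1e-5
        tm[i] -= 1e-5
        lp = np.mean((forward(unflat(tp), X) - y) ** 2)
        lm = np.mean((forward(unflat(tm), X) - y) ** 2)
        g[i] = (lp - lm) / 2e-5
    theta -= lr * g

K = ntk(theta, X)
movement = np.linalg.norm(K - K0) / np.linalg.norm(K0)
print(f"relative kernel movement: {movement:.3f}")
```

A frozen kernel (movement near zero) would indicate lazy training; substantial movement indicates feature learning.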
Motivated by this, we study task dynamics without zero-loss solutions and find that there exists a tradeoff between recurrent and feedforward computations that is characterized by a phase transition and leads to low-rank connectivity.
By analyzing the energy function, we identify an effective regularization term that incentivizes small weights, especially when task dynamics are not perfectly learnable.
Additionally, these results predict behavior in networks performing integration tasks, where we relax our theoretical assumptions.
Next, we show that task dynamics determine an RNN’s ability to extrapolate to other sequence lengths and the stability of its hidden layer, even when a perfect zero-loss solution exists.
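One way to see the stability side of this claim (with hypothetical recurrent matrices, not ones fitted to a task): the spectral radius of the recurrent weights determines whether hidden states stay bounded as the sequence length grows, which in turn bounds how a trained network behaves on longer sequences.

```python
import numpy as np

def spectral_radius(W):
    # Largest eigenvalue magnitude of the recurrent weights; values above 1
    # imply hidden states that grow with sequence length.
    return np.max(np.abs(np.linalg.eigvals(W)))

def final_state_norm(W, T):
    # Drive a linear RNN h_t = W h_{t-1} + x_t with unit inputs for T steps.
    h = np.zeros(3)
    for _ in range(T):
        h = W @ h + np.ones(3)
    return np.linalg.norm(h)

# Two illustrative recurrent matrices, one on each side of the transition.
W_stable = 0.9 * np.eye(3)
W_unstable = 1.1 * np.eye(3)

for name, W in [("rho<1", W_stable), ("rho>1", W_unstable)]:
    norms = [round(final_state_norm(W, T), 1) for T in (10, 50, 100)]
    print(name, round(spectral_radius(W), 2), norms)
```

With spectral radius below 1 the hidden state saturates as T grows; above 1 it diverges, so behavior learned at one sequence length need not transfer to longer ones.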
We find that learning speed depends on both the scale of the SVs and their temporal ordering, such that SVs occurring later in the trajectory have a greater impact on learning speed.
Using this form, we derive solutions to the learning dynamics of the input-output modes and local approximations of the recurrent modes separately, and identify differences in the learning dynamics of recurrent networks compared to feedforward ones.
We derive a form in which the task dynamics are fully specified by the data correlation singular values (or eigenvalues) across time (t=1:T), and learning is characterized by a set of gradient flow equations and an energy function that are decoupled across different dimensions.
We study an RNN that receives an input at each timestep and produces a final output at the last timestep (and generalize to the autoregressive case later). For each input at time t and the output, we can construct correlation matrices and compute their SVD (or eigendecomposition).
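That construction can be sketched as follows, with hypothetical shapes and random data standing in for an actual task: form one input-output correlation matrix per timestep and take its SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samp, T, n_in, n_out = 100, 5, 4, 3  # hypothetical sizes

# Illustrative dataset: an input x_t at each timestep, one final target y.
X = rng.standard_normal((n_samp, T, n_in))
Y = rng.standard_normal((n_samp, n_out))

# One input-output correlation matrix per timestep: Sigma_t = E[y x_t^T],
# estimated by the sample average, then decomposed via SVD.
for t in range(T):
    Sigma_t = Y.T @ X[:, t] / n_samp          # shape (n_out, n_in)
    U, s, Vt = np.linalg.svd(Sigma_t, full_matrices=False)
    print(f"t={t}: singular values {np.round(s, 3)}")
```

The resulting singular values across t=1:T are the quantities that, in the framework described above, specify the task dynamics.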