Valérie Castin
@vcastin.bsky.social
PhD student in machine learning at École Normale Supérieure, Paris

My webpage: https://vcastin.github.io/
Reposted by Valérie Castin
I asked "on the other platform" what the most important improvements to the original 2017 transformer were.

That was quite popular, and here is a synthesis of the responses:
April 28, 2025 at 6:47 AM
Reposted by Valérie Castin
Excited to share Soup-of-Experts, a new neural network architecture that, for any given task, can instantiate in a flash a small model that performs very well on it.

Made with ❤️ at Apple

Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!

arxiv.org/abs/2502.01804
February 5, 2025 at 9:32 AM
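A minimal sketch of the idea as the announcement describes it: keep a bank of expert parameters and instantiate a task-specific model as a weighted combination of them. The linear-combination form, names, and shapes below are illustrative assumptions, not the paper's implementation.

import numpy as np

# Hypothetical sketch of a "soup" of experts: a bank of K expert
# parameter vectors, mixed with task-dependent weights to
# instantiate one small model per task (assumed combination form).
rng = np.random.default_rng(0)
K, P = 8, 10_000                     # experts, parameters per expert
experts = rng.normal(size=(K, P))    # pretrained expert parameter bank

def instantiate(task_logits):
    """Instantiate a small model's parameters for one task."""
    w = np.exp(task_logits - np.max(task_logits))
    w /= w.sum()                     # convex mixing weights
    return w @ experts               # one matmul: instant specialization

params = instantiate(rng.normal(size=K))  # task-specific model, "in a flash"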
Reposted by Valérie Castin
A cute result from Valérie’s work is that Gaussian distributions remain closed under evolution by attention layers, allowing one to study an ODE in (mean, covariance) space. In particular, this enables the analysis of the “clustering of tokens” toward low-rank covariances.
February 1, 2025 at 9:54 AM
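A toy numerical check of the clustering described above, assuming plain softmax self-attention with identity Q, K, V and a simple residual-style ODE; this is an illustrative setting, not the paper's exact dynamics.

import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 16
x = rng.normal(size=(n, d))          # tokens drawn from a Gaussian

def attention(x):
    # softmax self-attention with identity Q, K, V (toy setting)
    logits = x @ x.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

dt = 0.1
for _ in range(300):
    x += dt * (attention(x) - x)     # Euler step of the token ODE

eigs = np.linalg.eigvalsh(np.cov(x.T))
print(np.round(eigs[-4:], 5))        # top covariance eigenvalues shrink toward
                                     # zero: tokens cluster, covariance drops rank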
How do tokens evolve as they are processed by a deep Transformer?

With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint, A Unified Perspective on the Dynamics of Deep Transformers: arxiv.org/abs/2501.18322

ML and PDE lovers, check it out!
January 31, 2025 at 4:56 PM
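For readers who want the shape of the object being studied: in this line of work, tokens are particles driven by an attention vector field, and in the many-token limit their distribution satisfies a continuity equation. Schematically, in notation of my choosing (not necessarily the paper's):

\dot x_i(t) = \frac{\sum_{j=1}^n e^{\langle Q x_i(t),\, K x_j(t) \rangle}\, V x_j(t)}{\sum_{j=1}^n e^{\langle Q x_i(t),\, K x_j(t) \rangle}},
\qquad
\partial_t \mu_t + \mathrm{div}\!\big(\mu_t\, \mathcal{X}[\mu_t]\big) = 0,

where \mu_t is the token distribution at depth t and \mathcal{X}[\mu] is the same attention vector field with sums replaced by integrals against \mu.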
Reposted by Valérie Castin
Excited to see Sigmoid Attention accepted at ICLR 2025!!

Make attention ~18% faster with a drop-in replacement 🚀

Code:
github.com/apple/ml-sig...

Paper:
arxiv.org/abs/2409.04431
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
January 24, 2025 at 6:47 PM
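A minimal sketch of the drop-in idea, assuming the paper's headline recipe as commonly summarized: replace the row-wise softmax with an elementwise sigmoid plus a bias of -log(n) for sequence length n. Treat the exact bias, scaling, and shapes below as assumptions for illustration.

import numpy as np

def sigmoid_attention(q, k, v):
    # Drop-in for softmax attention: elementwise sigmoid with a
    # -log(n) bias in place of the row-wise softmax (assumed recipe).
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d) - np.log(n)
    w = 1.0 / (1.0 + np.exp(-logits))   # no normalization across keys
    return w @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 32, 8))  # three (seq, head_dim) arrays
out = sigmoid_attention(q, k, v)       # same output shape as softmax attention

Skipping the softmax's row-wise normalization is what makes the sigmoid version cheaper: each attention weight depends only on its own query-key pair.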
Reposted by Valérie Castin
The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465
January 22, 2025 at 9:11 AM
Reposted by Valérie Castin
Machine learning has made incredible breakthroughs, but our theoretical understanding lags behind.

We take a step towards unravelling its mystery by explaining why the phenomenon of disentanglement arises in generative latent variable models.

Blog post: carl-allen.github.io/theory/2024/...
December 18, 2024 at 4:58 PM
Reposted by Valérie Castin
It's like when Google decided to fund itself through ads, but worse, because chatbots are already much more misleading and anthropomorphic than search engines. #AIEthics www.ft.com/content/9350...
OpenAI explores advertising as it steps up revenue drive
ChatGPT maker hires advertising talent from big tech rivals
December 8, 2024 at 8:47 PM