Valérie Castin
@vcastin.bsky.social
PhD student in machine learning at École Normale Supérieure, Paris

My webpage: https://vcastin.github.io/
Reposted by Valérie Castin
I asked "on the other platform" what the most important improvements to the original 2017 transformer were.

That was quite popular, and here is a synthesis of the responses:
April 28, 2025 at 6:47 AM
Reposted by Valérie Castin
Excited to share Soup-of-Experts, a new neural network architecture that, for any given task, can instantiate in a flash a small model that performs very well on it.

Made with ❤️ at Apple

Thanks to my co-authors David Grangier, Angelos Katharopoulos, and Skyler Seto!

arxiv.org/abs/2502.01804
February 5, 2025 at 9:32 AM
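A minimal sketch of the idea as the announcement describes it: keep a bank of expert parameters and instantiate a task-specific model as a weighted combination of them. The linear-combination form, names, and shapes below are illustrative assumptions, not the paper's implementation.

import numpy as np

# Hypothetical sketch of a "soup" of experts: a bank of K expert
# parameter vectors, mixed with task-dependent weights to
# instantiate one small model per task (assumed combination form).
rng = np.random.default_rng(0)
K, P = 8, 10_000                     # experts, parameters per expert
experts = rng.normal(size=(K, P))    # pretrained expert parameter bank

def instantiate(task_logits):
    """Instantiate a small model's parameters for one task."""
    w = np.exp(task_logits - np.max(task_logits))
    w /= w.sum()                     # convex mixing weights
    return w @ experts               # one matmul: instant specialization

params = instantiate(rng.normal(size=K))  # task-specific model, "in a flash"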
Reposted by Valérie Castin
A cute result from Valérie’s work is that Gaussian distributions remain closed under evolution by attention layers, allowing one to study an ODE in (mean, covariance) space. In particular, this enables the analysis of the “clustering of tokens” toward low-rank covariances.
February 1, 2025 at 9:54 AM
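A toy numerical check of the clustering described above, assuming plain softmax self-attention with identity Q, K, V and a simple residual-style ODE; this is an illustrative setting, not the paper's exact dynamics.

import numpy as np

rng = np.random.default_rng(0)
n, d = 64, 16
x = rng.normal(size=(n, d))          # tokens drawn from a Gaussian

def attention(x):
    # softmax self-attention with identity Q, K, V (toy setting)
    logits = x @ x.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ x

dt = 0.1
for _ in range(300):
    x += dt * (attention(x) - x)     # Euler step of the token ODE

eigs = np.linalg.eigvalsh(np.cov(x.T))
print(np.round(eigs[-4:], 5))        # top covariance eigenvalues shrink toward
                                     # zero: tokens cluster, covariance drops rank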
How do tokens evolve as they are processed by a deep Transformer?

With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint, A Unified Perspective on the Dynamics of Deep Transformers: arxiv.org/abs/2501.18322

ML and PDE lovers, check it out!
January 31, 2025 at 4:56 PM
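For readers who want the shape of the object being studied: in this line of work, tokens are particles driven by an attention vector field, and in the many-token limit their distribution satisfies a continuity equation. Schematically, in notation of my choosing (not necessarily the paper's):

\dot x_i(t) = \frac{\sum_{j=1}^n e^{\langle Q x_i(t),\, K x_j(t) \rangle}\, V x_j(t)}{\sum_{j=1}^n e^{\langle Q x_i(t),\, K x_j(t) \rangle}},
\qquad
\partial_t \mu_t + \mathrm{div}\!\big(\mu_t\, \mathcal{X}[\mu_t]\big) = 0,

where \mu_t is the token distribution at depth t and \mathcal{X}[\mu] is the same attention vector field with sums replaced by integrals against \mu.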
Reposted by Valérie Castin
Excited to see Sigmoid Attention accepted at ICLR 2025!!

Make attention ~18% faster with a drop-in replacement 🚀

Code:
github.com/apple/ml-sig...

Paper:
arxiv.org/abs/2409.04431
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
January 24, 2025 at 6:47 PM
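A minimal sketch of the drop-in idea, assuming the paper's headline recipe as commonly summarized: replace the row-wise softmax with an elementwise sigmoid plus a bias of -log(n) for sequence length n. Treat the exact bias, scaling, and shapes below as assumptions for illustration.

import numpy as np

def sigmoid_attention(q, k, v):
    # Drop-in for softmax attention: elementwise sigmoid with a
    # -log(n) bias in place of the row-wise softmax (assumed recipe).
    n, d = q.shape
    logits = q @ k.T / np.sqrt(d) - np.log(n)
    w = 1.0 / (1.0 + np.exp(-logits))   # no normalization across keys
    return w @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 32, 8))  # three (seq, head_dim) arrays
out = sigmoid_attention(q, k, v)       # same output shape as softmax attention

Skipping the softmax's row-wise normalization is what makes the sigmoid version cheaper: each attention weight depends only on its own query-key pair.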
Reposted by Valérie Castin
The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465
January 22, 2025 at 9:11 AM
Reposted by Valérie Castin
Machine learning has made incredible breakthroughs, but our theoretical understanding lags behind.

We take a step towards unravelling its mystery by explaining why the phenomenon of disentanglement arises in generative latent variable models.

Blog post: carl-allen.github.io/theory/2024/...
December 18, 2024 at 4:58 PM
Reposted by Valérie Castin
It's like when Google decided to fund itself through ads, but worse, because chatbots are already much more misleading and anthropomorphic than search engines. #AIEthics www.ft.com/content/9350...
OpenAI explores advertising as it steps up revenue drive
ChatGPT maker hires advertising talent from big tech rivals
December 8, 2024 at 8:47 PM