Pau Rodriguez
@paurodriguez.bsky.social
Research Scientist at Apple Machine Learning Research. Previously ServiceNow and Element AI in Montréal.
LinEAS globally 🌐 optimizes all 1D-Wasserstein distances between source and target activation distributions at multiple layers via backprop. ✨ Bonus: we can now add a sparsity objective. The result? Targeted 🎯 interventions that preserve fluency even under strong conditioning! (Sketch below 👇)
October 21, 2025 at 10:00 AM
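Here is a minimal, self-contained sketch of that objective with a toy network and made-up data (names are illustrative, not the apple/ml-lineas API): per-layer affine maps are trained jointly by backprop so transported source activations match target activations in 1D Wasserstein distance at every chosen layer, plus an L1 term for sparse interventions.

```python
# Hedged sketch of the LinEAS idea; see github.com/apple/ml-lineas for the
# actual implementation. Interventions at early layers feed into later
# layers, so all losses are optimized jointly, end-to-end.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 2-layer net standing in for an LLM / diffusion backbone.
net = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 16), nn.Tanh())
layers = [net[0], net[2]]  # layers whose outputs we intervene on

# Learnable per-neuron affine intervention a*h + b, initialized to identity.
params = [(nn.Parameter(torch.ones(l.out_features)),
           nn.Parameter(torch.zeros(l.out_features))) for l in layers]

def forward_acts(x, intervene):
    """Run the net; optionally apply a*h + b after each chosen layer."""
    h, acts, i = x, [], 0
    for m in net:
        h = m(h)
        if any(m is l for l in layers):
            if intervene:
                a, b = params[i]
                h = a * h + b
            acts.append(h)
            i += 1
    return acts

def w1(u, v):
    """Per-neuron 1D Wasserstein-1 between equal-size samples via sorting."""
    return (torch.sort(u, 0).values - torch.sort(v, 0).values).abs().mean()

src = torch.randn(32, 8)        # inputs for <32 "source behavior" prompts
tgt = torch.randn(32, 8) + 0.5  # ...and for the target behavior

opt = torch.optim.Adam([p for ab in params for p in ab], lr=1e-2)
for _ in range(200):
    loss = sum(w1(s, t) for s, t in zip(forward_acts(src, True),
                                        forward_acts(tgt, False)))
    # Sparsity objective: keep most interventions at the identity map.
    loss += 1e-3 * sum((a - 1).abs().sum() + b.abs().sum() for a, b in params)
    opt.zero_grad(); loss.backward(); opt.step()
```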
Existing methods estimate layer-wise 🥞 interventions. While powerful, layer-wise methods incur approximation error because each layer is optimized locally, without considering multiple layers at once 🤔. LinEAS circumvents this problem with end-to-end optimization ⚙️!
October 21, 2025 at 10:00 AM
🦊 Activation Steering modifies a model's internal activations to control its output. Think of a slider 🎚️ that gradually adds a concept, like an art style 🎨, to the output. This is also a powerful tool for safety, steering models away from harmful content. (Minimal example below 👇)
October 21, 2025 at 10:00 AM
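For readers new to the idea, a hedged sketch of the classic vector-based version of that slider, using a PyTorch forward hook (the layer choice and concept vector are assumptions for illustration, not a recipe):

```python
# "Slider"-style activation steering via a forward hook (illustrative only).
import torch

def make_steering_hook(direction: torch.Tensor, strength: float):
    """Return a hook that adds strength * direction to a module's output."""
    def hook(module, inputs, output):
        return output + strength * direction  # returned value replaces output
    return hook

# Hypothetical usage on some transformer block's MLP output:
#   vec = target_acts.mean(0) - source_acts.mean(0)   # crude concept vector
#   handle = model.layers[12].mlp.register_forward_hook(
#       make_steering_hook(vec, strength=4.0))
#   ...generate...
#   handle.remove()
```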
🚀 Excited to share LinEAS, our new activation steering method, accepted at NeurIPS 2025! It approximates optimal transport maps e2e to precisely guide 🧭 activations, achieving finer control 🎚️ with ✨ fewer than 32 ✨ prompts!

💻https://github.com/apple/ml-lineas
📄https://arxiv.org/abs/2503.10679
October 21, 2025 at 10:00 AM
8/9 T2I models tend to generate negated concepts 😮

In the image, StableDiffusion XL prompted with: “2 tier cake with multicolored stars attached to it and no {white bear, pink elephant, gorilla} can be seen.”

✨Linear-AcT makes the negated concept disappear✨
December 10, 2024 at 1:09 PM
7/9 And here we induce Cyberpunk 🤖 for the same prompt!
December 10, 2024 at 1:09 PM
6/9 Amazingly, we can condition Text-to-Image (T2I) Diffusion with the same exact method we used for LLMs! 🤯

In this example, we induce a specific style (Art Nouveau 🎨), which we can accurately control with our λ parameter.
December 10, 2024 at 1:09 PM
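Mechanically, the λ control in the post above can be pictured as interpolating between the identity and the learned transport map inside the diffusion UNet. A hedged sketch (the module path is hypothetical; λ=0 leaves the model untouched, λ=1 applies the full map):

```python
import torch

def act_hook(a: torch.Tensor, b: torch.Tensor, lam: float):
    """Apply the learned affine transport a*h + b with strength lam."""
    def hook(module, inputs, output):
        return (1 - lam) * output + lam * (a * output + b)
    return hook

# Hypothetical: handle = unet.mid_block.register_forward_hook(act_hook(a, b, 0.6))
```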
5/9 With Linear-AcT, we achieve great results in LLM 👿 toxicity mitigation and 👩🏼‍⚖️ truthfulness induction.

And the best result is always obtained at λ=1, unlike with vector-based steering methods!
December 10, 2024 at 1:09 PM
4/9 Linear-AcT preserves target distributions, with interpretable strength λ 🌈

🍰 All we need is two small sets of sentences {a},{b} from source and target distributions to estimate the Optimal Transport (OT) map 🚚

🚀 We linearize the map for speed/memory, thus ⭐Linear-AcT⭐
December 10, 2024 at 1:09 PM
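A minimal sketch of that estimation step, assuming per-neuron (1D) maps and illustrative names (the released code is at github.com/apple/ml-act): sorting the two sample sets matches their quantiles, and a least-squares line through the matched pairs gives the linearized map a·h + b.

```python
import torch

def fit_linear_ot(src: torch.Tensor, tgt: torch.Tensor):
    """src, tgt: (n_samples, n_neurons) activations for sentence sets {a}, {b}.
    Sorting along the sample axis matches quantiles; a least-squares fit
    through the matched pairs linearizes the 1D OT map per neuron."""
    xs = torch.sort(src, dim=0).values
    ys = torch.sort(tgt, dim=0).values
    xc = xs - xs.mean(0)
    a = (xc * (ys - ys.mean(0))).sum(0) / (xc.pow(2).sum(0) + 1e-8)
    b = ys.mean(0) - a * xs.mean(0)
    return a, b  # transported activation: a * h + b

# Strength λ interpolates: h_out = (1 - λ) * h + λ * (a * h + b).
```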
3/9 An activation has a different output distribution per behavior, e.g. 🦠 toxic (source) and 😊 non-toxic (target). i) Vector-based AS moves activations OOD 🤯, with catastrophic consequences 💥 for model utility. ii) The strength λ is unbounded and non-interpretable 🤨!
December 10, 2024 at 1:09 PM
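A tiny made-up 1D example of point (i): shifting by a vector matches the target mean but keeps the source spread, so many steered activations land outside the target distribution, while a transport map matches both moments.

```python
import torch

torch.manual_seed(0)
src = torch.randn(10_000)            # source "neuron": N(0, 1)
tgt = 3 + 0.5 * torch.randn(10_000)  # target: N(3, 0.25)

steered = src + (tgt.mean() - src.mean())  # vector-based shift (λ=1)
transported = tgt.mean() + (tgt.std() / src.std()) * (src - src.mean())

print(steered.std().item(), transported.std().item())
# ~1.0 vs ~0.5: the shifted samples keep the source spread and spill
# outside the target distribution; the transport map matches it.
```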
Thrilled to share the latest work from our team at @Apple, where we achieve interpretable and fine-grained control of LLMs and Diffusion models via Activation Transport 🔥

📄 arxiv.org/abs/2410.23054
🛠️ github.com/apple/ml-act

0/9 🧵
December 10, 2024 at 1:09 PM