Martina Vilas
@martinagvilas.bsky.social
2.2K followers 450 following 18 posts
Computer Science PhD student | AI interpretability | Vision + Language | Cognitive Science. Prev. intern @MicrosoftResearch. https://martinagvilas.github.io/
Pinned
Hi BlueSky! 🦋 I’m a computer science PhD student with a background in cognitive neuroscience. I work at the intersection of these fields, and my research focuses on reverse-engineering the cognitive capacities of AI models 🧠💻

Some recent examples 👇
Reposted by Martina Vilas
When to call it quits in LLM reasoning? 🛑

Martina's internship project proposes trace-monitoring metrics and classifiers that can detect midway when an LLM reasoning trace is going to fail. The approach saves up to 70% of token usage and even improves accuracy by 2-3%.
Can we predict which reasoning paths will succeed before seeing the answer? 🤔

Our new paper (arxiv.org/abs/2510.10494) proposes latent-trajectory signals from LLMs' hidden states to identify high-quality reasoning, cutting inference costs by up to 70% while maintaining accuracy
Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
Reasoning models improve their problem-solving ability through inference-time scaling, allocating more compute via longer token budgets. Identifying which reasoning traces are likely to succeed remain...
arxiv.org
Working on this project was a great experience during my internship at @msftresearch.bsky.social 💙

Learned so much from this amazing team! Huge thanks to my coauthors: @vidhishab.bsky.social, Safoora Yousefi, @besmiranushi.bsky.social, @erichorvitz.bsky.social
We also found that these signals emerge EARLY in reasoning! At just 4k tokens, we can predict solution quality with ROC-AUC > 0.6.

This enables early path selection during parallel generation and ~60% token savings with +2.1% accuracy gains 🚀
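As a toy illustration of what early path selection could look like in code (the scores, labels, and two-path cutoff below are made up; the actual LT classifier and thresholds are described in the paper), a prefix-level score can be checked with ROC-AUC and used to stop weak paths early:

```python
from sklearn.metrics import roc_auc_score

def select_paths_early(partial_scores, keep=2):
    """Keep the indices of the top-`keep` scoring partial traces;
    the remaining paths are stopped early, saving their unfinished tokens."""
    order = sorted(range(len(partial_scores)), key=lambda i: partial_scores[i], reverse=True)
    return order[:keep], order[keep:]

# Toy numbers (not the paper's data): hypothetical LT-based scores computed
# from ~4k-token prefixes, and whether each finished trace ended up correct.
scores  = [0.81, 0.35, 0.62, 0.20, 0.74]
correct = [1,    0,    1,    0,    0]

print("early ROC-AUC:", roc_auc_score(correct, scores))  # how predictive the prefix signal is
keep_idx, stop_idx = select_paths_early(scores, keep=2)
print("continue:", keep_idx, "stop early:", stop_idx)    # the stopped paths are the token savings
```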
Using LT signals for answer selection in multi-sample inference leads to:

⚡ 48% average token reduction (up to 70%!)
📈 +2.6% accuracy improvement over majority voting
🎯 Works by identifying correct paths even when the majority is wrong
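A minimal sketch of the selection idea (not the paper's exact procedure): weight each sampled answer by a per-trace LT quality score, so a few high-quality traces can outvote a wrong majority. `lt_scores` here is a hypothetical scalar summary of the signals described further down the thread.

```python
from collections import Counter

def majority_vote(answers):
    """Standard baseline: pick the most frequent answer across samples."""
    return Counter(answers).most_common(1)[0][0]

def lt_weighted_vote(answers, lt_scores):
    """Hypothetical LT-based selection: weight each sampled answer by its
    trace's latent-trajectory quality score, so a minority of high-quality
    traces can outvote a wrong majority."""
    weights = {}
    for ans, score in zip(answers, lt_scores):
        weights[ans] = weights.get(ans, 0.0) + score
    return max(weights, key=weights.get)

# Toy usage: the majority says "B", but the two highest-quality traces say "A".
answers   = ["B", "B", "B", "A", "A"]
lt_scores = [0.2, 0.3, 0.1, 0.9, 0.8]
print(majority_vote(answers))                # -> "B"
print(lt_weighted_vote(answers, lt_scores))  # -> "A"
```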
Hidden states have distinctive temporal patterns for correct paths. They show:

✴️ Larger overall representational change (Net ↑)
✴️ Less wandering in latent space (Cumulative ↓)
✴️ More direct progress toward final state (Aligned ↑)
Across 3 reasoning models (DeepSeek-R1, Phi-4-Reasoning-Plus, Qwen3) and diverse domains (GPQA, AIME, TSP), LT signals:

✅ Significantly predict correctness
✅ Outperform output-based confidence measures and cross-layer signals
We track how representations evolve through the trace and extract 3 complementary signals:

📊 Net Change: Overall shift (start → end)
🔄 Cumulative Change: Total movement
🎯 Aligned Change: Progress toward final state
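To make the three signals concrete, here is a rough NumPy sketch of how they could be computed from one trace's hidden states. The layer choice, sampling interval, and the exact "aligned change" formula are assumptions for illustration, not the paper's reference implementation (see arxiv.org/abs/2510.10494 for the precise definitions).

```python
import numpy as np

def latent_trajectory_signals(hidden_states):
    """Rough sketch of the three latent-trajectory (LT) signals for one trace.

    hidden_states: array of shape (T, d) -- the hidden state of some chosen
    layer at T points along the reasoning trace (e.g. one vector every k
    tokens). Layer, sampling rate, and normalization are assumptions here.
    """
    H = np.asarray(hidden_states, dtype=float)
    h_start, h_end = H[0], H[-1]
    steps = np.diff(H, axis=0)                       # per-step movement in latent space

    # Net change: overall shift from the start to the end of the trace.
    net = float(np.linalg.norm(h_end - h_start))

    # Cumulative change: total path length travelled (how much the trace wanders).
    cumulative = float(np.linalg.norm(steps, axis=1).sum())

    # Aligned change (one plausible reading): average progress of each step
    # in the direction of the final state.
    to_end = h_end - H[:-1]
    to_end /= np.linalg.norm(to_end, axis=1, keepdims=True) + 1e-8
    aligned = float((steps * to_end).sum(axis=1).mean())

    return {"net": net, "cumulative": cumulative, "aligned": aligned}
```

On a correct trace one would expect net and aligned to be relatively high and cumulative relatively low, matching the pattern reported above for successful paths.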
Identifying trace quality is critical: it enables more reliable predictions, improves efficiency by avoiding wasted compute, and can be used to guide models toward productive reasoning strategies.

Our solution: Look inside the temporal evolution of the model's latent space! 🔍
But not all reasoning traces are equal ⚖️ → some contain productive steps that lead to correct solutions ✅, while others deviate into overthinking, fail to converge, or exhibit inconsistent reasoning patterns ❌
Modern LLMs use chain-of-thought reasoning to solve complex problems, generating step-by-step solutions that can span thousands of tokens.

📈Scaling this inference-time compute (longer traces, multiple samples) significantly improves performance across reasoning tasks.
Looking forward to presenting this work next week at #ICLR2025! DM me if you are attending and want to grab a coffee to discuss these topics 💫
I will be presenting this ✨ spotlight 💫 paper at #ICLR2025 with @martinagvilas.bsky.social. Come say hi if you're interested in DNN circuits, complexity and #interpretability

📆 Poster Session 4 (#530)
🕰️ Fri 25 Apr. 3:00-5:30 PM
📝 openreview.net/forum?id=Qog...
📊 iclr.cc/virtual/2025...
On December 5th, our ML theory group at Cohere For AI is hosting @mathildepapillon.bsky.social to discuss their recent review arxiv.org/abs/2407.09468 on geometric/topological/algebraic ML.

Join us online 💫
Reposted by Martina Vilas
I’m putting together a starter pack for researchers working on human-centered AI evaluation. Reply or DM me if you’d like to be added, or if you have suggestions! Thank you!

(It looks NLP-centric at the moment, but that’s due to the current limits of my own knowledge 🙈)

go.bsky.app/G3w9LpE
Reposted by Martina Vilas
I tried to find everyone who works in the area but I certainly missed some folks so please lmk...
go.bsky.app/BYkRryU
Reposted by Martina Vilas
Does anyone know of any feeds (or similar) for student internship opportunities in ML/CV/NLP?
Reposted by Martina Vilas
I've found starter packs on NLP, vision, graphics, etc. But personally, I would love to know and hear from researchers working on vision-language. So, let me know if you'd like to join this starter pack, would be happy to add!

go.bsky.app/TENRRBb
Reposted by Martina Vilas
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this:

Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢

🧵⬇️
Reposted by Martina Vilas
LLMs tend to match problem-solving strategies based on textual similarity rather than truly understanding the underlying principles of mathematical problems.

Paper: Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology
Reposted by Martina Vilas
A starter pack of people working on interpretability / explainability of all kinds, using theoretical and/or empirical approaches.

Reply or DM if you want to be added, and help me reach others!

go.bsky.app/DZv6TSS
Reposted by Martina Vilas
If you’re interested in mechanistic interpretability, I just found this starter pack and wanted to boost it (thanks for creating it @butanium.bsky.social !). Excited to have a mech interp community on bluesky 🎉

go.bsky.app/LisK3CP
👋 I also work in the field (examples on my profile). Would love to be added!
Reposted by Martina Vilas
I forgot from whom in my feed I got this, but anyway, this network analyzer is crazy efficient. It gives you ideas for accounts to follow based on your own followees. I just added 50 accounts or so.

bsky-follow-finder.theo.io
Bluesky Network Analyzer
Find accounts that you don't follow (yet) but are followed by lots of accounts that you do follow.
bsky-follow-finder.theo.io
Reposted by Martina Vilas
there are many smart speakers and thinkers around AI/ML and/or NLP. but i find almost everything to be kinda predictable by now, minor stylistic variations on the same story. who are some *interesting* speakers i should listen/read? i want things that may surprise or inspire me.