Vidhisha Balachandran
vidhishab.bsky.social
AI Evaluation and Interpretability @MicrosoftResearch, Prev PhD @CMU.
All our evaluation logs and reasoning traces for open source models are now released! We hope this can be useful for the community for further research and analysis!
📌You can now find all the evaluation logs from our inference-time scaling report and the Phi-4 reasoning technical report at huggingface.co/datasets/mic.... The evaluation code for the reasoning benchmarks can also be found in the main branch of Eureka ML Insights at github.com/microsoft/eu....
microsoft/Eureka-Bench-Logs · Datasets at Hugging Face
huggingface.co
May 27, 2025 at 7:20 PM
Reposted by Vidhisha Balachandran
All Eureka inference-time scaling insights are now available here: www.microsoft.com/en-us/resear... It was fun sharing these and more together with Vidhisha Balachandran @vidhishab.bsky.social and Vibhav Vineet at #ICLR2025.
Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead - Microsoft Research
Understanding and measuring the potential of inference-time scaling for reasoning. The new Eureka study tests nine state-of-the-art models on eight diverse reasoning tasks.
www.microsoft.com
April 29, 2025 at 3:36 PM
Reposted by Vidhisha Balachandran
🎉The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive with, and often outperform, much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.
May 1, 2025 at 12:50 AM
Reposted by Vidhisha Balachandran
Come see us in any of the following sessions on model understanding and evaluation! 🔬 #ICLR2025 @msftresearch.bsky.social
April 24, 2025 at 1:38 AM
Reposted by Vidhisha Balachandran
Our paper "Improving Instruction-Following in Language Models through Activation Steering" has been accepted to #ICLR2025!

We're also excited to share that our public GitHub repo is now live.
Code: github.com/microsoft/ll...
Camera-ready: arxiv.org/abs/2410.12877
April 15, 2025 at 4:35 PM
🚀 Excited to share our new Eureka report!

We studied inference-time scaling across 9 models (conventional & reasoning) on 8 tough tasks—from math & STEM reasoning to navigation, calendar planning, NP-hard problems & spatial planning.

Full Report: aka.ms/eureka-ml-in...
April 10, 2025 at 8:46 PM
Reposted by Vidhisha Balachandran
Asking the right questions can make or break decisions in fields like medicine, law, and beyond✴️
Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information by asking better questions, using **structured rewards**🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵
February 21, 2025 at 4:00 PM
Reposted by Vidhisha Balachandran
Effective decision-making starts with asking the right questions. Our new framework, ALFA, teaches LLMs to ask questions through fine-grained attributes in expert domains.

Excited to see where this takes the next generation of effective LLM assistants and agents!
February 24, 2025 at 10:26 PM
Excited to share our December updates on the state of progress in AI! @msftresearch.bsky.social

Detailed report coming early next year ✨
💡Eureka ML Insights 1/N December @neuripsconf.bsky.social: In contrast to our September findings, the gap between the worst and best observed performance across different capabilities and across 12 SOTA models has widened.
December 15, 2024 at 5:34 AM
Reposted by Vidhisha Balachandran
Stoked to share our new work on scaling training data attribution (TDA) toward LLM pretraining - and great insights we found along the way!

medium.com/people-ai-re... and more in the thread below from most excellent student researcher @tylerachang.bsky.social
December 14, 2024 at 6:16 PM
Reposted by Vidhisha Balachandran
Are you ready for an early Christmas present from our team at Microsoft Research?

Introducing the most powerful smol model ever built in the world!

Welcome to Phi-4! 👇
December 13, 2024 at 3:37 AM
Reposted by Vidhisha Balachandran
The Phi-4 technical report is now available on arXiv (arxiv.org/abs/2412.08905) and on Azure AI. Congratulations to the Phi team on the release and the major milestone on scaling data quality processes! 🎉 @msftresearch.bsky.social @sbubeck.bsky.social @suriyag.bsky.social @sytelus.bsky.social
December 13, 2024 at 3:17 PM
Come talk to us about model evaluation! 4:30 pm today at West Meeting Room 301

Also come see @besmiranushi.bsky.social's cool demos 🍁
59 seconds on how to run a Eureka evaluation job. Raw human video editing capabilities 😆. Come see us at West Meeting Room 301 #NeurIPS2024, getting started at 4:30 pm.
December 12, 2024 at 12:08 AM
We will be presenting Eureka, our model evaluation framework, and sharing in-depth insights at NeurIPS this week! Come join us on Wednesday (Dec 11) at 4:30 pm in West Meeting Room 301 to hear what we’ve been up to the past few months! :)

neurips.cc/Expo/Confere...

microsoft.github.io/eureka-ml-in...
@vidhishab.bsky.social and I will be presenting Eureka during NeurIPS Expo, on Wednesday, December 11 at 4:30 PM (West Meeting Room 301). Join us to get a glimpse of a demo, recent results, and an overall in-depth comparison of 12 frontier foundation models.
December 9, 2024 at 11:11 PM
Reposted by Vidhisha Balachandran
🚨 NeurIPS 2024 Spotlight
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench–our framework with 46 criteria to assess benchmark quality: betterbench.stanford.edu 1/x
November 25, 2024 at 7:02 PM
Reposted by Vidhisha Balachandran
Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 news.un.org/en/story/202... This number is an underestimate given that only 37 countries reported in 2023.
November 26, 2024 at 6:16 AM