Lightnews — Scholar-powered news

@vidhishab.bsky.social

All our evaluation logs and reasoning traces for open source models are now released! We hope this can be useful for the community for further research and analysis!

Besmira Nushi @besmiranushi.bsky.social · May 27

📌You can now find all the evaluation logs from our inference-time scaling report and the Phi-4 reasoning technical report at huggingface.co/datasets/mic.... The evaluation code for the reasoning benchmarks can also be found in the main branch of Eureka ML Insights at github.com/microsoft/eu....

microsoft/Eureka-Bench-Logs · Datasets at Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

huggingface.co

May 27, 2025 at 7:20 PM

Reposted by Vidhisha Balachandran

Besmira Nushi

@besmiranushi.bsky.social

All Eureka inference-time scaling insights are now available here: www.microsoft.com/en-us/resear... It was fun sharing these and more together with Vidhisha Balachandran @vidhishab.bsky.social and Vibhav Vineet at #ICLR2025.

Eureka Inference-Time Scaling Insights: Where We Stand and What Lies Ahead - Microsoft Research

Understanding and measuring the potential of inference-time scaling for reasoning. The new Eureka study tests nine state-of-the-art models on eight diverse reasoning tasks.

www.microsoft.com

April 29, 2025 at 3:36 PM

Reposted by Vidhisha Balachandran

Besmira Nushi

@besmiranushi.bsky.social

🎉The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive with, and often outperform, much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.

May 1, 2025 at 12:50 AM

Reposted by Vidhisha Balachandran

Besmira Nushi

@besmiranushi.bsky.social

Come see us in any of the following sessions on model understanding and evaluation! 🔬 #ICLR2025 @msftresearch.bsky.social

April 24, 2025 at 1:38 AM

Reposted by Vidhisha Balachandran

Alessandro Stolfo

@alestolfo.bsky.social

Our paper "Improving Instruction-Following in Language Models through Activation Steering" has been accepted to #ICLR2025!

We're also excited to share that our public GitHub repo is now live.
Code: github.com/microsoft/ll...
Camera-ready: arxiv.org/abs/2410.12877

April 15, 2025 at 4:35 PM

Vidhisha Balachandran

@vidhishab.bsky.social

🚀 Excited to share our new Eureka report!

We studied inference-time scaling across 9 models (conventional & reasoning) on 8 tough tasks—from math & STEM reasoning to navigation, calendar planning, NP-hard problems & spatial planning.

Full Report: aka.ms/eureka-ml-in...

April 10, 2025 at 8:46 PM

Reposted by Vidhisha Balachandran

Stella Li

@stellali.bsky.social

Asking the right questions can make or break decisions in fields like medicine, law, and beyond✴️
Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information through better questions, using **structured rewards**🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵

February 21, 2025 at 4:00 PM

Reposted by Vidhisha Balachandran

Tsvetshop NLP

@tsvetshop.bsky.social

Effective decision-making starts with asking the right questions. Our new framework, ALFA, teaches LLMs to ask questions through fine-grained attributes in expert domains.

Excited to see where this takes the next generation of effective LLM assistants and agents!

Stella Li @stellali.bsky.social · Feb 21

Asking the right questions can make or break decisions in fields like medicine, law, and beyond✴️
Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information through better questions, using **structured rewards**🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵

February 24, 2025 at 10:26 PM

Vidhisha Balachandran

@vidhishab.bsky.social

Excited to share our December updates on the state of progress in AI! @msftresearch.bsky.social

Detailed report coming early next year ✨

Besmira Nushi @besmiranushi.bsky.social · Dec 13

💡Eureka ML Insights 1/N December @neuripsconf.bsky.social: Contrary to our September findings, the gap between the worst and best observed performance across different capabilities and across 12 SOTA models has widened.

December 15, 2024 at 5:34 AM

Reposted by Vidhisha Balachandran

dheerajr.bsky.social

@dheerajr.bsky.social

Stoked to share our new work on scaling training data attribution (TDA) toward LLM pretraining - and great insights we found along the way!

medium.com/people-ai-re... and more in the thread below from most excellent student researcher @tylerachang.bsky.social

December 14, 2024 at 6:16 PM

Reposted by Vidhisha Balachandran

Shital Shah

@sytelus.bsky.social

Are you ready for an early Christmas present from our team at Microsoft Research?

Introducing the most powerful smol model ever built in the world!

Welcome to Phi-4! 👇

December 13, 2024 at 3:37 AM

Reposted by Vidhisha Balachandran

Besmira Nushi

@besmiranushi.bsky.social

The phi-4 technical report is now available on arxiv arxiv.org/abs/2412.08905 and on Azure AI. Congratulations to the phi team on the release and the major milestone on scaling data quality processes! 🎉 @msftresearch.bsky.social @sbubeck.bsky.social @suriyag.bsky.social @sytelus.bsky.social

Sebastien Bubeck @sbubeck.bsky.social · Dec 13

arxiv.org/abs/2412.08905 Hope you like it!

Phi-4 Technical Report

We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily o...

arxiv.org

December 13, 2024 at 3:17 PM

Vidhisha Balachandran

@vidhishab.bsky.social

Come talk to us about model evaluation! 4:30 pm today at West Meeting Room 301

Also come see @besmiranushi.bsky.social's cool demos 🍁

Besmira Nushi @besmiranushi.bsky.social · Dec 12

59 seconds on how to run a Eureka evaluation job. Raw human video editing capabilities 😆. Come see us at West Meeting Room 301 #NeurIPS2024, getting started at 4:30 pm.

December 12, 2024 at 12:08 AM

Vidhisha Balachandran

@vidhishab.bsky.social

We will be presenting Eureka, our model evaluation framework, and sharing in-depth insights at NeurIPS this week! Come join us on Wednesday (Dec 11) at 4:30 pm at West Meeting Room 301 to hear what we've been up to the past few months! :)

neurips.cc/Expo/Confere...

microsoft.github.io/eureka-ml-in...

Besmira Nushi @besmiranushi.bsky.social · Dec 9

@vidhishab.bsky.social and I will be presenting Eureka during NeurIPS Expo, on Wednesday, December 11 at 4:30 PM (West Meeting Room 301). Join us to get a glimpse of a demo, recent results, and an overall in-depth comparison of 12 frontier foundation models.

December 9, 2024 at 11:11 PM

Reposted by Vidhisha Balachandran

Anka Reuel ➡️ NeurIPS

@ankareuel.bsky.social

🚨 NeurIPS 2024 Spotlight
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench–our framework with 46 criteria to assess benchmark quality: betterbench.stanford.edu 1/x

November 25, 2024 at 7:02 PM

Reposted by Vidhisha Balachandran

Besmira Nushi

@besmiranushi.bsky.social

Today is the International Day for the Elimination of Violence against Women. According to the UN, more than 50 000 women were killed by a partner or family member in 2023 news.un.org/en/story/202... This number is an underestimate given that only 37 countries reported in 2023.

November 26, 2024 at 6:16 AM
