We're also excited to share that our public GitHub repo is now live.
Code: github.com/microsoft/ll...
Camera-ready: arxiv.org/abs/2410.12877
We studied inference-time scaling across 9 models (conventional & reasoning) on 8 tough tasks—from math & STEM reasoning to navigation, calendar planning, NP-hard problems & spatial planning.
Full Report: aka.ms/eureka-ml-in...
Our new framework ALFA (ALignment with Fine-grained Attributes) teaches LLMs to PROACTIVELY seek information by asking better questions, using **structured rewards** 🏥❓
(co-led with @jiminmun.bsky.social)
👉🏻🧵
Excited to see where this takes the next generation of effective LLM assistants and agents!
Detailed report coming early next year ✨
medium.com/people-ai-re... and more in the thread below from most excellent student researcher @tylerachang.bsky.social
Introducing the most powerful smol model ever built in the world!
Welcome to Phi-4! 👇
Also to see @besmiranushi.bsky.social's cool demos 🍁
neurips.cc/Expo/Confere...
microsoft.github.io/eureka-ml-in...
Did you know we lack standards for AI benchmarks, despite their role in tracking progress, comparing models, and shaping policy? 🤯 Enter BetterBench–our framework with 46 criteria to assess benchmark quality: betterbench.stanford.edu 1/x