Valentin Liévin
valentinlievin.bsky.social
ML research at Google DeepMind
Better LLMs for healthcare and science.
Thank you @mike-shake.bsky.social, Anil Palepu, Tao Tu, Alan Kharti, Adam Rodman, Vivek Natarajan, Wei-Hung, and many others!
March 6, 2025 at 7:10 PM
[7/n] This is a major update for AMIE. While further research is needed, our study hints that AI could become a powerful tool for improving clinical decisions and healthcare access! Congratulations to the team for this incredible piece of work!
March 6, 2025 at 7:10 PM
[6/n] We also challenged AMIE with RxQA, a new medication reasoning benchmark requiring precise pharmacological knowledge. Again, AMIE outperformed PCPs.
March 6, 2025 at 7:10 PM
[5/n] We assessed AMIE’s clinical skills via a multi-visit Objective Structured Clinical Examination (OSCE), a common tool to evaluate medical professionals. AMIE scored (slightly) higher than human doctors (PCPs) in both management quality and alignment with guidelines!
March 6, 2025 at 7:10 PM
[4/n] In management reasoning, there is generally not a single ground truth, but rather a space of acceptable solutions. Capturing a reliable signal to gauge plan quality was a major challenge. Gemini-as-a-judge provided a robust signal, which we hill-climbed to improve our agent.
March 6, 2025 at 7:10 PM
[3/n] Management reasoning is hard. AMIE can retrieve specific recommendations from the guidelines in context, reason about them, and compile these findings into a personalized plan with citations. Structured generation was a key ingredient to elicit long and controllable reasoning.
March 6, 2025 at 7:10 PM
[2/n] We upgraded AMIE’s original dialogue agent with a reasoning partner. The new reasoning agent taps into Gemini’s long-context reasoning and retrieval capabilities to process 100+ pages of evidence-based clinical guidelines from @NICEComms and @BMJBestPractice, in real time!
March 6, 2025 at 7:10 PM