Valentin Liévin
valentinlievin.bsky.social
ML research at Google DeepMind
Better LLMs for healthcare and science.
[6/n] We also challenged AMIE with RxQA, a new medication reasoning benchmark requiring precise pharmacological knowledge. Again, AMIE outperformed PCPs.
March 6, 2025 at 7:10 PM
[5/n] We assessed AMIE’s clinical skills via a multi-visit Objective Structured Clinical Examination (OSCE), a common tool to evaluate medical professionals. AMIE scored (slightly) higher than human doctors (PCPs) in both management quality and alignment with guidelines!
March 6, 2025 at 7:10 PM
[4/n] In management reasoning, there is generally not a single ground truth, but rather a space of acceptable solutions. Capturing a reliable signal to gauge plan quality was a major challenge. Gemini-as-a-judge provided a robust signal, which we hill-climbed to improve our agent.
March 6, 2025 at 7:10 PM
[3/n] Management reasoning is hard. AMIE can retrieve and reason in context about specific recommendations from the guidelines, and compile these findings into a personalized plan with citations. Structured generation was a key ingredient to elicit long and controllable reasoning.
March 6, 2025 at 7:10 PM
[2/n] We upgraded AMIE’s original dialogue agent with a reasoning partner. The new reasoning agent taps into Gemini’s long-context reasoning and retrieval capabilities to process 100+ pages of evidence-based clinical guidelines from @NICEComms and @BMJBestPractice, in real time!
March 6, 2025 at 7:10 PM
[1/n] Happy to share a big step forward for AMIE, the Articulate Medical Intelligence Explorer! We explored the frontier of AI for disease management. How well can AI manage patients over time?

🔗 research.google/blog/from-di...
March 6, 2025 at 7:10 PM