Tobias Weyand
tobw.net
Researcher at Google DeepMind working towards human-level video understanding

🔗 tobw.net
The newly released Gemini 2.5 Pro (Preview 05/06) sets the state of the art on Minerva with 63.5% accuracy. Human accuracy is 92.5%.

developers.googleblog.com/en/gemini-2-...
Advancing the frontier of video understanding with Gemini 2.5 - Google Developers Blog
Explore Gemini 2.5, enhancing video understanding and combining audio-visual data and code for new interactive applications
developers.googleblog.com
May 13, 2025 at 12:06 AM
📜 Paper: arxiv.org/abs/2505.006...
📊 Dataset: github.com/google-deepm...

This is work with my amazing colleagues and collaborators Arsha Nagrani, Sachit Menon, Ahmet Iscen, Shyamal Buch, Ramin Mehran, Nilpa Jha, Anja Hauth, Yukun Zhu, Carl Vondrick, Mikhail Sirotenko, and Cordelia Schmid
MINERVA: Evaluating Complex Video Reasoning
Multimodal LLMs are turning their focus to video benchmarks, however most video benchmarks only provide outcome supervision, with no intermediate or interpretable reasoning steps. This makes it challe...
arxiv.org
May 13, 2025 at 12:06 AM
And the ICLR decisions
January 22, 2025 at 9:51 PM
Whoa, massive news! Excited for you and looking forward to seeing what you'll build there!
December 5, 2024 at 7:08 AM
Another nice way to get an ETA is

from tqdm import tqdm
for i in tqdm(range(len(dataset))):
    ...
November 28, 2024 at 12:22 AM
Professor knocks - "Hey, I have a 'research' project for you"
November 27, 2024 at 3:19 AM
Thanks, looks promising!
November 27, 2024 at 3:15 AM
Oh nice, seems to work for the first few papers I tried. Thank you!
November 27, 2024 at 3:14 AM