Open: spatial reasoning, data-efficiency, learning compatible representations.
However, every few years we rediscover the lesson that on difficult tasks, VLMs silently regress to being nearly blind.
x.com/DhruvBatra_/...
It's extra special because ICCV25 marks the 10-year anniversary of the VQA paper.
When we started, the idea of answering any question about any image seemed outlandish.
Today, we’re telling our story — show before you talk!
We are re-imagining how people interact with the web, one of humanity’s greatest inventions and a mess overdue for an overhaul.
yutori.com
Why? Whom does this possibly harm?
Brilliant talk by Ilya, but he's wrong on one point.
We are NOT running out of data. We are running out of human-written text.
We have more videos than we know what to do with. We just haven't solved pre-training in vision.
Just go out and sense the world. Data is easy.