visual computing, 3D vision, spatial AI, machine learning, robot perception.
📍Zurich, Switzerland
www.youtube.com/watch?v=mayo...
I share the frustration. It's disempowering when most major progress recently is downstream of "foundation models" that you don't have the compute or data to train yourself.
x.com/chrisoffner3...
SigLIP (VLMs) and DINO are two competing paradigms for image encoders.
My intuition is that joint vision-language modeling works great for semantic problems but may be too coarse for geometry problems like SfM or SLAM.
Most animals navigate 3D space perfectly without language.
www.theverge.com/news/669238/...
(from www.reddit.com/r/ChatGPT/co...)
techcrunch.com/2025/04/25/a...