Sara Beery - Assistant Professor at MIT CSAIL
@sarameghanbeery.bsky.social
Matej Kristan - Full professor at the University of Ljubljana
www.vicos.si/people/matej...
Mark Boss - Co-Head of 3D & Image at Stability AI
@markboss.bsky.social
– Unlike SigLIP models, PE shows a large gap between its image-to-image and text-to-image performance.
– Gains +12% performance from adaptation.
– Outperforms SigLIP2 (previous best) by +2%.
– DINOv3 lags behind, gaining less than +2% from adaptation.
• DINOv3 → sets a new state-of-the-art in image-to-image retrieval without linear adaptation
– The large variant outperforms all other models by a significant margin.
– The base outperforms the large variants of other model series.
- IL object classification, detection, segmentation, and pose estimation
- particular object and event retrieval
- personalized image/video generation
- cross/multi-modal recognition at IL
- image matching, place recognition, video tracking
- other ILR+G applications
- ILR+G datasets