Lightnews — Scholar-powered news

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

210 followers 320 following 12 posts

PhD student in visual representation learning at Valeo.ai and Sorbonne Université (MLIA)

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

In our paper DIP, we use DiffCut to generate segmentation pseudo-labels - the masks are very high-fidelity, which greatly boosts supervision quality 👏

November 5, 2025 at 6:42 PM

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

Work done in collaboration with
@spyrosgidaris.bsky.social @vobeckya.bsky.social @abursuc.bsky.social and Nicolas Thome

Paper: arxiv.org/abs/2506.18463
Github: github.com/sirkosophia...

GitHub - sirkosophia/DIP: Official implementation of DIP: Unsupervised Dense In-Context Post-training of Visual Representations

github.com

June 25, 2025 at 7:21 PM

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

6/n Benefits 💪
- Post-training takes under 9 hours on a single A100 GPU.
- Improves results across 6 segmentation benchmarks.
- Boosts performance for in-context depth prediction.
- Plug-and-play for different ViTs: DINOv2, CLIP, MAE.
- Robust in low-shot and domain-shift settings.

June 25, 2025 at 7:21 PM

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

5/n Why is DIP unsupervised?

DIP doesn't require manually annotated segmentation masks for post-training. Instead, it leverages Stable Diffusion (via DiffCut) alongside DINOv2R features to automatically construct in-context pseudo-tasks.

June 25, 2025 at 7:21 PM

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

4/n Meet Dense In-context Post-training (DIP)! 🔄

- Meta-learning inspired: Adopts episodic training principles.
- Task-aligned: Explicitly mimics downstream dense in-context tasks during post-training.
- Purpose-built: Optimizes the model for dense in-context performance.
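
To make the episodic idea concrete, here is a minimal sketch of what a single training episode's loss could look like. The posts do not spell out DIP's actual objective, so everything here is an assumption for illustration: the function names, the softmax attention over support patches, the temperature value, and the cross-entropy against pseudo-labels are all hypothetical. See the paper for the real formulation.

```python
import math

def soft_label_prediction(query_feat, support_feats, support_labels,
                          num_classes, temperature=0.1):
    """Predict a class distribution for one query patch by attending
    over support patches (hypothetical formulation, not from the paper)."""
    # Dot-product similarity between the query patch and each support patch.
    scores = [sum(a * b for a, b in zip(query_feat, s)) / temperature
              for s in support_feats]
    # Numerically stable softmax over support patches.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    # Accumulate attention mass onto each support patch's pseudo-label.
    probs = [0.0] * num_classes
    for w, label in zip(weights, support_labels):
        probs[label] += w / total
    return probs

def episode_loss(query_feats, query_pseudo_labels,
                 support_feats, support_pseudo_labels, num_classes):
    """Mean cross-entropy between predictions and the query pseudo-labels."""
    losses = []
    for q, y in zip(query_feats, query_pseudo_labels):
        probs = soft_label_prediction(q, support_feats,
                                      support_pseudo_labels, num_classes)
        losses.append(-math.log(max(probs[y], 1e-12)))
    return sum(losses) / len(losses)

# Toy episode (hypothetical data): two support patches, one query patch.
support_feats = [[1.0, 0.0], [0.0, 1.0]]
support_pseudo_labels = [0, 1]
query_feats = [[1.0, 0.0]]

matching = episode_loss(query_feats, [0], support_feats, support_pseudo_labels, 2)
mismatched = episode_loss(query_feats, [1], support_feats, support_pseudo_labels, 2)
print(matching < mismatched)
```

In this toy setup the loss is small when the query patch resembles support patches that share its pseudo-label, which is the behavior a task-aligned objective of this kind would reward.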

June 25, 2025 at 7:21 PM

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

3/n Most unsupervised (post-)training methods for dense in-context scene understanding rely on self-distillation frameworks with (somewhat) complicated objectives and network components. Hard to interpret, tricky to tune.

Is there a simpler alternative? 👀

June 25, 2025 at 7:21 PM

Sophia Sirko-Galouchenko 🇺🇦

@ssirko.bsky.social

2/n What is dense in-context scene understanding?

Formulate dense prediction tasks as nearest-neighbor retrieval problems using patch feature similarities between the query image and the labeled prompt images (introduced in @ibalazevic.bsky.social et al.’s HummingBird; figure below from their work).
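
The retrieval formulation above can be sketched in a few lines. This is a toy illustration, not the HummingBird or DIP implementation: the function names, the cosine similarity, the majority vote over `k` neighbors, and the 2-D "patch features" are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_labels(query_feats, prompt_feats, prompt_labels, k=1):
    """Label each query patch by majority vote over its k most similar
    prompt patches (hypothetical helper for illustration)."""
    predictions = []
    for q in query_feats:
        sims = [(cosine(q, p), label)
                for p, label in zip(prompt_feats, prompt_labels)]
        top = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
        votes = {}
        for _, label in top:
            votes[label] = votes.get(label, 0) + 1
        predictions.append(max(votes, key=votes.get))
    return predictions

# Toy data (hypothetical): patch features from labeled prompt images.
prompt_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prompt_labels = ["road", "road", "sky", "sky"]
# Patch features from an unlabeled query image.
query_feats = [[0.8, 0.2], [0.2, 0.8]]

print(retrieve_labels(query_feats, prompt_feats, prompt_labels, k=1))
```

No parameters are trained at prediction time in this setup: the quality of the output depends entirely on how well the patch features separate classes, which is why post-training the encoder for this task can help.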

June 25, 2025 at 7:21 PM
