Lightnews — Scholar-powered news

Imanol Miranda

@imirandam.bsky.social

PhD student at HiTZ Zentroa (@hitz-zentroa.bsky.social) / IXA Group and the University of Basque Country (@upvehu.bsky.social).

Imanol Miranda

@imirandam.bsky.social

Why are image crops crucial? 🤔 We found that simply adding text segments isn't enough. The biggest performance gains come when text segments are paired with image crops, proving the power of serial image computing.

June 18, 2025 at 11:28 AM

Imanol Miranda

@imirandam.bsky.social

We've evaluated it on three diverse datasets: BiVLC, Winoground (171 instances), and BiSCoR-Ctrl. Our inference-time approach (ITA) yields significant improvements on three existing models.

June 18, 2025 at 11:28 AM

Imanol Miranda

@imirandam.bsky.social

Our approach is straightforward yet effective:
1. Divide the image into smaller crops.
2. Extract text segments capturing objects, attributes and relations.
3. Use the VLM to find image crops that best fit the text segments.
4. Aggregate matching similarities for the final score.
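
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the helper names (`grid_crops`, `cosine_matrix`, `structured_score`), the regular grid used for cropping, and the aggregation rule (best crop per text segment, then the mean over segments) are all assumptions, and the embeddings are assumed to come from whatever dual-encoder VLM is being evaluated.

```python
import numpy as np

def grid_crops(width, height, rows=2, cols=2):
    """Step 1 (assumed strategy): split an image into a rows x cols grid.

    Returns boxes as (left, top, right, bottom) pixel tuples.
    """
    xs = np.linspace(0, width, cols + 1).astype(int)
    ys = np.linspace(0, height, rows + 1).astype(int)
    return [
        (int(xs[c]), int(ys[r]), int(xs[c + 1]), int(ys[r + 1]))
        for r in range(rows)
        for c in range(cols)
    ]

def cosine_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def structured_score(segment_emb, crop_emb):
    """Steps 3 and 4 (assumed aggregation): match each text segment to its
    best-fitting crop, then average those best similarities.

    segment_emb: (num_segments, dim) text embeddings from the VLM (step 2 output).
    crop_emb:    (num_crops, dim) image embeddings of the crops from step 1.
    """
    sims = cosine_matrix(segment_emb, crop_emb)   # (num_segments, num_crops)
    best_per_segment = sims.max(axis=1)           # best crop for each segment
    return float(best_per_segment.mean())         # final image-caption score
```

In use, one would compute `structured_score` for each candidate caption of an image and pick the caption with the highest score; the text-segment extraction in step 2 is left out here because the post does not say how it is done.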

June 18, 2025 at 11:28 AM

Imanol Miranda

@imirandam.bsky.social

#newHitzPaper
Can a simple inference-time approach unlock better Vision-Language Compositionality? 🤯
Our latest paper shows how adding structure at inference significantly boosts performance in popular dual-encoder VLMs on different datasets.

Read more: arxiv.org/abs/2506.09691

June 18, 2025 at 11:28 AM
