Aarash Feizi
aarashfeizi.bsky.social
Aarash Feizi
@aarashfeizi.bsky.social
Visiting Researcher at @ServiceNowRSRCH | PhD student in @mcgillu and @Mila_Quebec | Prev. @RecursionPharma

https://aarashfeizi.github.io/
🧵 5/7

✅ PairBench correlates strongly with existing benchmarks, meaning it can serve as a low-cost alternative to expensive human-annotated benchmarks!

This makes it easier to compare and rank models efficiently—without excessive computational costs.
February 27, 2025 at 7:54 PM
🚨 Excited to introduce PairBench! 🚨

💡 TL;DR: VLM-judges can fail at data comparison!

✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation.

📄 Paper: arxiv.org/abs/2502.15210

🧵 Thread: 👇
February 27, 2025 at 7:50 PM