https://aarashfeizi.github.io/
✅ PairBench correlates strongly with existing benchmarks, so it can serve as a low-cost alternative to expensive human-annotated benchmarks!
This makes it easier to compare and rank models efficiently, without heavy computational costs.
💡 TL;DR: VLM judges can fail at data comparison!
✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability, making auto-evaluation more reliable.
📄 Paper: arxiv.org/abs/2502.15210
🧵 Thread: 👇