Philipp Koehn
@phikoehn.bsky.social
Computer Science at Johns Hopkins University
Center for Language and Speech Processing
Reposted by Philipp Koehn
📊 The preliminary ranking of the WMT 2025 General Machine Translation benchmark is here!

But don't draw conclusions just yet: automatic metrics are biased in favor of techniques such as using a metric as a reward model or MBR decoding. The official human ranking will be part of the General MT findings at WMT.

arxiv.org/abs/2508.14909
Preliminary Ranking of WMT25 General Machine Translation Systems
We present the preliminary ranking of the WMT25 General Machine Translation Shared Task, in which MT systems have been evaluated using automatic metrics. As this ranking is based on automatic evaluati...
August 23, 2025 at 9:28 AM