We trained three models (1.5B, 8B, and 24B) from scratch on 2-4T tokens of custom data
(TLDR: we cheat and get good scores)
@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social
- Using Infinigram, we uncovered substantial test-set leakage in commonly used datasets (e.g., leaked MMLU questions rising from ~1% to 24% from OLMo-1 to OLMo-2).
- Neural filtering can unintentionally favour leaked samples, further amplifying the effect.
- Our 24B base model stands out: it outperforms open counterparts in generic generation tasks in both French and English.
- However, benchmark scores initially lagged, prompting us to investigate why some datasets seem to boost benchmarks without improving real-world generation.
Rachel Bawden, Benoît Sagot
(WMT test suite shared task)
Malik Marmonier, Benoît Sagot, Rachel Bawden
📅 Sunday, Nov 9 | 11:00–12:00 | WMT Poster (in person)
Ziqian Peng, Rachel Bawden, François Yvon
📅 Sunday, Nov 9 | 14:00–17:00 | WMT Poster (in person)
Aina Garí Soler, Matthieu Labeau, Chloé Clavel
📅 Sunday, Nov 9 | 14:00–15:30 | *SEM Poster (in person)
Findings of the WMT25 General Machine Translation Shared Task: Time to Stop Evaluating on Easy Test Sets
Tom Kocmi et al. (incl. Rachel Bawden)
📅 Saturday, Nov 8 | 9:10–9:40 | WMT Oral (in person)
Jesujoba Oluwadara Alabi et al. (incl. Rachel Bawden)
📅 Friday, Nov 7 | 14:00–15:30 | Main Conference Poster (in person)
Anh Ngo, Nicolas Rollet, Catherine Pelachaud, Chloé Clavel
📅 Friday, Nov 7 | 14:00–15:30 | Main Conference Oral (Discourse, Pragmatics, and Reasoning 2)
Armel Zebaze, Benoît Sagot, Rachel Bawden
📅 Friday, Nov 7 | 12:30–13:30 | Findings Poster (remote)
Rasul Dent, Pedro Ortiz Suarez, Thibault Clérice, Benoît Sagot
📅 Friday, Nov 7 | 12:30–13:30 | Findings Poster
Aina Garí Soler, Matthieu Labeau, Chloé Clavel
📅 Friday, Nov 7 | 12:30–13:30 | Findings Poster (in person)