🔬We brought the rigor from Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task spanning 30 languages and 5 subtasks.
Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.
📜 arxiv.org/abs/2502.19158
If you’re attending the conference, don’t miss the chance to explore our work and connect with our team.
Cohere Labs is excited to announce Connect - a 3-day virtual conference celebrating the power of collaboration in open science!
What if we optimized prompts instead of completions?
That’s the focus of our most recent work on prompt space optimization for multilingual synthetic data🗣️
Developed by @cohereforai.bsky.social, it spans 16 languages with both Culturally Sensitive & Agnostic samples - helping researchers uncover cultural & linguistic biases in multilingual evaluation.
Check out the leaderboard and notebook linked below.
Joelle Pineau, @cohere.com's new Chief AI Officer.
We look forward to working together on frontier research - advancing the science of building models that are robust, capable, and impactful in the real world.
Can multilingual ability be boosted at post-training?
Julia Kreutzer from @cohereforai.bsky.social explores RL, test-time scaling & data distillation to improve open-ended tasks across languages. 🌍✨
#MELTWorkshop2025
First, the Multilingual Data Quality Signals workshop, bringing together researchers across disciplines to discuss & present research on data quality signals in multilingual data.
Come connect with paper authors @juliakreutzer.bsky.social and @kocmitom.bsky.social.
Check out our latest work that builds on this insight. 👇
Introducing Fusion-of-N: a simple and powerful way to advance inference and distillation beyond Best-of-N.
And we’re hiring a Senior Research Scientist to co-create with us.
If you believe in research as a shared, global effort — this is your chance.
In our latest work we uncover the Verification Ceiling Problem: strict “all tests must pass” rules throw away useful data, while weak tests let errors through.
Thanks to @cohereforai.bsky.social for the copies and pizza.
If you’re passionate about studying fundamental AI problems and working in a globally collaborative, open-science environment, this is for you.
Apply here: jobs.ashbyhq.com/cohere/7ec9e...
It’s easily one of the funnest paper reads in the city!
Entry points matter.
We started the Scholars Program 3 years ago to give new researchers a real shot — excited to open applications for year 4✨
This is your chance to collaborate with some of the brightest minds in AI & chart new courses in ML research. Let's change the spaces where breakthroughs happen.
Apply by Aug 29.
New post in collaboration with AI Singapore explores why Elo falls short for AI leaderboards and how we can do better.
Our latest work introduces a new inference-time scaling recipe that is sample-efficient, multilingual, and suitable for multi-task requirements. 🍋
📏 Our comprehensive survey reveals that there is still a long way to go.