Top systems reach ~95% pairwise accuracy on open-ended and summarization tasks.
Smaller ones sit barely above coin-flip territory at ~55%.
Across open-ended generation and cross-lingual summarization, the biggest weakness isn’t coherence or accuracy but sounding like a native speaker. Many outputs still feel robotic or translated.
Models like Gemini 2.5 Pro and Claude 4 sometimes performed better in Korean, German, or Spanish than in English on reasoning tasks.
Even top models scored below 50% on linguistic reasoning tasks, showing that structured linguistic deduction is still an open challenge.
📝 Open-ended generation testing naturalness and usefulness
📘 Cross-lingual summarization
🔁 Machine translation
🧑‍⚖️ LLM-as-a-Judge evaluating outputs of other models
All backed by human evals and public releases of data + outputs!
github.com/wmt-conferen...
🔬We brought the rigor from Machine Translation evaluation to multilingual LLM benchmarking and organized the WMT25 Multilingual Instruction Shared Task spanning 30 languages and 5 subtasks.
Congrats to authors Ammar Khairi, Daniel D'souza, Ye Shen, @juliakreutzer.bsky.social, @sarahooker.bsky.social
📜 arxiv.org/abs/2506.20544
Congrats to authors Yijiang River Dong, @tiancheng.bsky.social, Yinhong Liu, Ahmet Üstün, Nigel Collier.
📜 arxiv.org/abs/2502.19158
Congrats to authors @yongzx.bsky.social, Beyza Ermis, @mziizm.bsky.social, Stephen Bach, @juliakreutzer.bsky.social.
📜 arxiv.org/abs/2505.24119
Congrats to authors Nikolas Gritsch, Qizhen Zhang, @acyrl.bsky.social, @sarahooker.bsky.social and Ahmet Üstün.
📜 arxiv.org/abs/2408.15901
If you’re attending the conference, don’t miss the chance to explore our work and connect with our team.
🔬 Showcase cutting-edge research
💡 Highlight meaningful collaborations
🤝 Inspire new partnerships
Cohere Labs is excited to announce Connect - a 3-day virtual conference celebrating the power of collaboration in open science!
💡We propose a simple, easy-to-implement solution to this problem:
🌐Transform translated prompts along three axes: Naturalization, Cultural Adaptation, and Difficulty.
What if we optimized prompts instead of completions?
That’s the focus of our most recent work on prompt space optimization for multilingual synthetic data🗣️
Check out the leaderboard and notebook linked below.
Welcome Joelle Pineau, @cohere.com's new Chief AI Officer.
We look forward to working together on frontier research - advancing the science of building models that are robust, capable, and impactful in the real world.
First, the Multilingual Data Quality Signals workshop, bringing together researchers across disciplines to discuss & present research on data quality signals in multilingual data.
Come connect with paper authors @juliakreutzer.bsky.social and @kocmitom.bsky.social.
🧩While BoN picks just one sample per problem, FusioN synthesises one output from all samples – treating them as collaborators whose strengths can be integrated, not competitors in a zero-sum game.
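The contrast can be sketched in a few lines of Python. This is a toy illustration, not the paper's implementation: the `score` and `fuse` callables here are hypothetical stand-ins for the LLM judge (Best-of-N) and the LLM fuser (Fusion-of-N).

```python
from typing import Callable, List

def best_of_n(samples: List[str], score: Callable[[str], float]) -> str:
    # Best-of-N: a judge scores each sample; only the top one survives.
    return max(samples, key=score)

def fusion_of_n(samples: List[str], fuse: Callable[[List[str]], str]) -> str:
    # Fusion-of-N: all samples go to a fuser that synthesises a single
    # output combining their strengths, rather than discarding N-1 of them.
    return fuse(samples)

# Toy stand-ins: score by length; fuse by joining de-duplicated samples.
samples = [
    "Nairobi is the capital of Kenya.",
    "Kenya's capital and largest city is Nairobi.",
]
picked = best_of_n(samples, score=len)
fused = fusion_of_n(samples, fuse=lambda xs: " ".join(dict.fromkeys(xs)))
```

Here `best_of_n` keeps exactly one candidate, while `fusion_of_n` sees every candidate, which is what lets complementary strengths be merged instead of competing.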
🧑‍🎓🧑🏽‍🎓👨🏾‍🎓Fusion-of-N distills multiple teachers into richer synthetic data than BoN, training students that achieve bigger downstream gains, even surpassing teachers on multilingual factual reasoning 🌎
Fusion-of-N boosts CommandA win-rates vs Gemini-2.5 Pro by +8.3% across 11 languages – a +4.0% improvement over BoN 🥇