Kenneth Enevoldsen
kennethenevoldsen.bsky.social
Postdoc at Aarhus University working on developing and evaluating representations of language and more

Maintain and develop: MTEB, ScandEval, tomsup, DaCy, etc.

#NLProc
I especially like our dashboard, which allows the comparison of models of interest across target languages.
March 11, 2025 at 10:10 AM
It notably includes high-, mid-, and low-resource languages, which allows examining generalization even in areas where the available training data is minuscule compared to English:
March 11, 2025 at 10:09 AM
Last week at #NoDaLiDa, we presented our work on 🇪🇺EuroEval, a large-scale benchmark for evaluating decoders and encoders.

The framework covers 9 languages (🇬🇧🇫🇷🇩🇪🇳🇱🇸🇪🇩🇰🇳🇴🇮🇸🇫🇴), with more to come, each including both a language understanding and a generation benchmark
March 11, 2025 at 10:07 AM
One of the features that I really like is the ability to compare specific models of interest across target languages. Here, we show an example of Dutch, English, and German, but you can try out any combination:

euroeval.com/extras/radia...
March 11, 2025 at 10:01 AM
This notably includes low-resource languages such as Faroese and Icelandic, which are great for checking generalization to languages in which the available data is minuscule
March 11, 2025 at 10:00 AM
This work resulted from a large-scale collaboration, and I would like to thank all of the authors and contributors on MTEB.
February 20, 2025 at 10:00 AM
This new release also comes with a whole new leaderboard, where it is possible to build benchmarks tailored to your use case using in-depth task selection.
February 20, 2025 at 9:59 AM
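The idea behind in-depth task selection can be sketched in plain Python. This is a hypothetical illustration with a made-up task registry, not the leaderboard's actual API; task names and metadata fields here are invented for the example:

```python
# Hypothetical task registry; the real MTEB leaderboard exposes far richer metadata.
TASKS = [
    {"name": "DanishSentiment", "task_type": "Classification", "languages": ["dan"]},
    {"name": "SwednRetrieval", "task_type": "Retrieval", "languages": ["swe"]},
    {"name": "NordicLangID", "task_type": "Classification", "languages": ["dan", "swe", "nob"]},
]

def select_tasks(task_types=None, languages=None):
    """Filter tasks by type and language, mirroring the idea of
    building a benchmark tailored to a specific use case."""
    selected = []
    for task in TASKS:
        if task_types and task["task_type"] not in task_types:
            continue  # skip tasks of other types
        if languages and not set(languages) & set(task["languages"]):
            continue  # skip tasks with no overlap in the requested languages
        selected.append(task)
    return selected

# A benchmark tailored to Danish classification:
custom = select_tasks(task_types=["Classification"], languages=["dan"])
print([t["name"] for t in custom])  # ['DanishSentiment', 'NordicLangID']
```

The same filter-by-metadata pattern underlies most benchmark builders: tasks carry structured metadata, and a custom benchmark is just a filtered subset of the registry.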
Such an extensive collection of tasks comes with a considerable computational cost. Thus, we have added multiple optimizations to ensure the benchmark is accessible and quick to run. We see notable speedups for the English benchmark while maintaining the models' relative ranking.
February 20, 2025 at 9:58 AM
Examining this claim, we see that the Mistral-derived models indeed perform better in languages on which the models are believed to be trained:
February 20, 2025 at 9:58 AM
We use this collection of tasks to propose multiple benchmarks, covering multilingual settings, code, European and Indic languages, and many more.

We find that smaller multilingual models (~500M parameters) outperform notably larger 7B models, likely due to the larger models' limited multilingual pre-training.
February 20, 2025 at 9:57 AM
I am delighted to announce that we have released 🎊 MMTEB 🎊, a large-scale collaboration working on efficient multilingual evaluation of embedding models.

This work implements >500 evaluation tasks across >1000 languages and covers a wide range of use cases and domains🩺👩‍💻⚖️
February 20, 2025 at 9:56 AM
New favourite razor
January 11, 2025 at 11:43 AM