Kenneth Enevoldsen
kennethenevoldsen.bsky.social
Postdoc at Aarhus University working on developing and evaluating representations of language and more

Maintain and develop: MTEB, ScandEval, tomsup, DaCy, etc.

#NLProc
I especially like our dashboard, which allows the comparison of models of interest across target languages.
March 11, 2025 at 10:10 AM
It notably includes high-, mid-, and low-resource languages, which allows examining generalization even in areas where the available training data is minuscule compared to English:
March 11, 2025 at 10:09 AM
Last week at #NoDaLiDa, we presented our work on 🇪🇺EuroEval, a large-scale benchmark for evaluating decoders and encoders.

The framework covers 9 languages (🇬🇧🇫🇷🇩🇪🇳🇱🇸🇪🇩🇰🇳🇴🇮🇸🇫🇴), with more to come, each including both a language understanding and a generation benchmark
March 11, 2025 at 10:07 AM
One of the features that I really like is the ability to compare specific models of interest across target languages. Here, we show an example of Dutch, English, and German, but you can try out any combination:

euroeval.com/extras/radia...
March 11, 2025 at 10:01 AM
This notably includes low-resource languages such as Faroese and Icelandic, which are great for checking generalization to languages in which the available data is minuscule
March 11, 2025 at 10:00 AM
This work resulted from a large-scale collaboration, and I would like to thank all of the authors and contributors on MTEB.
February 20, 2025 at 10:00 AM
This new release also comes with a whole new leaderboard, where it is possible to build benchmarks tailored to your use case using in-depth task selection.
February 20, 2025 at 9:59 AM
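The idea behind in-depth task selection can be sketched in plain Python. This is a hypothetical illustration with a made-up task registry, not the leaderboard's actual API; task names and metadata fields here are invented for the example:

```python
# Hypothetical task registry; the real MTEB leaderboard exposes far richer metadata.
TASKS = [
    {"name": "DanishSentiment", "task_type": "Classification", "languages": ["dan"]},
    {"name": "SwednRetrieval", "task_type": "Retrieval", "languages": ["swe"]},
    {"name": "NordicLangID", "task_type": "Classification", "languages": ["dan", "swe", "nob"]},
]

def select_tasks(task_types=None, languages=None):
    """Filter tasks by type and language, mirroring the idea of
    building a benchmark tailored to a specific use case."""
    selected = []
    for task in TASKS:
        if task_types and task["task_type"] not in task_types:
            continue  # skip tasks of other types
        if languages and not set(languages) & set(task["languages"]):
            continue  # skip tasks with no overlap in the requested languages
        selected.append(task)
    return selected

# A benchmark tailored to Danish classification:
custom = select_tasks(task_types=["Classification"], languages=["dan"])
print([t["name"] for t in custom])  # ['DanishSentiment', 'NordicLangID']
```

The same filter-by-metadata pattern underlies most benchmark builders: tasks carry structured metadata, and a custom benchmark is just a filtered subset of the registry.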
Such an extensive collection of tasks comes with a considerable computational cost. Thus, we have added multiple optimizations to ensure the benchmark is accessible and quick to run. We see notable speedups for the English benchmark while maintaining the models' relative ranking.
February 20, 2025 at 9:58 AM
Examining this claim, we see that the Mistral-derived models indeed perform better in languages on which the models are believed to be trained:
February 20, 2025 at 9:58 AM
We use this collection of tasks to propose multiple benchmarks, covering multilingual settings, code, European and Indic languages, and many more.

We find that smaller multilingual models (~500M parameters) outperform notably larger 7B models, likely due to the larger models' limited multilingual pre-training.
February 20, 2025 at 9:57 AM
I am delighted to announce that we have released 🎊 MMTEB 🎊, a large-scale collaboration working on efficient multilingual evaluation of embedding models.

This work implements >500 evaluation tasks across >1000 languages and covers a wide range of use cases and domains🩺👩‍💻⚖️
February 20, 2025 at 9:56 AM
New favourite razor
January 11, 2025 at 11:43 AM