Kenneth Enevoldsen
kennethenevoldsen.bsky.social
Postdoc at Aarhus University working on developing and evaluating representations of language and more

Maintain and develop: MTEB, ScandEval, tomsup, DaCy, etc.

#NLPProc
Love to see it!
March 25, 2025 at 10:27 AM
New postdoc position at AarhusNLP, come join us!

The research includes efficient post-training, alignment, evaluation, and preference optimization, but we are very open to reinterpretation. So if you think you might be a partial fit, do apply!

international.au.dk/about/profil...
Postdoctoral Positions in NLP Post-Training for Cultural Alignment and Preference Optimization - Vacancy at Aarhus University
Vacancy at School of Culture and Society - Center for Humanities Computing Aarhus, Aarhus University
international.au.dk
March 12, 2025 at 12:46 PM
Last week at #NoDaLiDa, we presented our work on 🇪🇺EuroEval, a large-scale benchmark for evaluating decoders and encoders.

The framework consists of 9 (🇬🇧🇫🇷🇩🇪🇳🇱🇸🇪🇩🇰🇳🇴🇮🇸🇫🇴) languages, with more to come, where each includes a language understanding and generation benchmark
March 11, 2025 at 10:07 AM
I am delighted to announce that we have released 🎊 MMTEB 🎊, a large-scale collaboration working on efficient multilingual evaluation of embedding models.

This work implements >500 evaluation tasks across >1000 languages and covers a wide range of use cases and domains🩺👩‍💻⚖️
February 20, 2025 at 9:56 AM
Multilingual MTEB is soon to be released and with it, a shining new benchmark with a zero-shot filter! However, zero-shot is quite hard to define in a time of derivative models and synthetic data.

If you have an opinion on how we should define zero-shot, let us know:
github.com/embeddings-b...
Defining zero-shot for MTEB · Issue #1760 · embeddings-benchmark/mteb
The next version of the MTEB leaderboard will soon be released and with it a new zero-shot filter. However, we are currently planning to use the following definition of zero-shot. This issue is to ...
github.com
January 11, 2025 at 12:22 PM
New favourite razor
January 11, 2025 at 11:43 AM
Reposted by Kenneth Enevoldsen
📣 Vacancy for Assistant Professor of Cognitive Science at Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, Denmark. (Deadline January 6)

international.au.dk/about/profil...
Assistant Professor of Cognitive Science at the School of Communication and Culture - Vacancy at Aarhus University
Vacancy at School of Communication and Culture - Linguistics, Cognitive Science and Semiotics, Dept. of, Aarhus University
international.au.dk
December 24, 2024 at 9:37 PM
Does anyone know of good methods for tracking dataset contamination that don't rely on generation? Anything would be greatly appreciated.

Asking as we would like to track and detect dataset contamination in MTEB:
github.com/embeddings-b...
track and detect dataset contamination · Issue #1636 · embeddings-benchmark/mteb
In multiple threads tracking dataset contamination have been mentioned as multiple models do not share their training dataset. This issue is intended to link these discussions together as well as p...
github.com
December 27, 2024 at 6:59 PM
Reposted by Kenneth Enevoldsen
This week's episode of Verbos podcast is live, with @kennethenevoldsen.bsky.social and Thomas Kobber Panum talking about the development of AI, taking Ilya Sutskever's NeurIPS talk as the starting point 🔥 #dkai #dktech

Listen here 👇

YouTube: youtu.be/IpEla8mZHnU?...

Spotify: open.spotify.com/episode/41WT...
December 20, 2024 at 7:45 AM
December 16, 2024 at 8:05 PM
Any good reason for using (machine) translated datasets/benchmarks for evaluating language models?
December 6, 2024 at 10:43 PM
On my way to NeurIPS to present our work on the Scandinavian Embedding Benchmark (SEB) for evaluating embeddings for retrieval, classification, etc. in the mainland Scandinavian languages.

Is there anything I should see while I am there?
December 5, 2024 at 11:43 AM