Workshop on Multilingual Data Quality Signals
wmdqs.bsky.social
The first iteration of our workshop will be co-located with @colmweb.org 2025 in Montreal.
https://wmdqs.org/
If you were able to join us, let us know about your experience: docs.google.com/forms/d/e/1F...
October 10, 2025 at 8:52 PM
Thank you everyone for coming to WMDQS (pronounced "whim ducks")!
October 10, 2025 at 8:51 PM
Then we had our second poster session for our paper submissions. The full papers are available on our website!
October 10, 2025 at 8:49 PM
After lunch, @sebnagel.bsky.social gave a keynote about the data collected by @commoncrawl.bsky.social!
October 10, 2025 at 8:46 PM
David Adelani gave a keynote about text quality for low-resource languages.
October 10, 2025 at 4:18 PM
We had our first poster session, hearing from some of our shared task participants!
October 10, 2025 at 4:18 PM
We presented the results of our shared task! We received annotations for over 30,000 documents representing over 60 languages. We also showed the results of our LangID dataset and system shared task tracks. Thank you to everyone who participated!
October 10, 2025 at 4:18 PM
We started with a keynote from @juliakreutzer.bsky.social about multilingual fine-tuning data!
October 10, 2025 at 4:18 PM
See our updated website for more details: wmdqs.org
1st Workshop on Multilingual Data Quality Signals (WMDQS)
A workshop addressing multilingual data quality. Held on 10 October 2025 in Montréal.
October 9, 2025 at 8:17 PM
We will also have a session on our shared task, which was about improving language identification models. Participants of the shared task contributed annotations to create a new LangID dataset and also submitted new LangID systems.
October 9, 2025 at 8:17 PM
Our third and final keynote will be from @sebnagel.bsky.social about the data in Common Crawl.
October 9, 2025 at 8:17 PM
Our second keynote will be by David Adelani about text quality for low-resource languages.
October 9, 2025 at 8:17 PM
Our first keynote will be from @juliakreutzer.bsky.social about data for multilingual fine-tuning.
October 9, 2025 at 8:17 PM
Contribute annotations here: dynabench.org/tasks/text-l...
July 21, 2025 at 6:07 PM