Jindra Helcl
@jindrahelcl.bsky.social
Does your model know the difference between koprovka and kulajda? 🍽️ Does it recognize famous Ukrainians from their statues? 🗽 And what if you ask in Slovak? 😱 Check out our new regional QA dataset and find out!! 🤯

Now available on Hugging Face: huggingface.co/datasets/ufa...
🧵 We're releasing CUS-QA - a new benchmark for testing LLMs on regional knowledge!
Find out what your model knows about Czechia 🇨🇿, Slovakia 🇸🇰, and Ukraine 🇺🇦!
👉 Textual and visual questions, answers, and human judgment on model outputs!
huggingface.co/datasets/ufa...
www.arxiv.org/abs/2507.22752
ufal/cus-qa · Datasets at Hugging Face
We need to have poster fights at the end of every conference.
Reposted by Jindra Helcl
📢 I am hiring a postdoc to work on post-training methods for low-resource languages. Apply by August 15: employment.ku.dk/faculty/?sho....
Let's talk at #ACL2025NLP in Vienna if you want to know more about the position and life in Denmark.
Postdoc in Natural Language Processing
From what I gathered on the web, NumPy seems way faster, though...
Reposted by Jindra Helcl
📢 First release: 38 monolingual reference LLMs (2.15B params) via #HPLT + #OpenEuroLLM

⚙️Trained on 100B tokens from HPLT v2 dataset
🌍 Cover EU langs + others
⚙️ Based on LLaMA, trained on #LUMI
📈 Useful for evaluation

Downloads + more info at openeurollm.eu/blog/hplt-oe...
Reposted by Jindra Helcl
this "class 9" is such a cool idea for an LLM course!

(from ufal.mff.cuni.cz/courses/npfl... via @zdenekkasner.bsky.social )
Petition against renaming the Českomoravská metro station: sign and share! (Czech ID needed)
gov.cz/e-petice/118...
Am I the only one who thinks these should always be aligned with the direction of travel? (Especially if you already have more than one version of these and the trains never turn.)
Reposted by Jindra Helcl
I'm part of this! There's also a paper: arxiv.org/abs/2503.10267
**New parallel data set**: We've just released HPLT v2.0, a parallel data set of 50 languages paired with English, 380M sentence pairs in total, extracted from the Internet Archive and Common Crawl: hplt-project.org/datasets/v2.0
HPLT - High Performance Language Technologies
A space that combines petabytes of natural language data with large-scale model training
Come to MT Marathon! It's always great fun, and this year's marathon in Helsinki is not going to be an exception! See everyone there! 💥❤️
Come to Helsinki for the 18th MT Marathon! Sponsored by EAMT @ufal-cuni.bsky.social
Reposted by Jindra Helcl
Kick-off successfully completed. Go OpenEuroLLM team!
openeurollm.eu
The goal of OpenEuroLLM is to build an open, Multilingual, European, Generative, Foundational LLM
In Prague they told us.. :)