Ilker Kesen
@ilkerkesen.bsky.social
48 followers 120 following 17 posts
Postdoctoral Scientist at University of Copenhagen. I am currently more focused on developing pixel language models. #nlproc #multilinguality #multimodality
ilkerkesen.bsky.social
We used to develop Knet, but after the rise of HuggingFace we stopped using it. Before HF, it was also painful to convert each model to Julia code and arrays when the models were already available in PyTorch/TensorFlow. That said, I guess you can still find BERT/GPT implementations in Knet somewhere.
ilkerkesen.bsky.social
For more details about 📏Cetvel, please check our preprint.

📜Paper: arxiv.org/abs/2508.16431
💻Code: github.com/KUIS-AI/cetvel
📊Leaderboard: huggingface.co/spaces/KUIS-...
ilkerkesen.bsky.social
Furthermore, we assessed the informativeness of each task using Gini coefficients. We found that grammatical error correction, machine translation, and extractive QA (about Turkish/Islamic history) are the most informative tasks for evaluating LLMs in Turkish within 📏Cetvel.
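(For context, a rough sketch of how task informativeness could be scored with a Gini coefficient over per-model results; all names and numbers below are made up, and the paper's exact formulation may differ.)

import numpy as np

def gini(scores):
    # Gini coefficient over non-negative scores; higher values mean
    # the task spreads models further apart, i.e. it is more
    # informative for ranking them.
    x = np.sort(np.asarray(scores, dtype=float))
    n = len(x)
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return (2 * (ranks * x).sum()) / (n * x.sum()) - (n + 1) / n

# Hypothetical per-task accuracies for 33 models (values made up):
scores_gec = np.random.uniform(0.1, 0.9, size=33)     # wide spread
scores_mcqa = np.random.uniform(0.48, 0.55, size=33)  # near-chance cluster
print("GEC  ", round(gini(scores_gec), 3))   # high -> informative
print("MCQA ", round(gini(scores_mcqa), 3))  # low -> less informative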
ilkerkesen.bsky.social
and lastly, (iii) the Turkish-centric 8B-parameter model Cere-Llama-3-8B outperforms even the 70B-parameter Llama-3.3-70B on some Turkish-centric tasks such as grammatical error correction.
ilkerkesen.bsky.social
We tested 33 widely used open-weight LLMs covering different model families up to 70B parameters. We find that (i) LLMs tailored for Turkish underperform compared with general-purpose LLMs, (ii) Llama 3 models dominate other LLMs at the same parameter scale, [...]
ilkerkesen.bsky.social
Second, 📏Cetvel also offers NLP tasks linguistically and culturally grounded in Turkish, such as proverb understanding, circumflex-based word sense disambiguation, and extractive QA centered on Turkish and Islamic history.
ilkerkesen.bsky.social
First, in contrast to existing Turkish benchmarks, 📏Cetvel goes beyond multiple-choice QA: it spans 23 tasks across 7 categories, including grammatical error correction, machine translation, summarization, and extractive QA.
ilkerkesen.bsky.social
So, why another Turkish benchmark? The answer is that existing benchmarks often fall short in one of two ways: limited task diversity or a lack of content culturally relevant to Turkish. 📏Cetvel addresses both shortcomings.
ilkerkesen.bsky.social
📢New preprint: We introduce 📏Cetvel, a unified benchmark for evaluating language understanding, generation, and cultural capacity of LLMs in Turkish🇹🇷 #AI #LLM #NLProc

Joint work with Abrek Er, @gozdegulsahin.bsky.social, @aykuterdem.bsky.social from KUIS AI Center.
ilkerkesen.bsky.social
Excited to share that our paper "Multilingual Pretraining for Pixel Language Models" has been accepted to the #EMNLP2025 main conference! Please see the thread below and the paper itself for more details.
ilkerkesen.bsky.social
Data-efficiency analysis on the Indic NER benchmark also demonstrated that PIXEL-M4 excels at cross-lingual transfer learning in low-resource settings.
ilkerkesen.bsky.social
Investigations of the learned multilingual hidden representations reveal strong semantic alignment between pretraining languages in the later layers, particularly for the English-Ukrainian and English-Hindi pairs.
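(A minimal sketch of one standard way to quantify this kind of alignment: layer-wise cosine similarity over a parallel sentence pair. The placeholder arrays below stand in for real model states, and the paper's analysis may use a different metric.)

import numpy as np

# Hypothetical per-layer sentence representations for a parallel
# English-Ukrainian pair; in practice these would be hidden states
# from the model, mean-pooled over patch positions at each layer.
n_layers, dim = 12, 768
rng = np.random.default_rng(0)
h_en = rng.standard_normal((n_layers, dim))
h_uk = rng.standard_normal((n_layers, dim))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Layer-wise alignment: if translations map onto shared semantics,
# similarity should increase toward the later layers.
for layer in range(n_layers):
    print(f"layer {layer:2d}: cos = {cosine(h_en[layer], h_uk[layer]):+.3f}")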
ilkerkesen.bsky.social
Word-level probing analyses show that PIXEL-M4 captures linguistic features better, even for languages and writing systems not seen during pretraining.
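(For readers unfamiliar with probing, a minimal sketch of a standard linear-probing setup; the data here is a placeholder, not the paper's exact protocol.)

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data: real probes would use frozen word-level
# representations from the pretrained model plus gold labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 768))  # one vector per word token
y = rng.integers(0, 17, size=2000)    # e.g. 17 UD POS classes

# A linear probe: if a simple classifier can recover the feature
# from frozen representations, the model is said to encode it.
probe = LogisticRegression(max_iter=1000)
print("probe accuracy:", cross_val_score(probe, X, y, cv=5).mean())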
ilkerkesen.bsky.social
Downstream experiments on text classification, dependency parsing, and named entity recognition show that PIXEL-M4 outperforms its English-only-pretrained counterpart PIXEL-BIGRAMS on almost all non-Latin-script languages.
ilkerkesen.bsky.social
Announcing our recent work “Multilingual Pretraining for Pixel Language Models”! We introduce PIXEL-M4, a pixel language model pretrained on four visually & linguistically diverse scripts: English, Hindi, Ukrainian & Simplified Chinese. #NLProc
Reposted by Ilker Kesen
israsalazar.bsky.social
Today we are releasing Kaleidoscope 🎉

A comprehensive multimodal & multilingual benchmark for VLMs! It contains real questions from exams in different languages.

🌍 20,911 questions and 18 languages
📚 14 subjects (STEM → Humanities)
📸 55% multimodal questions
ilkerkesen.bsky.social
The LLM feedback I received read like paraphrased versions of my reviews, just trying to convince me to be *more explicit*, even though (I think) I was already sufficiently explicit. I did not change anything, and the authors were able to understand the points I raised.