Sam Boeve
@boevesam.bsky.social
Doctoral Researcher | Cognitive Science | Computational Psycholinguistics | FWO fellow | Bogaertslab | Ghent University
Want to explore word predictability yourself, on a sample of each corpus used in this work? Check out this app:

wordpredictabilityvisualized.vercel.app
September 2, 2025 at 7:27 AM
Modelling reading times in Dutch?

gpt2-small-dutch (huggingface.co/GroNLP/gpt2-...) or gpt2-medium-dutch-embeddings (huggingface.co/GroNLP/gpt2-...) are great options.
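For anyone who wants to try this: below is a minimal sketch (a generic recipe, not the pipeline used in this work; the example sentence is made up) of computing per-token surprisal with gpt2-small-dutch via the transformers library.

```python
# Minimal sketch: per-token surprisal (in bits) from GroNLP/gpt2-small-dutch.
# Generic recipe, not the authors' pipeline.
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "GroNLP/gpt2-small-dutch"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

ids = tokenizer("De kat zit op de mat.", return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits  # (1, seq_len, vocab_size)

# Position i predicts token i+1: surprisal(t) = -log2 P(token_t | tokens_<t).
# The first token has no left context, so it is skipped.
log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
targets = ids[0, 1:]
surprisal = -log_probs[torch.arange(len(targets)), targets] / math.log(2)

for tok, s in zip(tokenizer.convert_ids_to_tokens(targets.tolist()), surprisal):
    print(f"{tok:>12}  {s.item():6.2f} bits")
```

Word-level surprisal is then usually obtained by summing the surprisals of a word's subword tokens.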
September 2, 2025 at 7:27 AM
3. Predictability effects are also logarithmic in Dutch, corroborating findings from English (= a linear effect of surprisal):

For very unpredictable words, a decrease in predictability has a much larger slowing-down effect on reading times than the same decrease for highly predictable words.
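
To make that concrete (illustrative numbers, not from the paper): with surprisal defined as the negative log probability of a word in context, equal absolute drops in predictability cost far more at the unpredictable end:

```latex
s(w) = -\log_2 p(w \mid \text{context}), \qquad
\Delta s_{0.99 \to 0.98} = \log_2\frac{0.99}{0.98} \approx 0.015~\text{bits}, \qquad
\Delta s_{0.02 \to 0.01} = \log_2\frac{0.02}{0.01} = 1~\text{bit}.
```

The same 0.01 drop in probability costs roughly 70 times more surprisal, and hence reading time under a linear surprisal effect, at the low-predictability end.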
September 2, 2025 at 7:27 AM
2. Language-specific models are generally better than multilingual ones (multilingual models are shown in blue in the figure below).
September 2, 2025 at 7:27 AM
Key findings 📝

1. Smaller Dutch models often predict reading times better (= inverse scaling trend), in line with evidence from English models.

But with more context (in a book-reading corpus), larger models catch up.
September 2, 2025 at 7:27 AM
Large language models are powerful tools for psycholinguistic research.

But most evidence so far is limited to English.

How well do Dutch open-source language models fit reading times using their word predictability estimates?
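
One common way to quantify that fit (sketched here on toy data; the predictor names and baseline specification are assumptions, not the exact models from this work) is the log-likelihood gained when surprisal is added to a baseline reading-time regression:

```python
# Toy sketch of the delta log-likelihood metric: how much does surprisal
# improve a reading-time regression over a length + frequency baseline?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    "length": rng.integers(2, 12, n),      # word length in characters
    "log_freq": rng.normal(3.0, 1.0, n),   # log word frequency
    "surprisal": rng.gamma(2.0, 2.0, n),   # bits, from some language model
})
# Synthetic reading times with a small surprisal effect baked in.
data["rt"] = (200 + 5 * data["length"] - 10 * data["log_freq"]
              + 8 * data["surprisal"] + rng.normal(0, 30, n))

baseline = smf.ols("rt ~ length + log_freq", data=data).fit()
with_surp = smf.ols("rt ~ length + log_freq + surprisal", data=data).fit()

# Higher delta log-likelihood = surprisal explains more reading-time variance.
print(f"Delta log-likelihood: {with_surp.llf - baseline.llf:.1f}")
```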
September 2, 2025 at 7:27 AM
Overall, our results provide a psychometric leaderboard of Dutch large language models, ideal for researchers interested in effects of predictability in Dutch.

Check out our full dataset and code here:
osf.io/wr4qf/
A Systematic Evaluation of Dutch Large Language Models’ Surprisal Estimates in Sentence, Paragraph, and Book Reading
December 19, 2024 at 4:12 PM
Finally, we found a linear link between surprisal and reading times, except in the GECO corpus, where a non-linear link fitted the data best.

A challenge to the notion of a universal linear effect of surprisal.
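
One simple way to probe that (a toy sketch; the analyses in the paper may well use different models, e.g. GAMs or mixed-effects regressions) is to compare a linear fit of reading times on surprisal against a spline fit and see which is preferred:

```python
# Toy sketch: is the surprisal-reading time link linear? Compare AIC of a
# linear fit vs. a B-spline fit (patsy's bs() inside the formula).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
surprisal = rng.gamma(2.0, 2.0, 500)
rt = 250 + 9 * surprisal + rng.normal(0, 30, 500)  # linear ground truth here
data = pd.DataFrame({"surprisal": surprisal, "rt": rt})

linear = smf.ols("rt ~ surprisal", data=data).fit()
spline = smf.ols("rt ~ bs(surprisal, df=5)", data=data).fit()

# A clearly lower spline AIC would favour a non-linear link, as in GECO.
print(f"AIC linear: {linear.aic:.1f} | AIC spline: {spline.aic:.1f}")
```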
December 19, 2024 at 4:12 PM
Second, smaller Dutch models showed a better fit to reading times than the largest models, replicating the inverse scaling trend seen in English.
However, this effect varied depending on the corpus used.
December 19, 2024 at 4:12 PM
First, across three eye-tracking corpora, we found that in each case a Dutch LLM's surprisal estimates outperformed those of the multilingual model (mGPT) and the N-gram model in predicting reading times.
December 19, 2024 at 4:12 PM
3.

Does surprisal still show a linear link with reading times when estimated with a Dutch-specific language model as opposed to a multilingual model?
December 19, 2024 at 4:12 PM
2.

Do these Dutch-specific LLMs show a similar inverse scaling trend as English models?

That is, do the smaller transformer models' surprisal estimates account better for reading times than those of the very large models?
December 19, 2024 at 4:12 PM
1.

What is the best computational method for estimating word predictability in Dutch?

We compare 14 Dutch large language models (LLMs), a multilingual model (mGPT), and an N-gram model in their ability to explain reading times.
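
For the N-gram side of that comparison, here's a hedged sketch using NLTK's Kneser-Ney trigram model on toy text (the actual N-gram model and its training corpus are not specified here):

```python
# Toy sketch of N-gram surprisal with NLTK's Kneser-Ney trigram model.
# Two made-up sentences as training data; a real baseline would be trained
# on a large Dutch corpus.
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

sents = [["de", "kat", "zit", "op", "de", "mat"],
         ["de", "hond", "ligt", "op", "de", "mat"]]
train, vocab = padded_everygram_pipeline(3, sents)

lm = KneserNeyInterpolated(3)
lm.fit(train, vocab)

# logscore() returns log2 P(word | context), so surprisal is its negation.
surprisal = -lm.logscore("mat", ["op", "de"])
print(f"surprisal('mat' | 'op de') = {surprisal:.2f} bits")
```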
December 19, 2024 at 4:12 PM
The effect of word predictability on reading times is well established for English but not so much for Dutch.

We addressed this and asked three questions:
December 19, 2024 at 4:12 PM