Bastian Bunzeck
@bbunzeck.bsky.social
Computational linguist trying to understand how humans and computers learn and use language 👶🧠🗣️🖥️💬

The work is mysterious and important. See https://bbunzeck.github.io

PhDing at @clausebielefeld.bsky.social
Reposted by Bastian Bunzeck
We began day 2 of our Large Language Models (LLM) for linguistics research workshop @UniKoeln with a fascinating keynote by Charlotte Pouw on "Interpreting models for speech generation and understanding using methods from #psycholinguistics". Charlotte shared […]

[Original post on fediscience.org]
November 25, 2025 at 8:54 AM
Many thanks to this awesome team of collaborators: @frap98.bsky.social, @manarali.bsky.social, Omar Momen, @arianna-bis.bsky.social, @hbuschme.bsky.social, and Sina Zarrieß (@clausebielefeld.bsky.social). 😇🚀
October 28, 2025 at 12:58 PM
Francesca will also present our poster in the BabyLM poster session at EMNLP in Suzhou, China, so do not forget to stop by!
October 28, 2025 at 12:56 PM
If you are hungry for more info now, please check out the preprint here: arxiv.org/abs/2510.20358, and find our models and data here: huggingface.co/collections/....
Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)
We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine...
arxiv.org
October 28, 2025 at 12:56 PM
Also, the dialogue pairs taken from real data provide a much better reward signal than synthetically generated ones. Here, real data beats synthetic data quite drastically!
October 28, 2025 at 12:56 PM
While it is not overly surprising that a model performs well on a task that aligns with its pretraining goal, it is still interesting to see how a non-dialogue model underperforms on this task.
October 28, 2025 at 12:56 PM
While performance on most benchmarks also decreases, it actually increases on our own dialogue minimal pairs (real vs. randomly sampled adjacency pairs), from 64% for the pretrained model to 68% after reinforcement learning, even outperforming the BabyLM baseline by 10%.
October 28, 2025 at 12:55 PM
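A minimal sketch of how such a dialogue minimal-pair evaluation could be scored, assuming a HuggingFace causal LM and pairs of real vs. randomly re-paired adjacency pairs; the checkpoint name and the toy data below are placeholders, not the paper's setup:

```python
# Sketch: score real vs. randomly re-paired adjacency pairs with a causal LM
# and count how often the real continuation gets the higher log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(text: str) -> float:
    """Total log-probability the model assigns to a full A-B adjacency pair."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens
    return -out.loss.item() * (ids.size(1) - 1)

pairs = [
    # (first turn, real second turn, randomly sampled second turn)
    ("Do you want some juice?", "Yes, please!", "The cat sat on the mat."),
]

correct = sum(
    sequence_logprob(f"{a} {real}") > sequence_logprob(f"{a} {rand}")
    for a, real, rand in pairs
)
print(f"accuracy: {correct / len(pairs):.2%}")
```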
We created two DPO datasets: one with recorded adjacency pairs (good) vs. randomly sampled ones (bad), and one with generated adjacency pairs (good) vs. randomly sampled ones (bad).
October 28, 2025 at 12:55 PM
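For readers curious what such a preference dataset might look like in practice, here is a rough sketch that builds prompt/chosen/rejected triples from dialogue turns; the field names and toy dialogues are illustrative assumptions, not the paper's code:

```python
# Sketch: build DPO preference pairs from dialogue data.
# "chosen"   = the second turn actually recorded after the first turn
# "rejected" = a second turn randomly sampled from elsewhere in the corpus
import random

def build_dpo_examples(adjacency_pairs, seed=0):
    rng = random.Random(seed)
    second_turns = [b for _, b in adjacency_pairs]
    examples = []
    for first_turn, real_second in adjacency_pairs:
        rejected = rng.choice(second_turns)
        while rejected == real_second:          # avoid sampling the true reply
            rejected = rng.choice(second_turns)
        examples.append({
            "prompt": first_turn,
            "chosen": real_second,
            "rejected": rejected,
        })
    return examples

pairs = [
    ("Do you want some juice?", "Yes, please!"),
    ("Where did the ball go?", "It rolled under the sofa."),
    ("Look, a doggy!", "Woof woof!"),
]
print(build_dpo_examples(pairs)[0])
```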
...ii) a direct quality reward from a teacher model, and iii) a reward based on the log probabilities of a teacher model (and its dialogue continuations). While these rewards did not improve our model's performance, two different DPO approaches did!
October 28, 2025 at 12:55 PM
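As a rough illustration of the preference objective involved (not the authors' training code), the standard DPO loss compares policy and reference log-probabilities on chosen vs. rejected continuations:

```python
# Sketch of the standard DPO objective (Rafailov et al., 2023) in plain PyTorch.
# logp_* are summed token log-probabilities of the chosen/rejected continuation
# under the policy being trained and under the frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward margin: how much more the policy prefers the chosen
    # continuation than the reference does, relative to the rejected one.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -F.logsigmoid(logits).mean()

# Toy example with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss.item())
```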
Then, as our contribution to the interaction track, we experimented with different reinforcement learning strategies. For PPO, we experimented with i) a reward signal based on dialogue continuations from a teacher model (based on BLEU or embedding similarity)...
October 28, 2025 at 12:55 PM
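One way such a continuation-based reward could be computed is sketched below, using sentence-level BLEU between the student model's reply and a teacher model's reply to the same dialogue prefix; the helper function, BLEU weights, and example strings are hypothetical, not the paper's exact setup:

```python
# Sketch: reward a student continuation by its BLEU overlap with a teacher
# model's continuation of the same dialogue prefix. Placeholder setup only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1

def continuation_reward(teacher_reply: str, student_reply: str) -> float:
    """BLEU of the student reply against the teacher reply, in [0, 1]."""
    reference = [teacher_reply.split()]
    hypothesis = student_reply.split()
    return sentence_bleu(reference, hypothesis,
                         weights=(0.5, 0.5),        # bigram BLEU for short turns
                         smoothing_function=smooth)

# Toy usage: this scalar would be fed to a PPO trainer as the reward signal.
print(continuation_reward("Yes, I would love some juice.",
                          "Yes please, some juice!"))
```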