Bastian Bunzeck
@bbunzeck.bsky.social
Computational linguist trying to understand how humans and computers learn and use language 👶🧠🗣️🖥️💬

The work is mysterious and important. See https://bbunzeck.github.io

PhDing at @clausebielefeld.bsky.social
Reposted by Bastian Bunzeck
We began day 2 of our Large Language Models (LLM) for linguistics research workshop @UniKoeln with a fascinating keynote by Charlotte Pouw on "Interpreting models for speech generation and understanding using methods from #psycholinguistics". Charlotte shared […]

[Original post on fediscience.org]
November 25, 2025 at 8:54 AM
Many thanks to this awesome team of collaborators: @frap98.bsky.social, @manarali.bsky.social, Omar Momen, @arianna-bis.bsky.social, @hbuschme.bsky.social, and Sina Zarrieß (@clausebielefeld.bsky.social). 😇🚀
October 28, 2025 at 12:58 PM
Francesca will also present our poster in the BabyLM poster session at EMNLP in Suzhou, China, so do not forget to stop by!
October 28, 2025 at 12:56 PM
If you are hungry for more info now, please check out the preprint here: arxiv.org/abs/2510.20358, and find our models and data here: huggingface.co/collections/....
Dialogue Is Not Enough to Make a Communicative BabyLM (But Neither Is Developmentally Inspired Reinforcement Learning)
We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine...
arxiv.org
October 28, 2025 at 12:56 PM
Also, the dialogue pairs taken from real data provide a much better reward signal than synthetically generated ones. Here, real data beats synthetic data quite drastically!
October 28, 2025 at 12:56 PM
While it is not overly surprising that a model performs well on a task that aligns with its pretraining goal, it is still interesting to see how a non-dialogue model underperforms on this task.
October 28, 2025 at 12:56 PM
While performance on most benchmarks also decreases, it actually increases on our own dialogue minimal pairs (real vs. randomly sampled adjacency pairs), from 64% for the pretrained model to 68% after reinforcement learning, even outperforming the BabyLM baseline by 10%.
October 28, 2025 at 12:55 PM
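A minimal sketch of how such a dialogue minimal-pair evaluation could be scored, assuming a HuggingFace causal LM and pairs of real vs. randomly re-paired adjacency pairs; the checkpoint name and the toy data below are placeholders, not the paper's setup:

```python
# Sketch: score real vs. randomly re-paired adjacency pairs with a causal LM
# and count how often the real continuation gets the higher log-likelihood.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sequence_logprob(text: str) -> float:
    """Total log-probability the model assigns to a full A-B adjacency pair."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood over the predicted tokens
    return -out.loss.item() * (ids.size(1) - 1)

pairs = [
    # (first turn, real second turn, randomly sampled second turn)
    ("Do you want some juice?", "Yes, please!", "The cat sat on the mat."),
]

correct = sum(
    sequence_logprob(f"{a} {real}") > sequence_logprob(f"{a} {rand}")
    for a, real, rand in pairs
)
print(f"accuracy: {correct / len(pairs):.2%}")
```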
We created two DPO datasets: one with recorded adjacency pairs (good) vs. randomly sampled ones (bad), and one with generated adjacency pairs (good) vs. randomly sampled ones (bad).
October 28, 2025 at 12:55 PM
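For readers curious what such a preference dataset might look like in practice, here is a rough sketch that builds prompt/chosen/rejected triples from dialogue turns; the field names and toy dialogues are illustrative assumptions, not the paper's code:

```python
# Sketch: build DPO preference pairs from dialogue data.
# "chosen"   = the second turn actually recorded after the first turn
# "rejected" = a second turn randomly sampled from elsewhere in the corpus
import random

def build_dpo_examples(adjacency_pairs, seed=0):
    rng = random.Random(seed)
    second_turns = [b for _, b in adjacency_pairs]
    examples = []
    for first_turn, real_second in adjacency_pairs:
        rejected = rng.choice(second_turns)
        while rejected == real_second:          # avoid sampling the true reply
            rejected = rng.choice(second_turns)
        examples.append({
            "prompt": first_turn,
            "chosen": real_second,
            "rejected": rejected,
        })
    return examples

pairs = [
    ("Do you want some juice?", "Yes, please!"),
    ("Where did the ball go?", "It rolled under the sofa."),
    ("Look, a doggy!", "Woof woof!"),
]
print(build_dpo_examples(pairs)[0])
```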
...ii) a direct quality reward from a teacher model, and iii) a reward based on the log probabilities of a teacher model (and its dialogue continuations). While these rewards did not improve our model's performance, two different DPO approaches did!
October 28, 2025 at 12:55 PM
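As a rough illustration of the preference objective involved (not the authors' training code), the standard DPO loss compares policy and reference log-probabilities on chosen vs. rejected continuations:

```python
# Sketch of the standard DPO objective (Rafailov et al., 2023) in plain PyTorch.
# logp_* are summed token log-probabilities of the chosen/rejected continuation
# under the policy being trained and under the frozen reference model.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Implicit reward margin: how much more the policy prefers the chosen
    # continuation than the reference does, relative to the rejected one.
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -F.logsigmoid(logits).mean()

# Toy example with made-up log-probabilities.
loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss.item())
```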
Then, as our contribution to the interaction track, we experimented with different reinforcement learning strategies. For PPO, we experimented with i) a reward signal based on dialogue continuations from a teacher model (based on BLEU or embedding similarity)...
October 28, 2025 at 12:55 PM
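One way such a continuation-based reward could be computed is sketched below, using sentence-level BLEU between the student model's reply and a teacher model's reply to the same dialogue prefix; the helper function, BLEU weights, and example strings are hypothetical, not the paper's exact setup:

```python
# Sketch: reward a student continuation by its BLEU overlap with a teacher
# model's continuation of the same dialogue prefix. Placeholder setup only.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1

def continuation_reward(teacher_reply: str, student_reply: str) -> float:
    """BLEU of the student reply against the teacher reply, in [0, 1]."""
    reference = [teacher_reply.split()]
    hypothesis = student_reply.split()
    return sentence_bleu(reference, hypothesis,
                         weights=(0.5, 0.5),        # bigram BLEU for short turns
                         smoothing_function=smooth)

# Toy usage: this scalar would be fed to a PPO trainer as the reward signal.
print(continuation_reward("Yes, I would love some juice.",
                          "Yes please, some juice!"))
```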