Francois Meyer
@francois-meyer.bsky.social
PhD student at the University of Cape Town, working on text generation for low-resource, morphologically complex languages.
https://francois-meyer.github.io/

Cape Town, South Africa
We see 4 stages of subword learning.
(1) Initially, subwords change rapidly.
(2) Next, learning trajectories undergo a sudden shift (around 30% in the plot below).
(3) After a while, subword boundaries stabilise.
(4) In finetuning, subwords change again to suit downstream tasks.
November 19, 2025 at 9:55 AM
We study subword learning for 3 morphologically diverse languages: isiXhosa is agglutinative, Setswana is disjunctive (morphemes are space-separated), and English is a typological middle ground. Learning dynamics vary across languages, with agglutinative isiXhosa being the most unstable.
November 19, 2025 at 9:55 AM
If a language model could dynamically optimise subword tokenisation, how would its subwords evolve during training? In our new paper we study the learning dynamics of subword segmentation:
arxiv.org/pdf/2511.09197
November 19, 2025 at 9:55 AM