https://francois-meyer.github.io/
Cape Town, South Africa
(1) Initially, subwords change rapidly.
(2) Next, learning trajectories undergo a sudden shift (around 30% in the plot below).
(3) After a while, subword boundaries stabilise.
(4) In finetuning, subwords change again to suit downstream tasks.
arxiv.org/pdf/2511.09197