Pierre Orhan
@pierreorhan.bsky.social
Neuroscience: learning dynamics in artificial and real neural networks.
ENS Paris.
Do not hesitate to contact Pablo Diego or me if you have any questions about the work!
This work was carried out while I was at @lsp-ens.bsky.social and completed at @institutducerveau.bsky.social!
Stay tuned for future work applying this approach to neurological recordings!
February 6, 2026 at 1:52 PM
In conclusion, artificial neural networks mirror some developmental stages and clarify sufficient conditions for their emergence. Yet the models learn far too slowly: they need two orders of magnitude more words (tokens) than children do to discover these linguistic structures!
The paper has some additional findings:
- Simpler linguistic structures (phonemes) are learned first, but semantic classes are not learned in any particular order.
- We introduce a Topological Probe (which enforces topology rather than distances) to show that the approach is robust.
Although the WordNet graph was strongly instantiated in large models, its instantiation was weak but significant in audio and small text models. We confirmed this with measures based on semantic classes and visualized it by coloring the WordNet graph, shown here for a 1B-parameter Pythia (text) model.
We found these structures instantiated in an audio model, Wav2vec2, when it is pretrained on English, but not when it is pretrained on other sounds!
By tracking these instantiations across pretraining checkpoints, we show when each linguistic structure emerges: phonemes emerge before lexical semantics and syntax.
To detect the instantiation of linguistic structures, we adapt the structural probe of Hewitt and Manning to semantics and phonetics. Semantics is given by the WordNet graph of hypernym relationships ("is contained in"). For example, mammal is a hypernym of placental, which is itself a hypernym of carnivore, equine…
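Hewitt and Manning's structural probe learns a linear map B such that the squared distance ||B(h_i - h_j)||^2 between two representations matches the distance between the corresponding words in a tree. The snippet below is a minimal illustration, not the paper's code: it assumes one embedding per WordNet node and uses shortest-path distance in the hypernym tree as the target. The toy tree, the random vectors standing in for model activations, and the helper names (`tree_distances`, `fit_structural_probe`) are all hypothetical. It runs plain NumPy subgradient descent on the mean absolute error, i.e. the probe's L1 objective normalized by the number of pairs.

```python
import numpy as np
from collections import deque

# Hypothetical toy hypernym tree (child -> parent), standing in for a WordNet subgraph.
PARENT = {
    "placental": "mammal",
    "carnivore": "placental",
    "equine": "placental",
    "dog": "carnivore",
    "cat": "carnivore",
    "horse": "equine",
}

def tree_distances(parent):
    """All-pairs shortest-path distances in the undirected hypernym tree (BFS)."""
    nodes = sorted(set(parent) | set(parent.values()))
    adj = {n: set() for n in nodes}
    for child, par in parent.items():
        adj[child].add(par)
        adj[par].add(child)
    index = {n: i for i, n in enumerate(nodes)}
    dist = np.zeros((len(nodes), len(nodes)))
    for src in nodes:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for dst, d in seen.items():
            dist[index[src], index[dst]] = d
    return nodes, dist

def probe_sq_distances(H, B):
    """Squared probe distances ||B (h_i - h_j)||^2 for all pairs of rows of H."""
    Z = H @ B.T                              # (n, rank)
    diff = Z[:, None, :] - Z[None, :, :]     # (n, n, rank)
    return (diff ** 2).sum(-1)               # (n, n)

def fit_structural_probe(H, D, rank=4, lr=1e-3, steps=2000, seed=0):
    """Fit B so squared probe distances match tree distances D (mean L1 loss)."""
    rng = np.random.default_rng(seed)
    n, dim = H.shape
    B = rng.normal(scale=0.1, size=(rank, dim))
    diffs = H[:, None, :] - H[None, :, :]    # (n, n, dim)
    for _ in range(steps):
        Z = diffs @ B.T                      # (n, n, rank)
        pred = (Z ** 2).sum(-1)              # (n, n)
        sign = np.sign(pred - D)             # subgradient of |pred - D|
        # d pred_ij / d B[k, d] = 2 * Z[i, j, k] * diffs[i, j, d]
        grad = 2 * np.einsum("ij,ijk,ijd->kd", sign, Z, diffs) / n**2
        B -= lr * grad
    loss = np.abs(probe_sq_distances(H, B) - D).mean()
    return B, loss

if __name__ == "__main__":
    nodes, D = tree_distances(PARENT)
    # Random vectors stand in for model activations (assumption for illustration).
    H = np.random.default_rng(1).normal(size=(len(nodes), 16))
    _, untrained = fit_structural_probe(H, D, steps=0)
    B, trained = fit_structural_probe(H, D)
    print(f"L1 loss before: {untrained:.3f}, after: {trained:.3f}")
```

On this toy, a drop in loss only shows that the optimization works. With real activations, how well the probe recovers the WordNet distances is what indicates whether the graph is instantiated.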
Findings:
- Linguistic structures are instantiated in a speech ANN.
- Phonemes are instantiated before semantics and syntax, mirroring the stages of language acquisition in children.
Methodological developments:
- Probing speech models across training checkpoints
- A novel topological probe of WordNet