🌐 https://ryskina.github.io/
LMs are bad (too rational) at predicting human behaviour, but aligned with humans in assuming rationality in others’ choices.
arxiv.org/abs/2406.17055
Training new token embeddings on examples with a specific property (e.g., short answers) leads to finding “machine-only synonyms” for these tokens that elicit the same behaviour (short answers = “lack”).
arxiv.org/abs/2510.08506
VLMs are worse than vision-only models on vision-only tasks – LMs are biased and underutilize their (easily accessible) visual representations!
hidden-plain-sight.github.io
Linguistic olympiad problems about certain linguistic features (e.g., morphological ones) are tougher for LMs, but morphological pre-tokenization helps!
arxiv.org/abs/2508.11260
LMs outperform the experts they are trained on through skill denoising (averaging out experts’ errors), skill selection (relying on the most appropriate expert), and skill generalization (combining experts’ knowledge).
arxiv.org/abs/2508.17669
LMs use sensory language (olfactory, auditory, …) differently from people + evidence that RLHF may discourage sensory language.
arxiv.org/abs/2504.06393
Developmentally plausible LM training works not because of simpler language but because of lower n-gram diversity! Warning against anthropomorphizing / equating learning in LMs and in children.
openreview.net/pdf?id=AFMGb...