Sathvik
@sathvik.bsky.social
computational psycholinguistics @ umd
he/him
Lastly, we replicated the aggregate analysis separately on words that were and weren't split by the BPE tokenizer, and found that the predictive power of surprisal for reading times was worse for words that were split than for words that were not.
November 2, 2023 at 10:28 PM
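For readers who want to try this kind of comparison, a minimal sketch in Python, assuming a hypothetical per-word table with reading times, baseline predictors, BPE-based surprisal, and a flag for whether the tokenizer split the word (all file and column names are placeholders, not the paper's actual pipeline):

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("reading_times.csv")  # hypothetical per-word table

    def delta_llf(subset):
        # Predictive power of surprisal = log-likelihood gained by adding
        # it on top of baseline predictors like length and log frequency.
        base = smf.ols("rt ~ length + log_freq", data=subset).fit()
        full = smf.ols("rt ~ length + log_freq + surprisal", data=subset).fit()
        return full.llf - base.llf

    for was_split, subset in df.groupby("split_by_bpe"):
        print(f"split_by_bpe={was_split}: delta log-lik = {delta_llf(subset):.1f}")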
Looking more closely, word surprisal appears to scale incrementally with the number of morphemes, consistent with the cognitive prediction that each sub-unit adds more processing work. But surprisal does not scale the same way with the number of BPE tokens.
November 2, 2023 at 10:26 PM
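A quick way to eyeball that scaling pattern, assuming a hypothetical per-word table holding each word's surprisal, morpheme count, and BPE token count:

    import pandas as pd

    df = pd.read_csv("word_surprisals.csv")  # hypothetical per-word table

    # If each sub-unit adds processing work, mean surprisal should rise
    # roughly monotonically with the number of sub-units.
    print(df.groupby("n_morphemes")["surprisal"].mean())   # rises with count
    print(df.groupby("n_bpe_tokens")["surprisal"].mean())  # no such pattern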
We then evaluated the models on eyetracking & self-paced reading data, finding in the aggregate that BPE-based surprisal predicts human behavior just as well as the morphological alternative. So far so good for BPE.
However, this isn’t the whole story. Around 95% of the tokens in both evaluation corpora were never split by the BPE tokenizer.
November 2, 2023 at 10:25 PM
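That number is easy to check for any corpus. A sketch using GPT-2's BPE tokenizer for illustration (the paper's exact tokenizer and corpora may differ):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # any BPE tokenizer works similarly

    words = open("corpus.txt").read().split()  # hypothetical whitespace-split corpus
    # A word counts as "split" if it maps to more than one BPE token; the
    # leading space matters because GPT-2's word-initial tokens include it.
    n_split = sum(len(tok.encode(" " + w)) > 1 for w in words)
    print(f"{1 - n_split / len(words):.1%} of words were never split")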
Honored my paper was accepted to Findings of #EMNLP2023! Many psycholinguistics studies use LLMs to estimate the probability of words in context. But LLMs process statistically derived subword tokens, while human processing doesn't. Does this matter? (w/Philip Resnik) 🧵
arxiv.org/abs/2310.17774
November 2, 2023 at 10:20 PM
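For context, this is how word probabilities are usually read off a subword LLM: a word is a sequence of subword tokens, so its surprisal is the sum of the token surprisals. A minimal sketch with GPT-2 (for illustration only, not the paper's own models):

    import math
    import torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def word_surprisal(context, word):
        ctx_ids = tok.encode(context)
        word_ids = tok.encode(" " + word)  # leading space: word-initial BPE token
        ids = torch.tensor([ctx_ids + word_ids])
        with torch.no_grad():
            logprobs = model(ids).logits.log_softmax(-1)
        # The word's probability is the product of its subword token
        # probabilities, so its surprisal is the sum of token surprisals.
        # Logits at position p predict the token at position p + 1.
        nats = -sum(logprobs[0, len(ctx_ids) + i - 1, t].item()
                    for i, t in enumerate(word_ids))
        return nats / math.log(2)  # convert to bits

    print(word_surprisal("The cat sat on the", "mat"))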