https://jeremie-beucler.github.io/
Huge thanks to my great co-authors @zoepurcell.bsky.social, @luciecharlesneuro.bsky.social and @wimdeneys.bsky.social, and to my lab @lapsyde.bsky.social.
Stay tuned for the computational modeling part! 🤓
You can access the preprint here: osf.io/preprints/ps...
To make this more practical, we release the 'baserater' R package. It allows you to access the database easily and to generate new items automatically using the LLM and prompt of your choice.
GitHub: jeremie-beucler.github.io/baserater (soon on CRAN!)
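A quick sketch of what using it could look like (the function and argument names below are illustrative placeholders, not the package's confirmed API; check the docs for the real interface):

```r
# Hypothetical usage sketch for the 'baserater' package.
# NOTE: load_database() and generate_items() are placeholder names,
# not necessarily the package's actual functions.
library(baserater)

# Browse the precomputed database of items and belief-strength values
db <- load_database()
head(db)

# Generate new items with the LLM and prompt of your choice
new_items <- generate_items(
  traits = c("arrogant", "kind"),
  groups = c("politician", "nurse"),
  model  = "gpt-4",
  prompt = "On a scale from 0 to 100, how typical is the trait '{trait}' for a {group}?"
)
```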
We also re-analyzed existing base-rate stimuli from past research using our method. The results revealed large, previously unnoticed variability in belief strength, which could be problematic in some cases.
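To give a rough idea of the kind of check involved (purely made-up values and column names, not the re-analyzed stimuli themselves):

```r
# Toy illustration: quantify how much belief strength varies across
# stimuli that were treated as interchangeable. Numbers are invented.
old_items <- data.frame(
  item            = c("doctor-caring", "clown-funny", "accountant-boring"),
  belief_strength = c(92, 97, 61)   # e.g., typicality ratings on a 0-100 scale
)

range(old_items$belief_strength)   # items span a wide range
sd(old_items$belief_strength)      # far from homogeneous
```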
This method allows us to create a massive database of over 100,000 base-rate items, each with an associated belief strength value.
Here is an example showing all possible items for a single adjective out of the 66 ("Arrogant")! Better to be a kindergarten teacher than a politician in this case. 🤭
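For example, pulling all the "Arrogant" items out of the database could look like this (the loader and column names are assumptions for illustration, not the confirmed schema):

```r
library(baserater)

# Hypothetical sketch: filter the full database down to one adjective.
# load_database() and the columns (trait, group, belief_strength) are
# placeholders, not necessarily the package's actual names.
db <- load_database()
arrogant <- subset(db, trait == "arrogant")

# Which groups are believed to be the most / least arrogant?
arrogant[order(-arrogant$belief_strength), c("group", "belief_strength")]
```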
And it works really well! LLM-generated ratings showed a very strong correlation with human judgments.
More importantly, our belief-strength measure robustly predicted participants' actual choices in a separate base-rate neglect experiment!
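The two checks boil down to something like this (sketched with made-up data frames, not the actual study data):

```r
# 1) Agreement between LLM and human belief-strength ratings (invented values)
ratings <- data.frame(
  llm   = c(95, 80, 20, 60, 10),
  human = c(90, 75, 25, 55, 15)
)
cor(ratings$llm, ratings$human)

# 2) Do belief-strength values predict responses in a base-rate task?
trials <- data.frame(
  belief_strength  = c(95, 85, 70, 60, 45, 30, 20, 10),
  chose_stereotype = c(1, 1, 1, 0, 1, 0, 0, 0)   # 1 = stereotype-based (base-rate-neglecting) choice
)
summary(glm(chose_stereotype ~ belief_strength, data = trials, family = binomial))
```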
We tested this idea on the classic lawyer–engineer base-rate neglect task, asking GPT-4 and LLaMA 3.3 to rate how strongly traits (like “kind”) are associated with groups (like “nurse”) using typicality ratings, a proxy for p(trait|group).
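Conceptually, the elicitation step is just a templated prompt per trait-group pair. Here is a sketch where `ask_llm()` stands in for whatever API client you use; the prompt wording below is illustrative only (the exact prompts are in the preprint):

```r
# Sketch of eliciting a typicality rating as a proxy for p(trait | group).
# ask_llm() is a placeholder for your API client (e.g., a GPT-4 or
# LLaMA 3.3 call); here it simply returns NA so the code runs as-is.
ask_llm <- function(prompt, model) {
  NA_character_
}

typicality <- function(trait, group, model = "gpt-4") {
  prompt <- sprintf(
    "On a scale from 0 to 100, how typical is it for a %s to be %s? Answer with a single number.",
    group, trait
  )
  as.numeric(ask_llm(prompt, model))
}

typicality("kind", "nurse")    # expected to be high
typicality("kind", "lawyer")   # presumably lower
```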
Could LLMs help? 🤖
For once, having human-like biases is desirable! Because LLMs are trained on vast amounts of human text, they implicitly encode typical associations, and may be great at measuring belief strength!
We argue that measuring “belief strength” is a major bottleneck in reasoning research, which mostly relies on conflict vs. no-conflict items.
It requires costly human ratings and is rarely done parametrically, limiting the development of theoretical & computational models of biased reasoning.
Cognitive biases often involve a mental conflict between intuitive beliefs (“nurses are kind”) and logical or probabilistic information (995 vs 5). 🤯
But how strong is the pull of that belief?
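For reference, the normative response combines both sources of information via Bayes' rule; here is a quick sketch with made-up numbers:

```r
# Worked example with made-up numbers: 1000 people, 5 nurses and 995 lawyers;
# the described person is "kind".
prior_nurse  <- 5 / 1000
prior_lawyer <- 995 / 1000

# Hypothetical belief-strength values standing in for p(kind | group)
p_kind_nurse  <- 0.90
p_kind_lawyer <- 0.60

# Posterior probability that the person is a nurse
(p_kind_nurse * prior_nurse) /
  (p_kind_nurse * prior_nurse + p_kind_lawyer * prior_lawyer)
# ~0.0075: the base rates should swamp the stereotype, yet intuition resists
```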