https://jeremie-beucler.github.io/
Huge thanks to my great co-authors @zoepurcell.bsky.social, @luciecharlesneuro.bsky.social and @wimdeneys.bsky.social, and to my lab @lapsyde.bsky.social.
Stay tuned for the computational modeling part! 🤓
You can access the preprint here: osf.io/preprints/ps...
To make this more practical, we release the 'baserater' R package. It allows you to access the database easily and to generate new items automatically using the LLM and prompt of your choice.
GitHub: jeremie-beucler.github.io/baserater (soon on CRAN!)
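A quick sketch of what using it could look like (the function and argument names below are illustrative placeholders, not the package's confirmed API; check the docs for the real interface):

```r
# Hypothetical usage sketch for the 'baserater' package.
# NOTE: load_database() and generate_items() are placeholder names,
# not necessarily the package's actual functions.
library(baserater)

# Browse the precomputed database of items and belief-strength values
db <- load_database()
head(db)

# Generate new items with the LLM and prompt of your choice
new_items <- generate_items(
  traits = c("arrogant", "kind"),
  groups = c("politician", "nurse"),
  model  = "gpt-4",
  prompt = "On a scale from 0 to 100, how typical is the trait '{trait}' for a {group}?"
)
```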
We also re-analyzed existing base-rate stimuli from past research using our method. The results revealed large, previously unnoticed variability in belief strength, which could be problematic in some cases.
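To give a rough idea of the kind of check involved (purely made-up values and column names, not the re-analyzed stimuli themselves):

```r
# Toy illustration: quantify how much belief strength varies across
# stimuli that were treated as interchangeable. Numbers are invented.
old_items <- data.frame(
  item            = c("doctor-caring", "clown-funny", "accountant-boring"),
  belief_strength = c(92, 97, 61)   # e.g., typicality ratings on a 0-100 scale
)

range(old_items$belief_strength)   # items span a wide range
sd(old_items$belief_strength)      # far from homogeneous
```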
This method allows us to create a massive database of over 100,000 base-rate items, each with an associated belief strength value.
Here is an example showing all possible items for a single adjective out of the 66 ("Arrogant")! Better to be a kindergarten teacher than a politician in this case. 🤭
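For example, pulling all the "Arrogant" items out of the database could look like this (the loader and column names are assumptions for illustration, not the confirmed schema):

```r
library(baserater)

# Hypothetical sketch: filter the full database down to one adjective.
# load_database() and the columns (trait, group, belief_strength) are
# placeholders, not necessarily the package's actual names.
db <- load_database()
arrogant <- subset(db, trait == "arrogant")

# Which groups are believed to be the most / least arrogant?
arrogant[order(-arrogant$belief_strength), c("group", "belief_strength")]
```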
And it works really well! LLM-generated ratings showed a very strong correlation with human judgments.
More importantly, our belief-strength measure robustly predicted participants' actual choices in a separate base-rate neglect experiment!
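The two checks boil down to something like this (sketched with made-up data frames, not the actual study data):

```r
# 1) Agreement between LLM and human belief-strength ratings (invented values)
ratings <- data.frame(
  llm   = c(95, 80, 20, 60, 10),
  human = c(90, 75, 25, 55, 15)
)
cor(ratings$llm, ratings$human)

# 2) Do belief-strength values predict responses in a base-rate task?
trials <- data.frame(
  belief_strength  = c(95, 85, 70, 60, 45, 30, 20, 10),
  chose_stereotype = c(1, 1, 1, 0, 1, 0, 0, 0)   # 1 = stereotype-based (base-rate-neglecting) choice
)
summary(glm(chose_stereotype ~ belief_strength, data = trials, family = binomial))
```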
We tested this idea on the classic lawyer–engineer base-rate neglect task, asking GPT-4 and LLaMA 3.3 to rate how strongly traits (like “kind”) are associated with groups (like “nurse”) using typicality ratings, a proxy for p(trait|group).
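Conceptually, the elicitation step is just a templated prompt per trait-group pair. Here is a sketch where `ask_llm()` stands in for whatever API client you use; the prompt wording below is illustrative only (the exact prompts are in the preprint):

```r
# Sketch of eliciting a typicality rating as a proxy for p(trait | group).
# ask_llm() is a placeholder for your API client (e.g., a GPT-4 or
# LLaMA 3.3 call); here it simply returns NA so the code runs as-is.
ask_llm <- function(prompt, model) {
  NA_character_
}

typicality <- function(trait, group, model = "gpt-4") {
  prompt <- sprintf(
    "On a scale from 0 to 100, how typical is it for a %s to be %s? Answer with a single number.",
    group, trait
  )
  as.numeric(ask_llm(prompt, model))
}

typicality("kind", "nurse")    # expected to be high
typicality("kind", "lawyer")   # presumably lower
```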
Could LLMs help? 🤖
For once, having human-like biases is desirable! Because LLMs are trained on vast amounts of human text, they implicitly encode typical associations, and may be great at measuring belief strength!
We argue that measuring “belief strength” is a major bottleneck in reasoning research, which mostly relies on conflict vs. no-conflict items.
It requires costly human ratings and is rarely done parametrically, limiting the development of theoretical & computational models of biased reasoning.
Cognitive biases often involve a mental conflict between intuitive beliefs (“nurses are kind”) and logical or probabilistic information (995 vs 5). 🤯
But how strong is the pull of that belief?
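For reference, the normative response combines both sources of information via Bayes' rule; here is a quick sketch with made-up numbers:

```r
# Worked example with made-up numbers: 1000 people, 5 nurses and 995 lawyers;
# the described person is "kind".
prior_nurse  <- 5 / 1000
prior_lawyer <- 995 / 1000

# Hypothetical belief-strength values standing in for p(kind | group)
p_kind_nurse  <- 0.90
p_kind_lawyer <- 0.60

# Posterior probability that the person is a nurse
(p_kind_nurse * prior_nurse) /
  (p_kind_nurse * prior_nurse + p_kind_lawyer * prior_lawyer)
# ~0.0075: the base rates should swamp the stereotype, yet intuition resists
```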