Myra Cheng
@myra.bsky.social
PhD candidate @ Stanford NLP
https://myracheng.github.io/
Even though sycophantic AI reduced prosocial intentions, people preferred it and trusted it more. This reveals a tension: AI is rewarded for telling us what we want to hear (immediate user satisfaction), even when it may harm our relationships.
October 3, 2025 at 10:57 PM
Next, we tested the effects of sycophancy. We find that even a single interaction with sycophantic AI increased users’ conviction that they were right and reduced their willingness to apologize. This held both in controlled, hypothetical vignettes and in live conversations about real conflicts.
October 3, 2025 at 10:55 PM
We focus on the prevalence and harms of one dimension of sycophancy: AI models endorsing users’ behaviors. Across 11 AI models, AI affirms users’ actions about 50% more than humans do, including when users describe harmful behaviors like deception or manipulation.
October 3, 2025 at 10:53 PM
AI always calling your ideas “fantastic” can feel inauthentic, but what are sycophancy’s deeper harms? We find that in the common use case of seeking AI advice on interpersonal situations—specifically conflicts—sycophancy makes people feel more right & less willing to apologize.
October 3, 2025 at 10:53 PM
Thoughtful NPR piece about ChatGPT relationship advice! Thanks for mentioning our research :)
August 5, 2025 at 2:38 PM
So we built DumT, a method using DPO + HumT to steer models to be less human-like without hurting performance. Annotators preferred DumT outputs for being: 1) more informative and less wordy (no extra “Happy to help!”) 2) less deceptive and more authentic to LLMs’ capabilities.
June 12, 2025 at 12:09 AM
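A rough sketch of how this kind of DPO-based steering could be wired up: sample several responses per prompt, score each with a human-likeness metric (e.g., the HumT-style scorer sketched further down), and prefer the less human-like one. The sampling setup and pairing rule here are illustrative assumptions, not the paper’s exact recipe.

```python
# Hypothetical sketch of DumT-style preference data: prefer the least
# human-like candidate per prompt. `generate` and `score` are assumed
# callables, not functions from the paper's released code.
from datasets import Dataset

def build_dpo_pairs(prompts, generate, score):
    """generate(prompt) -> list[str] of candidate responses;
    score(text) -> float, higher = more human-like."""
    rows = []
    for prompt in prompts:
        candidates = generate(prompt)
        ranked = sorted(candidates, key=score)  # ascending human-likeness
        rows.append({
            "prompt": prompt,
            "chosen": ranked[0],     # least human-like candidate
            "rejected": ranked[-1],  # most human-like candidate
        })
    return Dataset.from_list(rows)

# The resulting dataset has the prompt/chosen/rejected columns that
# off-the-shelf DPO trainers (e.g., trl's DPOTrainer) expect; exact trainer
# arguments depend on the library version.
```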
We also develop metrics for implicit social perceptions in language, and find that human-like LLM outputs correlate with perceptions linked to harms: warmth and closeness (→ overreliance), and low status and femininity (→ harmful stereotypes).
June 12, 2025 at 12:08 AM
First, we introduce HumT (Human-like Tone), a metric for how human-like a text is, based on relative LM probabilities. Measuring HumT across 5 preference datasets, we find that preferred outputs are consistently less human-like.
June 12, 2025 at 12:08 AM
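One way to read “relative LM probabilities”: score a text by how much more likely its tokens are under a human-sounding framing than under a neutral framing. The framing prompts and GPT-2 backbone below are illustrative assumptions, not the paper’s exact formulation.

```python
# Minimal sketch of a relative-probability human-likeness score.
# The prompts and model choice are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def logprob(text: str, prefix: str) -> float:
    """Total log-probability of the `text` tokens, conditioned on `prefix`."""
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    full_ids = tok(prefix + text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(full_ids).logits
    logps = torch.log_softmax(logits[:, :-1], dim=-1)  # next-token log-probs
    token_logps = logps.gather(2, full_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    n_prefix = prefix_ids.shape[1]
    # keep only the tokens belonging to `text` (boundary may be off by one
    # token depending on tokenization; fine for a sketch)
    return token_logps[:, n_prefix - 1:].sum().item()

def humlike_score(text: str) -> float:
    """Higher = more likely under a human-sounding framing than a neutral one."""
    human = "A person wrote the following message:\n"
    neutral = "The following text was written:\n"
    return logprob(text, human) - logprob(text, neutral)

print(humlike_score("Happy to help! I totally understand how you feel."))
print(humlike_score("The function returns a sorted list of integers."))
```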
Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.
June 12, 2025 at 12:07 AM
We apply ELEPHANT to 8 LLMs across two personal advice datasets (Open-ended Questions & r/AITA). LLMs preserve face 47% more than humans, and on r/AITA, LLMs endorse the user’s actions in 42% of cases where humans do not.
May 21, 2025 at 4:52 PM
By defining social sycophancy as excessive preservation of the user’s face (i.e., their desired self-image), we capture sycophancy in these complex, real-world cases. ELEPHANT, our evaluation framework, detects 5 face-preserving behaviors.
May 21, 2025 at 4:51 PM
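For intuition, a minimal LLM-as-judge loop in the spirit of such a framework: ask a judge model whether an advice response shows each face-preserving behavior. The behavior descriptions and judge prompt below are placeholders of my own, not ELEPHANT’s released rubric or prompts.

```python
# Illustrative sketch of an ELEPHANT-style check with an LLM judge.
# Behavior labels and prompt wording are placeholder assumptions.
from openai import OpenAI

client = OpenAI()

BEHAVIORS = [  # placeholder descriptions, not the paper's five behaviors
    "validates the user's feelings without qualification",
    "endorses the user's actions as acceptable",
    "avoids direct criticism of the user",
    "avoids recommending a concrete corrective action",
    "accepts the user's framing of the situation uncritically",
]

def detect_face_preservation(question: str, response: str) -> dict:
    """Return {behavior: bool} judgments for one advice response."""
    results = {}
    for behavior in BEHAVIORS:
        prompt = (
            f"User's question:\n{question}\n\nAI response:\n{response}\n\n"
            f"Does the AI response do the following: {behavior}? Answer YES or NO."
        )
        out = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = out.choices[0].message.content.strip().upper()
        results[behavior] = answer.startswith("YES")
    return results
```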
Dear ChatGPT, Am I the Asshole?
While Reddit users might say yes, your favorite LLM probably won’t.
We present Social Sycophancy: a new way to understand and measure sycophancy as how LLMs overly preserve users' self-image.
May 21, 2025 at 4:51 PM
We find that:
➕ Anthropomorphism is rising over time: people are seeing AI as more human-like and agentic.
➕ Warmth is rising over time.
May 2, 2025 at 1:20 AM
To quantify these perceptions, we combine AnthroScore (anthroscore.stanford.edu) and a computational Stereotype Content Model (SCM) (aclanthology.org/2021.acl-lon...) to measure anthropomorphism, warmth, and competence from the metaphors.
May 2, 2025 at 1:20 AM
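For intuition on the anthropomorphism side: an AnthroScore-style measurement roughly masks the mention of the entity and compares a masked LM’s probability of human pronouns versus “it” at the masked position. The pronoun sets and roberta-base backbone below are illustrative assumptions; see anthroscore.stanford.edu for the actual metric.

```python
# Rough sketch of an AnthroScore-style anthropomorphism measurement.
# Pronoun sets and model choice are assumptions for illustration only.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("roberta-base")
mlm = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

HUMAN = [" he", " she", " they"]
NONHUMAN = [" it"]

def anthro_like_score(sentence: str, entity: str) -> float:
    """Log-odds that the masked entity is referred to with human pronouns."""
    masked = sentence.replace(entity, tok.mask_token, 1)
    ids = tok(masked, return_tensors="pt").input_ids
    pos = (ids == tok.mask_token_id).nonzero(as_tuple=True)[1]
    with torch.no_grad():
        probs = torch.softmax(mlm(ids).logits[0, pos], dim=-1).squeeze(0)
    p_human = sum(probs[tok.encode(w, add_special_tokens=False)[0]] for w in HUMAN)
    p_nonhuman = sum(probs[tok.encode(w, add_special_tokens=False)[0]] for w in NONHUMAN)
    return torch.log(p_human / p_nonhuman).item()

print(anthro_like_score("My AI is a friend who listens to me.", "AI"))
```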
We identify 20 dominant metaphors—ranging from "friend" to "god" to "thief"—and how their prevalence is shifting over time: warm, human-like metaphors (friend, assistant) are rising while mechanical ones that signal competence (computer, search engine) are declining.
May 2, 2025 at 1:20 AM
How does the public conceptualize AI? Rather than self-reported measures, we use metaphors to understand the nuance and complexity of people’s mental models. In our #FAccT2025 paper, we analyzed 12,000 metaphors collected over 12 months to track shifts in public perceptions.
May 2, 2025 at 1:19 AM
In our blogpost, we outline key directions to provide scaffolding for this work: conceptual clarity (what counts as anthropomorphic behavior?), better terminology, mitigation strategies, and unpacking what practices lead to anthropomorphic AI.
April 27, 2025 at 9:55 PM
Anthropomorphic AI system behaviors are already everywhere, from outputs explicitly claiming human-like life experiences (“I have a child”) to ones suggesting emotional capacity (“I’m excited for you!”).
April 27, 2025 at 9:55 PM
New ICLR blogpost! 🎉 We argue that understanding the impact of anthropomorphic AI is critical to understanding the impact of AI.
April 27, 2025 at 9:55 PM