Crowdworkers often
- have limited attention
- rely on heuristics like “it’s the thought that counts”, focusing on intentions rather than actual wording
- show systematic rating inflation due to social desirability bias
Here’s how expert agreement (Krippendorff's alpha) varied across empathy sub-components:
We looked specifically at 21 sub-components of empathic communication from 4 evaluative frameworks
The result? LLMs consistently matched expert judgments better than crowdworkers did! 🔥
Excited to share the first paper from my postdoc (!!) investigating when LLMs are reliable judges - with empathic communication as a case study 🧐
🧵👇