Dirk Hovy
@dirkhovy.bsky.social
Professor @milanlp.bsky.social for #NLProc, compsocsci, #ML
Also at http://dirkhovy.com/
Reposted by Dirk Hovy
Trying an experiment in good old-fashioned blogging about papers: dallascard.github.io/granular-mat...
Language Model Hacking - Granular Material
dallascard.github.io
November 16, 2025 at 7:51 PM
Reposted by Dirk Hovy
#TBT #NLProc Attanasio et al. ask 'Is It Worth the (Environmental) Cost?', analyzing continuous training for language models and weighing its benefits against its environmental impact for responsible use. #Sustainability
arxiv.org
November 20, 2025 at 4:02 PM
Reposted by Dirk Hovy
#MemoryModay #NLProc 'The State of Profanity Obfuscation in NLP Scientific Publications' examines how offensive terms are obfuscated in papers, including non-English ones. @deboranozza.bsky.social & @dirkhovy.bsky.social (2023) propose 'PrOf' to aid authors & improve accessibility.
The State of Profanity Obfuscation in Natural Language Processing Scientific Publications
Debora Nozza, Dirk Hovy. Findings of the Association for Computational Linguistics: ACL 2023. 2023.
aclanthology.org
November 17, 2025 at 4:04 PM
Reposted by Dirk Hovy
#TBT #NLProc Explore 'Wisdom of Instruction-Tuned LLM Crowds' by Plaza-del-Arco et al. Aggregated LLM labels outperform single models across tasks & languages, but few-shot doesn't top zero-shot, and supervised models still rule.
Wisdom of Instruction-Tuned Language Model Crowds. Exploring Model Label Variation
Flor Miriam Plaza-del-Arco, Debora Nozza, Dirk Hovy. Proceedings of the 3rd Workshop on Perspectivist Approaches to NLP (NLPerspectives) @ LREC-COLING 2024. 2024.
aclanthology.org
October 30, 2025 at 4:05 PM
Reposted by Dirk Hovy
#MemoryModay #NLProc 'Universal Joy: A Data Set and Results for Classifying Emotions Across Languages' by Lamprinidis et al. (2021) presents a multilingual dataset and results for classifying emotions across languages.
Universal Joy A Data Set and Results for Classifying Emotions Across Languages
Sotiris Lamprinidis, Federico Bianchi, Daniel Hardt, Dirk Hovy. Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2021.
aclanthology.org
November 3, 2025 at 4:02 PM
Reposted by Dirk Hovy
#TBT #NLProc "Explaining Speech Classification Models" by Pastor et al. (2024) makes speech classification more transparent! 🔍 Their research reveals which words matter most and how tone and background noise impact decisions.
Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features
Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis. Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long...
aclanthology.org
November 6, 2025 at 4:04 PM
Reposted by Dirk Hovy
#MemoryModay #NLProc 'Measuring Harmful Representations in Scandinavian Language Models' uncovers gender bias, challenging Scandinavia's image of gender equity.
Measuring Harmful Representations in Scandinavian Language Models
Samia Touileb, Debora Nozza. Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS). 2022.
aclanthology.org
November 10, 2025 at 4:03 PM
Reposted by Dirk Hovy
#TBT #NLProc Hessenthaler et al. (2022) examine the relationship between fairness and energy efficiency in English NLP models, challenging common assumptions about bias reduction. #AI #sustainability
Bridging Fairness and Environmental Sustainability in Natural Language Processing
Marius Hessenthaler, Emma Strubell, Dirk Hovy, Anne Lauscher. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
aclanthology.org
November 13, 2025 at 4:05 PM
Reposted by Dirk Hovy
🎉 Congratulations to all #EMNLP2025 award winners 🎉

Starting with the ✨Best Paper award ✨:

"Infini-gram mini: Exact n-gram Search at the Internet Scale with FM-Index"
by Hao Xu, Jiacheng Liu, Yejin Choi, Noah A. Smith, and Hannaneh Hajishirzi
aclanthology.org/2025.emnlp-m...

1/n
November 7, 2025 at 10:29 PM
Reposted by Dirk Hovy
Last week at @nlperspectives.bsky.social I presented work showing that annotators only provide the same label on ~75% of items across four NLP labelling tasks following a two-week gap.
November 11, 2025 at 4:44 PM
Reposted by Dirk Hovy
You missed one: G. Abercrombie, T. Dinkar, A. Cercas Curry, V. Rieser & @dirkhovy.bsky.social, 'Consistency is Key: Disentangling label variation in NLP with Intra-Annotator Agreement'. @nlperspectives.bsky.social
November 3, 2025 at 2:34 AM
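For readers curious how the ~75% figure above is typically obtained, here is a minimal, illustrative sketch of percent intra-annotator agreement between two annotation rounds separated by a time gap. The function and the label lists below are invented for demonstration; this is not the authors' code or data.

```python
# Illustrative sketch: percent intra-annotator agreement across two
# annotation rounds by the same annotator, e.g. two weeks apart.

def intra_annotator_agreement(round1, round2):
    """Fraction of items that received the same label in both rounds."""
    assert len(round1) == len(round2), "both rounds must cover the same items"
    same = sum(a == b for a, b in zip(round1, round2))
    return same / len(round1)

# Hypothetical labels for the same 8 items, before and after the gap.
labels_week0 = ["hate", "ok", "ok", "hate", "ok", "hate", "ok", "ok"]
labels_week2 = ["hate", "ok", "hate", "hate", "ok", "ok", "ok", "ok"]

print(f"Intra-annotator agreement: "
      f"{intra_annotator_agreement(labels_week0, labels_week2):.0%}")
# -> 75%: 6 of the 8 items kept the same label across the two rounds.
```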
Excited to head to Suzhou for the 30th edition of #EMNLP2025! 🎉 Had the great honor to serve as general chair this year. Looking forward to catching up with everyone and seeing some amazing #NLP research! 🤓📚
November 2, 2025 at 5:54 AM
Reposted by Dirk Hovy
🗓️ Nov 5 – Main Conference Posters
Personalization up to a Point
🧠 In the context of content moderation, we show that fully personalized models can perpetuate hate speech, and propose a policy-based method to impose legal boundaries.
📍 Hall C | 11:00–12:30
October 31, 2025 at 2:05 PM
Reposted by Dirk Hovy
🗓️ Nov 5 – Main Conference Posters
📘 Biased Tales
A dataset of 5k short LLM-generated bedtime stories spanning sociocultural axes, with an evaluation taxonomy for character-centric and context-centric attributes.
📍 Hall C | 11:00–12:30
October 31, 2025 at 2:05 PM
Reposted by Dirk Hovy
🗓️ Nov 5 - Demo
Co-DETECT: Collaborative Discovery of Edge Cases in Text Classification
🧩 Co-DETECT – an iterative, human-LLM collaboration framework for surfacing edge cases and refining annotation codebooks in text classification.
📍 Demo Session 2 – Hall C3 | 14:30–16:00
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 6 – Findings Posters
The “r” in “woman” stands for rights.
💬 We propose a taxonomy of social dynamics in implicit misogyny (EN, IT) and audit 9 LLMs: they consistently fail. The more social knowledge a message requires, the worse they perform.
📍 Hall C | 12:30–13:30
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 7 – Main Conference Posters
Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance
🧍 Discussing different applications for LLM persona prompting, and how to measure their success.
📍 Hall C | 10:30–12:00
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 7 – Main Conference Posters
TrojanStego: Your Language Model Can Secretly Be a Steganographic Privacy-Leaking Agent
🔒 LLMs can be fine-tuned to leak secrets via token-based steganography!
📍 Hall C | 10:30–12:00
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – WiNLP Workshop
No for Some, Yes for Others
🤖 We investigate how sociodemographic persona prompts affect false refusal behaviors in LLMs. Model and task type are the dominant factors driving these refusals.
October 31, 2025 at 2:06 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – NLPerspectives Workshop
Balancing Quality and Variation
🧮 For datasets to represent diverse opinions, they must preserve variation while filtering out spam. We evaluate annotator filtering heuristics and show how they often remove genuine variation.
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – BabyLM Workshop
Teacher Demonstrations in a BabyLM's Zone of Proximal Development for Contingent Multi-Turn Interaction
👶 ContingentChat, a Teacher–Student framework that benchmarks and improves multi-turn contingency in a BabyLM trained on 100M words.
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
🗓️ Nov 8 – STARSEM Workshop
Generalizability of Media Frames: Corpus Creation and Analysis Across Countries
📰 We investigate how well media frames generalize across different media landscapes. The 15 Media Frames Corpus (MFC) frames remain broadly applicable, with minor revisions to the guidelines.
October 31, 2025 at 2:07 PM
Reposted by Dirk Hovy
🗓️ Nov 6 – Oral Presentation (TACL)
IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance
⚖️ A foundation for measuring LLM political bias in realistic user conversations.
📍 A303 | 10:30–12:00
October 31, 2025 at 2:07 PM