Manuel Tonneau
@manueltonneau.bsky.social
PhD candidate @oiioxford.bsky.social · NLP, Computational Social Science · @WorldBank

manueltonneau.com
For languages that do have moderators, we normalize moderator counts by content volume per language and find that platforms allocate their moderation workforce disproportionately relative to that volume, with languages primarily spoken in the Global South (Spanish, Portuguese, Arabic) consistently underserved.
August 28, 2025 at 8:46 AM
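A minimal sketch of the normalization above; the moderator and post counts here are made up for illustration, not the paper's figures:

```python
# Illustrative only: made-up numbers, not figures from the paper.
# Normalize moderator counts by content volume to compare languages.

moderators = {"English": 2000, "Spanish": 150, "Portuguese": 60, "Arabic": 40}
daily_posts_millions = {"English": 200, "Spanish": 90, "Portuguese": 40, "Arabic": 35}

for lang in moderators:
    ratio = moderators[lang] / daily_posts_millions[lang]
    print(f"{lang}: {ratio:.1f} moderators per million daily posts")
```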
We also quantify the number of EU-based users whose national language has no moderators, and we’re talking about millions of users posting in languages with zero moderators.
August 28, 2025 at 8:46 AM
Taking Twitter/X as an example, we then show that languages subject to moderation blind spots are generally widely spoken on social media, representing an average of 31% of all tweets during a one-day period in countries where they are the official language.
August 28, 2025 at 8:46 AM
We first look at language coverage and find that while larger platforms such as YouTube and Meta have moderators in most EU languages, smaller platforms such as X and Snapchat have several language blind spots with no human moderators, particularly in Southern, Eastern and Northern Europe.
August 28, 2025 at 8:46 AM
Social media platforms operate globally, but do they allocate human moderation equitably across languages?

Our new WP shows the answer is no:

-Millions of users post in languages with zero moderators
-Where mods exist, mod count relative to content volume varies widely across langs

osf.io/amfws
August 28, 2025 at 8:46 AM
🏆 Thrilled to share that our HateDay paper has received an Outstanding Paper Award at #ACL2025

Big thanks to my wonderful co-authors: @deeliu97.bsky.social, Niyati, @computermacgyver.bsky.social, Sam, Victor, and @paul-rottger.bsky.social!

Thread 👇 and data available at huggingface.co/datasets/man...
July 31, 2025 at 8:05 AM
What about moderation? Given the low perf, fully automatic moderation is not desirable. We investigate the feasibility of human-in-the-loop moderation, where models flag and humans verify. Moderating >80% of all hate would require humans to review >10% of all daily tweets, which can get 💸💸 for large communities
November 26, 2024 at 12:53 PM
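A rough sketch of how such a review-budget estimate can be derived from model scores, on fully synthetic data (the hate prevalence and classifier quality below are assumptions, not the paper's numbers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stream: assume 0.1% of tweets are hateful (illustrative, not the paper's figure).
n = 1_000_000
is_hate = rng.random(n) < 0.001

# Imperfect classifier: hateful tweets tend to score higher, with heavy overlap.
scores = rng.normal(loc=np.where(is_hate, 1.0, 0.0), scale=1.0)

# Review tweets from highest to lowest score and find the review budget
# needed to surface 80% of all hateful tweets.
order = np.argsort(-scores)
recall = np.cumsum(is_hate[order]) / is_hate.sum()
n_reviewed = int(np.searchsorted(recall, 0.80)) + 1
print(f"Review {n_reviewed / n:.1%} of the stream to catch 80% of hate")
```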
Why is perf so low? An important reason is that it is hard to distinguish between offensive and hateful content (as shown by @thomasdavidson.bsky.social in seminal work), and offensive content is much more prevalent than hate in the wild, crowding out hate in the predicted positives
November 26, 2024 at 12:53 PM
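The crowding-out effect is easy to reproduce with toy numbers (all rates below are illustrative assumptions, not estimates from the paper): even a classifier that only occasionally confuses offensive content with hate ends up with predicted positives dominated by offensive content, because offensive content is so much more common.

```python
# Illustrative base rates and error rates, not figures from the paper.
p_hate = 0.001        # share of tweets that are hateful
p_offensive = 0.03    # share that are offensive but not hateful
recall_hate = 0.80    # hateful tweets correctly flagged as hate
fp_offensive = 0.10   # offensive tweets wrongly flagged as hate
fp_other = 0.001      # all other tweets wrongly flagged as hate

flagged_hate = p_hate * recall_hate
flagged_off = p_offensive * fp_offensive
flagged_other = (1 - p_hate - p_offensive) * fp_other
precision = flagged_hate / (flagged_hate + flagged_off + flagged_other)
print(f"Precision: {precision:.0%}")  # ~17%: most predicted positives are merely offensive
```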
We then evaluate popular hate speech detection LLMs on HateDay and compare with their performance on academic hate speech datasets and functional tests (HateCheck). We find that traditional eval methods systematically overestimate performance; on representative data, detection performance is actually low.
November 26, 2024 at 12:53 PM
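A minimal illustration of this kind of comparison, scoring the same model's predictions on a curated academic-style test set versus a representative daily sample; the labels and predictions here are placeholders, not HateDay results:

```python
from sklearn.metrics import f1_score

# Toy labels/predictions for illustration only.
academic_y_true = [1, 1, 0, 1, 0, 1, 0, 0]
academic_y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
sampled_y_true  = [0, 0, 1, 1, 0, 0, 0, 0]
sampled_y_pred  = [1, 0, 1, 0, 0, 1, 0, 0]

print("F1 on academic-style test set:", f1_score(academic_y_true, academic_y_pred))
print("F1 on representative sample:  ", f1_score(sampled_y_true, sampled_y_pred))
```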
We first look at the prevalence and composition of hate in HateDay and find that most types of hate are represented across contexts, with some local specificities in the importance of each hate type (e.g., green-bashing in German tweets, islamophobia in India).
November 26, 2024 at 12:53 PM
Can we detect #hatespeech at scale on social media?

To answer this, we introduce 🤬HateDay🗓️, a global hate speech dataset representative of a day on Twitter.

The answer: not really! Detection perf is low and overestimated by traditional eval methods

arxiv.org/abs/2411.15462
🧵
November 26, 2024 at 12:53 PM
We also look at the alignment between data and annotator 🌍 origin, crucial to avoid cross-cultural annotation errors. While both origins are reported and align partially 🟨 or totally 🟩 in most cases for Arabic and Spanish, 0 ‼️ of the surveyed English datasets report both 🟥.
May 23, 2024 at 1:17 PM
While English data reflects Twitter's geographic skew, overrepresentation for Arabic (🇯🇴 🇸🇦) and Spanish (🇨🇱 🇪🇸) is mainly due to authors' intentional decisions to focus on a specific geo-cultural context, acknowledging the cultural sensitivity of 🤬 (e.g., 🇨🇱 in Arango et al.)
May 23, 2024 at 1:16 PM
🤬 data overrepresents a handful of countries relative to both the social media population and the general population. For English, Global North countries (🇺🇸🇬🇧🇦🇺🇨🇦) are overrepresented ➕ and Global South countries (🇮🇳🇳🇬🇵🇰) underrepresented ➖, with most datasets not grounded in 🌍, overlooking the diversity of English speakers online.
May 23, 2024 at 1:16 PM
In this work, we first survey 🤬 datasets in 8 languages, confirming an English-language bias and Twitter's dominance as a data source. However, we find that the share of English has recently 📉, with languages like Arabic catching up.
May 23, 2024 at 1:15 PM
What is deemed #hatespeech can vary across cultures, but 🤬 datasets are often built only at the language level 🗣️, masking potential cultural biases. Our new WOAH paper reveals large cultural representation gaps in 🤬 datasets using 🗣️ and 🌍 as cultural proxies.

🧵
May 23, 2024 at 1:15 PM