Digital Security Research Collaboratory Neu-Ulm
@dsrc.bsky.social
Research collaboratory at HNU, Germany, studying digital tech abuse to create safer online spaces. Views are our own, not the university’s.
We find that an AI system labels nearly half of a detoxified dataset as ‘clean’ while it still contains implicit bias. We continue to improve the detoxification algorithm using SAFE-TD. 7/7
November 20, 2025 at 12:20 PM
SAFE-TD considers four possible outcomes: Success, Failure, Content Distortion, and Implicit Hate Transformation, where hate hides behind polite language. 6/7
November 20, 2025 at 12:20 PM
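The four outcomes above can be read as a simple decision rule. The sketch below is our own illustration of that taxonomy, not SAFE-TD's actual implementation; the function name, flags, and ordering are assumptions.

```python
from enum import Enum

class Outcome(Enum):
    """The four SAFE-TD outcome categories (names paraphrased from the thread)."""
    SUCCESS = "success"                             # hate removed, meaning kept
    FAILURE = "failure"                             # explicit hate remains
    CONTENT_DISTORTION = "content distortion"       # hate removed, but meaning lost
    IMPLICIT_HATE = "implicit hate transformation"  # hate hidden behind polite wording

def classify(explicit_hate: bool, implicit_hate: bool, meaning_preserved: bool) -> Outcome:
    """Map judgments about a rewritten post to one outcome.

    In SAFE-TD these judgments would come from the simulated sender/target
    perspectives; here they are plain booleans for illustration.
    """
    if explicit_hate:
        return Outcome.FAILURE
    if implicit_hate:
        return Outcome.IMPLICIT_HATE
    if not meaning_preserved:
        return Outcome.CONTENT_DISTORTION
    return Outcome.SUCCESS
```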
To address this, we created SAFE-TD, a multi-agent LLM framework that simulates the perspectives of the sender and multiple targets to judge whether a rewritten post is truly safe. 5/7
November 20, 2025 at 12:20 PM
But how do we know that a post was truly detoxified? Asking humans to evaluate this is costly and can be inconsistent. Automated evaluations tend to miss subtle biases. A post may sound neutral but still carry harmful undertones or coded hate. 4/7
November 20, 2025 at 12:20 PM
We work on "text detoxification" with LLMs: We rewrite hateful posts into safer versions instead of removing them, providing a content-moderation approach that preserves the core message while removing hateful elements. 3/7
November 20, 2025 at 12:20 PM
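As a rough illustration of the rewriting step, a detoxification call to an LLM might be framed like this. The prompt wording and helper function are our own sketch, not the group's actual setup.

```python
# Illustrative prompt template for LLM-based text detoxification.
# The wording is our own sketch, not the group's actual prompt.
DETOX_PROMPT = (
    "Rewrite the following social media post so that it keeps its core "
    "message but removes all hateful, insulting, or demeaning language. "
    "Return only the rewritten post.\n\nPost: {post}"
)

def build_detox_request(post: str) -> str:
    """Fill the template with a concrete post before sending it to an LLM."""
    return DETOX_PROMPT.format(post=post)
```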
Hate speech remains one of social media’s biggest challenges. Deleting or banning content helps, but it can also silence users and push hate elsewhere. We work on innovative ways to reduce harm without erasing voices. 2/7
November 20, 2025 at 12:20 PM
Beyond hate speech, MAXplain enables rapid prototyping for other multimodal tasks — from misinformation detection to broader classification and evaluation tasks.

A step toward transparent, human-centered, and collaborative AI. 6/6
November 4, 2025 at 12:22 PM
On the Hateful Memes benchmark, the multi-agent setup outperformed single-model baselines — showing higher F1 scores and how structured collaboration improves transparency and control. 5/6
November 4, 2025 at 12:22 PM
The web interface (see diagram 👇) visualizes agent conversations, lets users create new agents, control workflows, and configure a shared rulebook.

A Chrome plugin enables one-click inspection of online posts. 4/6
November 4, 2025 at 12:22 PM
It splits the task into expert agents — e.g. a TargetingAgent for group detection, a CriticAgent to challenge outputs, and a central JudgeAgent that integrates all perspectives — explainable by design. 3/6
November 4, 2025 at 12:22 PM
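The agent split described above can be sketched as a small pipeline. Only the agent names come from the post; the stub logic below is a purely illustrative placeholder for what are, in MAXplain, LLM-backed agents.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str       # e.g. "hateful" / "not hateful"
    rationale: str   # natural-language explanation, kept for transparency

# Stand-ins for the LLM-backed agents named in the thread; the bodies
# are trivial placeholders, not the real models.
def targeting_agent(text: str) -> str:
    """Detect which group (if any) the content targets."""
    return "unspecified group" if "they" in text.lower() else "none"

def critic_agent(draft: Verdict) -> str:
    """Challenge the draft verdict and return a critique."""
    return f"Check whether the rationale supports the label '{draft.label}'."

def judge_agent(target: str, critique: str) -> Verdict:
    """Integrate all perspectives into a final, explainable verdict."""
    label = "hateful" if target != "none" else "not hateful"
    return Verdict(label, f"Target: {target}. Critic note: {critique}")

def maxplain_pipeline(text: str) -> Verdict:
    target = targeting_agent(text)
    draft = Verdict("hateful" if target != "none" else "not hateful", "draft")
    critique = critic_agent(draft)
    return judge_agent(target, critique)
```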
Social media platforms face an overload of hateful memes and other multimodal content — combining text and images that challenge automated detection. MAXplain coordinates several specialized AI agents instead of one black box. 2/6
November 4, 2025 at 12:22 PM
If effective, attitude inoculation could complement moderation — scalable, preventive, and psychologically grounded. A step toward prebunking social harms before they spread. 6/6
October 23, 2025 at 9:58 AM
The proposed 3×2 experiment compares:
- Basic rule reminders
- Netiquette education
- Future-oriented inoculation
… each in generic vs. specific forms. 5/6
October 23, 2025 at 9:58 AM
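The 3×2 design above yields six experimental conditions, which can be enumerated directly; the condition labels below are our own shorthand.

```python
from itertools import product

# The three message types and two framings from the 3×2 design.
messages = ["basic rule reminder", "netiquette education", "future-oriented inoculation"]
framings = ["generic", "specific"]

# Every crossing of message type × framing is one experimental condition.
conditions = [f"{m} / {f}" for m, f in product(messages, framings)]
```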
The study tests future-oriented inoculation by encouraging users to think long-term, reducing susceptibility to authoritarian and anti-democratic attitudes. 4/6
October 23, 2025 at 9:58 AM
Enter attitude inoculation: expose users to weak versions of manipulative content + refutations. Like a cognitive vaccine, it builds mental immunity against harmful ideas. 3/6
October 23, 2025 at 9:58 AM
Social media moderation removes millions of posts — but harmful content persists. Platforms can’t catch it all, and deletion alone doesn’t change minds. 2/6
October 23, 2025 at 9:58 AM
The full paper provides a unified framework grounded in homophily, distinguishing the two constructs and mapping their interdependence.

It also sets out a research agenda for future interdisciplinary work on measurement, mechanisms, and interventions. 6/6
October 14, 2025 at 7:45 AM