Digital Security Research Collaboratory Neu-Ulm
@dsrc.bsky.social
Research collaboratory at HNU, Germany, studying digital tech abuse to create safer online spaces. Views are our own, not the university’s.
We find that an AI system labels nearly half of a detoxified dataset as ‘clean’ while it still contains implicit bias. We continue to improve the detoxification algorithm using SAFE-TD. 7/7
November 20, 2025 at 12:20 PM
SAFE-TD considers four possible outcomes: Success, Failure, Content Distortion, and Implicit Hate Transformation, where hate hides behind polite language. 6/7
November 20, 2025 at 12:20 PM
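The four outcomes above can be read as a simple decision rule. The sketch below is our own illustration of that taxonomy, not SAFE-TD's actual implementation; the function name, flags, and ordering are assumptions.

```python
from enum import Enum

class Outcome(Enum):
    """The four SAFE-TD outcome categories (names paraphrased from the thread)."""
    SUCCESS = "success"                             # hate removed, meaning kept
    FAILURE = "failure"                             # explicit hate remains
    CONTENT_DISTORTION = "content distortion"       # hate removed, but meaning lost
    IMPLICIT_HATE = "implicit hate transformation"  # hate hidden behind polite wording

def classify(explicit_hate: bool, implicit_hate: bool, meaning_preserved: bool) -> Outcome:
    """Map judgments about a rewritten post to one outcome.

    In SAFE-TD these judgments would come from the simulated sender/target
    perspectives; here they are plain booleans for illustration.
    """
    if explicit_hate:
        return Outcome.FAILURE
    if implicit_hate:
        return Outcome.IMPLICIT_HATE
    if not meaning_preserved:
        return Outcome.CONTENT_DISTORTION
    return Outcome.SUCCESS
```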
To address this, we created SAFE-TD, a multi-agent LLM framework that simulates the perspectives of the sender and multiple targets to judge whether a rewritten post is truly safe. 5/7
November 20, 2025 at 12:20 PM
But how do we know that a post was truly detoxified? Asking humans to evaluate this is costly and can be inconsistent. Automated evaluations tend to miss subtle biases. A post may sound neutral but still carry harmful undertones or coded hate. 4/7
November 20, 2025 at 12:20 PM
We work on "text detoxification" with LLMs: We rewrite hateful posts into safer versions instead of removing them, providing a content-moderation approach that preserves the core message while removing hateful elements. 3/7
November 20, 2025 at 12:20 PM
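As a rough illustration of the rewriting step, a detoxification call to an LLM might be framed like this. The prompt wording and helper function are our own sketch, not the group's actual setup.

```python
# Illustrative prompt template for LLM-based text detoxification.
# The wording is our own sketch, not the group's actual prompt.
DETOX_PROMPT = (
    "Rewrite the following social media post so that it keeps its core "
    "message but removes all hateful, insulting, or demeaning language. "
    "Return only the rewritten post.\n\nPost: {post}"
)

def build_detox_request(post: str) -> str:
    """Fill the template with a concrete post before sending it to an LLM."""
    return DETOX_PROMPT.format(post=post)
```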
Hate speech remains one of social media’s biggest challenges. Deleting or banning content helps, but it can also silence users and push hate elsewhere. We work on innovative ways to reduce harm without erasing voices. 2/7
November 20, 2025 at 12:20 PM
Beyond hate speech, MAXplain enables rapid prototyping for other multimodal tasks — from misinformation detection to broader classification and evaluation tasks.

A step toward transparent, human-centered, and collaborative AI. 6/6
November 4, 2025 at 12:22 PM
On the Hateful Memes benchmark, the multi-agent setup outperformed single-model baselines — showing higher F1 scores and how structured collaboration improves transparency and control. 5/6
November 4, 2025 at 12:22 PM
The web interface (see diagram 👇) visualizes agent conversations, lets users create new agents, control workflows, and configure a shared rulebook.

A Chrome plugin enables one-click inspection of online posts. 4/6
November 4, 2025 at 12:22 PM
It splits the task into expert agents — e.g. a TargetingAgent for group detection, a CriticAgent to challenge outputs, and a central JudgeAgent that integrates all perspectives — explainable by design. 3/6
November 4, 2025 at 12:22 PM
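The agent split described above can be sketched as a small pipeline. Only the agent names come from the post; the stub logic below is a purely illustrative placeholder for what are, in MAXplain, LLM-backed agents.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    label: str       # e.g. "hateful" / "not hateful"
    rationale: str   # natural-language explanation, kept for transparency

# Stand-ins for the LLM-backed agents named in the thread; the bodies
# are trivial placeholders, not the real models.
def targeting_agent(text: str) -> str:
    """Detect which group (if any) the content targets."""
    return "unspecified group" if "they" in text.lower() else "none"

def critic_agent(draft: Verdict) -> str:
    """Challenge the draft verdict and return a critique."""
    return f"Check whether the rationale supports the label '{draft.label}'."

def judge_agent(target: str, critique: str) -> Verdict:
    """Integrate all perspectives into a final, explainable verdict."""
    label = "hateful" if target != "none" else "not hateful"
    return Verdict(label, f"Target: {target}. Critic note: {critique}")

def maxplain_pipeline(text: str) -> Verdict:
    target = targeting_agent(text)
    draft = Verdict("hateful" if target != "none" else "not hateful", "draft")
    critique = critic_agent(draft)
    return judge_agent(target, critique)
```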
Social media platforms face an overload of hateful memes and other multimodal content — combining text and images that challenge automated detection. MAXplain coordinates several specialized AI agents instead of one black box. 2/6
November 4, 2025 at 12:22 PM
If effective, attitude inoculation could complement moderation — scalable, preventive, and psychologically grounded. A step toward prebunking social harms before they spread. 6/6
October 23, 2025 at 9:58 AM
The proposed 3×2 experiment compares:
- Basic rule reminders
- Netiquette education
- Future-oriented inoculation
… each in generic vs. specific forms. 5/6
October 23, 2025 at 9:58 AM
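The 3×2 design above yields six experimental conditions, which can be enumerated directly; the condition labels below are our own shorthand.

```python
from itertools import product

# The three message types and two framings from the 3×2 design.
messages = ["basic rule reminder", "netiquette education", "future-oriented inoculation"]
framings = ["generic", "specific"]

# Every crossing of message type × framing is one experimental condition.
conditions = [f"{m} / {f}" for m, f in product(messages, framings)]
```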
The study tests future-oriented inoculation by encouraging users to think long-term, reducing susceptibility to authoritarian and anti-democratic attitudes. 4/6
October 23, 2025 at 9:58 AM
Enter attitude inoculation: expose users to weak versions of manipulative content + refutations. Like a cognitive vaccine, it builds mental immunity against harmful ideas. 3/6
October 23, 2025 at 9:58 AM
Social media moderation removes millions of posts — but harmful content persists. Platforms can’t catch it all, and deletion alone doesn’t change minds. 2/6
October 23, 2025 at 9:58 AM
The full paper provides a unified framework grounded in homophily, distinguishing the two constructs and mapping their interdependence.

It also sets out a research agenda for future interdisciplinary work on measurement, mechanisms, and interventions. 6/6
October 14, 2025 at 7:45 AM