Kellin Pelrine
@kellinpelrine.bsky.social
👥 Research by Camille Thibault, @jacobtian.bsky.social, @gskulski.bsky.social, Taylor Curtis, James Zhou, Florence Laflamme, Luke Guan, @reirab.bsky.social, @godbout.bsky.social, @kellinpelrine.bsky.social
June 19, 2025 at 2:23 PM
🚀 Given these challenges, error analysis and other simple steps could greatly improve the robustness of research in the field. We propose a lightweight Evaluation Quality Assurance (EQA) framework to enable research results that translate more smoothly to real-world impact.
June 19, 2025 at 2:15 PM
🛠️ We also provide practical tools:
• CDL-DQA: a toolkit to assess misinformation datasets
• CDL-MD: the largest misinformation dataset repo, now on Hugging Face 🤗
June 19, 2025 at 2:15 PM
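As a rough illustration of how a CDL-MD dataset might be pulled from the Hugging Face Hub and given a first quality pass, here is a minimal Python sketch. The repo id, split, and column names are placeholders, not the toolkit's actual interface; check the CDL-MD repository for the real identifiers.

```python
# Minimal sketch: load a misinformation dataset and run two basic
# data-quality checks. Repo id and column names are hypothetical.
from collections import Counter

from datasets import load_dataset

# Placeholder repo id; substitute the real CDL-MD dataset path.
ds = load_dataset("example-org/example_misinfo_dataset", split="train")

# Check 1: label balance (heavy skew is a common quality red flag).
print(Counter(ds["label"]))

# Check 2: duplicated or trivially short texts, a common noise source.
texts = ds["text"]
print("duplicates:", len(texts) - len(set(texts)))
print("under 20 chars:", sum(len(t) < 20 for t in texts))
```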
🔍 Categorical labels can underestimate the performance of generative systems by massive amounts: half of the apparent errors or more can be artifacts of the labels rather than real mistakes.
June 19, 2025 at 2:15 PM
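To see how strict categorical scoring can undercount a generative system's correct answers, consider this toy comparison. The model outputs and the verdict extractor below are invented for illustration; they are not the paper's evaluation protocol.

```python
# Toy comparison: strict label matching vs. extracting a verdict from
# free-form output. All example generations are invented.
import re

gold = ["false", "true", "false"]
generations = [
    "This claim is false: the quote was fabricated.",
    "Mostly accurate -- the statistic checks out, so the claim is true.",
    "There is not enough evidence to verify this either way.",
]

def strict_categorical(pred: str, label: str) -> bool:
    # Requires the output to *be* the label, as many harnesses do.
    return pred.strip().lower() == label

def extract_verdict(pred: str) -> str | None:
    # Loose extraction: find a verdict word anywhere in the answer.
    m = re.search(r"\b(true|false|unverifiable)\b", pred.lower())
    return m.group(1) if m else None

strict = sum(strict_categorical(g, y) for g, y in zip(generations, gold))
loose = sum(extract_verdict(g) == y for g, y in zip(generations, gold))
print(f"strict: {strict}/3, extracted-verdict: {loose}/3")
```

On these toy examples, strict matching scores 0/3 while verdict extraction scores 2/3; the third example has no clear verdict at all, echoing the ambiguity issue in the next post.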
📊 Severe spurious correlations and ambiguities affect the majority of datasets in the literature. For example, most datasets have many examples where one can't conclusively assess veracity at all.
June 19, 2025 at 2:14 PM
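One lightweight way to probe for spurious correlations of this kind: check whether content-blind surface features alone predict the labels. A sketch, assuming text/label lists like those loaded in the earlier snippet; the feature choices are illustrative, not CDL-DQA's actual checks.

```python
# Spurious-correlation probe: if a linear model on content-blind
# features (length, punctuation counts) beats chance by a wide margin,
# labels are likely entangled with artifacts rather than veracity.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def shallow_features(texts):
    # Deliberately content-blind: length and punctuation only.
    return np.array([[len(t), t.count("!"), t.count("?")] for t in texts])

def artifact_probe(texts, labels) -> float:
    X, y = shallow_features(texts), np.array(labels)
    # Mean 5-fold accuracy of a linear model on these features.
    return cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean()

# Accuracy far above the majority-class rate signals spurious correlation.
```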
5/5 🔑 We frame structural safety generalization as a fundamental vulnerability and a tractable target for research on the road to robust AI alignment. Read the full paper: arxiv.org/pdf/2504.09712
June 3, 2025 at 2:36 PM
4/5 🛡️ Our fix: Structure Rewriting (SR) Guardrail. Rewrite any prompt into a canonical (plain English) form before evaluation. On GPT-4o, SR Guardrails cut attack success from 44% to 6% while blocking zero benign prompts.
June 3, 2025 at 2:36 PM
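A text-only sketch of the Structure Rewriting idea described in the post above (the paper's setting also covers images and multi-turn dialogue). The model names and the rewrite instruction here are placeholders, not the released implementation.

```python
# Sketch: normalize every incoming prompt to canonical plain English
# before the target model sees it. Model names are placeholders.
from openai import OpenAI

client = OpenAI()

REWRITE_INSTRUCTION = (
    "Rewrite the user's request as a single plain-English prompt. Merge "
    "multi-part requests into one and translate non-English text. "
    "Preserve the meaning exactly; do not answer the request."
)

def sr_guardrail(user_prompt: str) -> str:
    # Step 1: rewrite the prompt into canonical form.
    canonical = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": REWRITE_INSTRUCTION},
            {"role": "user", "content": user_prompt},
        ],
    ).choices[0].message.content
    # Step 2: the target model only ever sees the canonical prompt, so
    # its text-trained safety boundary applies regardless of the
    # original format or structure of the request.
    return client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": canonical}],
    ).choices[0].message.content
```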
3/5 🎯 Key insight: Safety boundaries don’t transfer across formats or contexts (text ↔ images; single-turn ↔ multi-turn; English ↔ low-resource languages). We define 4 criteria for tractable research: Semantic Equivalence, Explainability, Model Transferability, Goal Transferability.
June 3, 2025 at 2:36 PM
2/5 🔍 Striking examples:
• Claude 3.5: 0% ASR on image jailbreaks—but split the same content across images? 25% success.
• Gemini 1.5 Flash: 3% ASR on text prompts—paste that text in an image and it soars to 72%.
• GPT-4o: 4% ASR on single perturbed images—split across multiple images → 38%.
June 3, 2025 at 2:36 PM
5/5 👥Team: Maximilian Puelma Touzel, Sneheel Sarangi, Austin Welch, Gayatri Krishnakumar, Dan Zhao, Zachary Yang, Hao Yu, Ethan Kosak-Hine, Tom Gibbs, Andreea Musulan, Camille Thibault, Busra Tugce Gurbuz, Reihaneh Rabbany, Jean-François Godbout, @kellinpelrine.bsky.social
October 22, 2024 at 4:49 PM
4/5 Stay tuned for updates as we expand the measurement suite, add stats for assessing counterfactuals, push scale further and refine the agent personas!
📄 Read the full paper: arxiv.org/abs/2410.13915
🖥️ Code: github.com/social-sandb...
A Simulation System Towards Solving Societal-Scale Manipulation
October 22, 2024 at 4:48 PM
3/5 We demonstrate the system in several election scenarios with different types of agents, each structured with memories and traits. In one example, we align agents' beliefs in order to flip the election relative to a control setting.
October 22, 2024 at 4:47 PM
2/5 We built a sim system! Our 1st version has:
1. LLM-based agents interacting on social media (Mastodon).
2. Scalability: 100+ versatile, rich agents (memory, traits, etc.).
3. Measurement tools: a dashboard to track agent voting, candidate favorability, and activity in an election.
October 22, 2024 at 4:46 PM
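For a sense of what one such agent might look like, here is a bare-bones sketch of an LLM persona with fixed traits and an append-only memory that writes social media posts. The prompt format, model name, and memory handling are our assumptions, not the system's actual code.

```python
# Bare-bones simulation agent: fixed traits plus append-only memory,
# producing a short post via an LLM call. All specifics are assumed.
from dataclasses import dataclass, field

from openai import OpenAI

client = OpenAI()

@dataclass
class Agent:
    name: str
    traits: str                      # e.g. "skeptical retired teacher"
    memory: list[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # Append-only memory; a real system would summarize/retrieve.
        self.memory.append(event)

    def write_post(self, topic: str) -> str:
        recent = "\n- ".join(self.memory[-5:]) or "(nothing yet)"
        prompt = (
            f"You are {self.name}, a {self.traits}.\n"
            f"Recent things you saw:\n- {recent}\n\n"
            f"Write one short social media post about {topic}."
        )
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return reply.choices[0].message.content
```

A driver loop over 100+ such agents, feeding each one the posts its neighbors just made, would give the basic interaction pattern the post describes; the dashboard and election measurements sit on top of that loop.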