David Jurgens
@davidjurgens.bsky.social
Associate prof at @UMich in SI and CSE working in computational social science and natural language processing. PI of the Blablablab blablablab.si.umich.edu
See our #EMNLP2025 paper at aclanthology.org/2025.emnlp-m... for full details!
Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework
Mohna Chakraborty, Lu Wang, David Jurgens. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 5:47 AM
This type of structured moral reasoning is not prompt engineering; it's cognitive alignment, and it mirrors how we can inject instructions into prompts to guide how models solve problems. Our work is a step toward models that can explain not only what they decide, but why.
November 6, 2025 at 5:47 AM
The takeaway: Moral alignment isn’t just about outputs—it’s about how we ask models to reason. By scaffolding ethical deliberation with value systems and reasoning structure, we can make model judgments more transparent, context-aware, and norm-sensitive.
November 6, 2025 at 5:47 AM
Can smaller models learn this moral competence? Yes! Through reasoning-based distillation, we show that smaller LLMs inherit structured moral justifications from larger “teacher” models, achieving better moral interpretability without extra inference cost.
November 6, 2025 at 5:47 AM
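To make the distillation idea concrete, here is a minimal sketch of how reasoning-based distillation data could be assembled: the teacher model writes a structured justification plus verdict for each scenario, and the smaller student is then fine-tuned on those traces. The function names, prompt wording, and JSONL format are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of reasoning-based distillation: a larger "teacher" model writes a
# structured moral justification and verdict for each scenario, and a smaller "student"
# model is later fine-tuned to reproduce that reasoning, not just the final label.
# `teacher_generate` is a stand-in for whatever LLM client you use; the data format is illustrative.
import json

def teacher_generate(prompt: str) -> str:
    """Placeholder for a call to the larger teacher model."""
    raise NotImplementedError("plug in your own LLM client here")

def build_distillation_example(scenario: str) -> dict:
    prompt = (
        "Reason about the scenario below using explicit values and ethical principles, "
        "then give a verdict of ACCEPTABLE or UNACCEPTABLE.\n\n"
        f"Scenario: {scenario}"
    )
    # The student learns to emit the teacher's structured justification + verdict.
    return {"instruction": prompt, "output": teacher_generate(prompt)}

def write_sft_dataset(scenarios: list[str], path: str) -> None:
    """Write one JSON object per line, a common format for supervised fine-tuning."""
    with open(path, "w") as f:
        for s in scenarios:
            f.write(json.dumps(build_distillation_example(s)) + "\n")
```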
3. Best combination: Schwartz Values + Care Ethics, yielding coherent, norm-aligned judgments.
4. First-Principles Reasoning gives the largest accuracy and consistency gains.
November 6, 2025 at 5:47 AM
Zero-shot Findings
1. Structured reasoning ≫ label-only prediction for aligning with human decisions
2. Prompt design matters more than model size. Even small models reason better when given explicit moral structure.
November 6, 2025 at 5:47 AM
We tested 12 open-source LLMs across 4 moral-reasoning datasets, introducing a taxonomy of structured prompts that embed:
1️⃣ Value systems (e.g., Schwartz, Moral Foundations)
2️⃣ Ethical theories (e.g., Care Ethics, Utilitarianism)
3️⃣ Cognitive reasoning strategies (e.g., First-Principles)
November 6, 2025 at 5:47 AM
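As a concrete illustration of this taxonomy, here is a minimal sketch of how one structured prompt might combine the three components (a value system, an ethical theory, and a reasoning strategy) for a single moral scenario. The template wording, dictionary entries, and function name are hypothetical, not the paper's exact prompts; only the three-part structure follows the post above.

```python
# Illustrative sketch of assembling one prompt from the taxonomy above: a value system,
# an ethical theory, and a cognitive reasoning strategy, applied to a moral scenario.
# The wording and dictionary entries are hypothetical, not the paper's exact prompts.

VALUE_SYSTEMS = {
    "schwartz": ("Consider Schwartz's basic human values (e.g., benevolence, universalism, "
                 "security, self-direction) and how they trade off in this situation."),
}
ETHICAL_THEORIES = {
    "care_ethics": ("Reason from care ethics: focus on relationships, vulnerability, and the "
                    "responsibilities the people involved owe one another."),
}
REASONING_STRATEGIES = {
    "first_principles": ("Reason from first principles: identify the basic moral considerations "
                         "at stake before committing to a judgment."),
}

def build_structured_prompt(scenario: str, values: str, theory: str, strategy: str) -> str:
    """Combine the three components of the taxonomy into one value-grounded prompt."""
    return "\n\n".join([
        VALUE_SYSTEMS[values],
        ETHICAL_THEORIES[theory],
        REASONING_STRATEGIES[strategy],
        f"Scenario: {scenario}",
        "Explain your reasoning step by step, then answer ACCEPTABLE or UNACCEPTABLE.",
    ])

print(build_structured_prompt(
    "A nurse breaks hospital policy so a dying patient's family can stay overnight.",
    values="schwartz", theory="care_ethics", strategy="first_principles",
))
```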
We ask how we can inject explicit moral scaffolding to make models deliberate more like us.
November 6, 2025 at 5:47 AM
Humans reason about moral decisions through value trade-offs, ethical principles, and contextual sensitivity. However, most LLM moral responses rely on pretraining data or posttraining preferences, which can result in generic, biased, and normatively inconsistent decisions.
November 6, 2025 at 5:47 AM
If you're at #EMNLP2025, come by our poster at 10:30am on Friday, where Junghwan Kim will be presenting (he's not on Bluesky yet). This is Haotian Zhang's first paper too (he's applying for PhD programs now). You can read more at aclanthology.org/2025.emnlp-m...
Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains
Junghwan Kim, Haotian Zhang, David Jurgens. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 5:42 AM
Our trained models offer state-of-the-art performance for authorship attribution in most languages, including English, and are available on Hugging Face: huggingface.co/Blablablab
Blablablab (Blablablab)
Natural Language Processing and Computational Social Science
huggingface.co
November 6, 2025 at 5:42 AM
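A minimal usage sketch, assuming the released checkpoints expose a standard sentence-embedding interface; the model id below is a placeholder, so check huggingface.co/Blablablab for the actual names.

```python
# Sketch of scoring stylistic similarity between two texts with an authorship encoder.
# The model id below is a placeholder (see huggingface.co/Blablablab for the released
# checkpoints), and the sentence-transformers packaging is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Blablablab/multilingual-authorship-encoder")  # hypothetical id
texts = [
    "Honestly, I reckon the match was decided in the first ten minutes.",
    "In my honest opinion the game was basically over after ten minutes.",
]
emb = model.encode(texts, normalize_embeddings=True)
print("style similarity:", util.cos_sim(emb[0], emb[1]).item())
```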
Multilingual training also helps cross-lingual generalization 🌏 A single model trained jointly across languages performs better even on unseen languages (we test on 13 held-out ones) and unseen domains. This transferability result shows that some elements of authorial style are common across languages!
November 6, 2025 at 5:42 AM
We train on 4.5M authors across 36 languages and 13 domains—spanning 19 language families and 17 scripts. Our multilingual model beats monolingual baselines in 21 of 22 languages, improving Recall@8 by an average of +4.85%. Kazakh and Georgian gain up to +15.9%! 📈
November 6, 2025 at 5:42 AM
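For readers unfamiliar with the metric, here is an illustrative Recall@8 computation for authorship retrieval: a query text counts as a hit if any of its 8 nearest candidates was written by the same author. Protocol details (query/candidate splits, tie handling) are assumptions, not the paper's exact setup.

```python
# Illustrative Recall@k for authorship retrieval: each query text is a hit if any of its
# k most similar candidate texts shares its author.
import numpy as np

def recall_at_k(query_emb, query_authors, cand_emb, cand_authors, k=8):
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sims = q @ c.T                               # (n_queries, n_candidates)
    topk = np.argsort(-sims, axis=1)[:, :k]      # indices of the k most similar candidates
    hits = [query_authors[i] in {cand_authors[j] for j in topk[i]}
            for i in range(len(query_authors))]
    return float(np.mean(hits))
```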
For both, no language-specific tools are required, letting them scale easily and letting contrastive learning benefit from hard positives with few easy negatives.
November 6, 2025 at 5:42 AM
💡 Two core innovations:
1️⃣ Probabilistic Content Masking (PCM) – masks topic-heavy words, forcing the model to focus on style, not subject matter.
2️⃣ Language-Aware Batching (LAB) – groups same-language samples to avoid weak cross-lingual negatives during contrastive learning.
November 6, 2025 at 5:42 AM
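Hedged sketches of both ideas: here, "topic-heaviness" is approximated by inverse document frequency and batches simply group examples by language; the paper's actual masking probabilities and batch construction may differ.

```python
# Sketches of the two ideas: probabilistic content masking and language-aware batching.
import random
from collections import defaultdict

def probabilistic_content_mask(tokens, idf, mask_token="[MASK]", strength=1.0):
    """Mask each token with a probability that grows with its IDF (rarer = more topical),
    so the encoder has to rely on style rather than subject matter."""
    max_idf = max(idf.values())
    return [mask_token if random.random() < strength * idf.get(t.lower(), max_idf) / max_idf else t
            for t in tokens]

def language_aware_batches(examples, batch_size):
    """Yield batches whose examples all share one language, so in-batch negatives cannot
    be told apart by language identification alone."""
    by_lang = defaultdict(list)
    for ex in examples:                       # e.g. {"text": ..., "author": ..., "lang": ...}
        by_lang[ex["lang"]].append(ex)
    for items in by_lang.values():
        random.shuffle(items)
        for i in range(0, len(items) - batch_size + 1, batch_size):
            yield items[i:i + batch_size]
```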
We ask:
➡️ Can multilingual training improve authorship attribution models for low-resource languages?
➡️ Can we build style embeddings that generalize across languages and domains—even unseen ones?
Spoiler: Yes. By a lot.
November 6, 2025 at 5:42 AM
Authorship representation (AR) models learn to capture an author’s style—not their content. They’ve powered everything from authorship attribution to style transfer, text anonymization, and machine-text detection. But so far, most models are English-only 🇬🇧
November 6, 2025 at 5:42 AM
Too many good answers to type, haha. The meta answer to most is we need to be explicit about what values we're prioritizing or deprioritizing when solving hard problems so that we can have more fruitful conversations about the tradeoffs and stakeholders
November 1, 2025 at 5:00 PM
We have a recent preprint showing that models switch their moral decisions depending on which language you prompt with, and that the reasoning itself differs: arxiv.org/pdf/2509.21443.
arxiv.org
November 1, 2025 at 4:57 PM
Eleventh question: how should we think about ownership and copyright for AI-produced products?
October 29, 2025 at 9:21 PM
Tenth question: what are the opinions on AI taking over artists' jobs?
October 29, 2025 at 9:13 PM