David Jurgens
@davidjurgens.bsky.social
Associate prof at @UMich in SI and CSE working in computational social science and natural language processing. PI of the Blablablab blablablab.si.umich.edu
See our #EMNLP2025 paper at aclanthology.org/2025.emnlp-m... for full details!
Structured Moral Reasoning in Language Models: A Value-Grounded Evaluation Framework
Mohna Chakraborty, Lu Wang, David Jurgens. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 5:47 AM
This type of structured moral reasoning is not prompt engineering; it's cognitive alignment, and it mirrors how we can inject instructions into prompts to guide how models solve problems. Our work is a step toward models that can explain not only what they decide, but why.
November 6, 2025 at 5:47 AM
The takeaway: Moral alignment isn’t just about outputs—it’s about how we ask models to reason. By scaffolding ethical deliberation with value systems and reasoning structure, we can make model judgments more transparent, context-aware, and norm-sensitive.
November 6, 2025 at 5:47 AM
Can smaller models learn this moral competence? Yes! Through reasoning-based distillation, we show that smaller LLMs inherit structured moral justifications from larger “teacher” models, achieving better moral interpretability without extra inference cost.
November 6, 2025 at 5:47 AM
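To make the distillation idea concrete, here is a minimal sketch of how reasoning-based distillation data could be assembled: the teacher model writes a structured justification plus verdict for each scenario, and the smaller student is then fine-tuned on those traces. The function names, prompt wording, and JSONL format are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of reasoning-based distillation: a larger "teacher" model writes a
# structured moral justification and verdict for each scenario, and a smaller "student"
# model is later fine-tuned to reproduce that reasoning, not just the final label.
# `teacher_generate` is a stand-in for whatever LLM client you use; the data format is illustrative.
import json

def teacher_generate(prompt: str) -> str:
    """Placeholder for a call to the larger teacher model."""
    raise NotImplementedError("plug in your own LLM client here")

def build_distillation_example(scenario: str) -> dict:
    prompt = (
        "Reason about the scenario below using explicit values and ethical principles, "
        "then give a verdict of ACCEPTABLE or UNACCEPTABLE.\n\n"
        f"Scenario: {scenario}"
    )
    # The student learns to emit the teacher's structured justification + verdict.
    return {"instruction": prompt, "output": teacher_generate(prompt)}

def write_sft_dataset(scenarios: list[str], path: str) -> None:
    """Write one JSON object per line, a common format for supervised fine-tuning."""
    with open(path, "w") as f:
        for s in scenarios:
            f.write(json.dumps(build_distillation_example(s)) + "\n")
```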
3. Best combination: Schwartz Values + Care Ethics, yielding coherent, norm-aligned judgments.
4. First-Principles Reasoning gives the largest accuracy and consistency gains.
November 6, 2025 at 5:47 AM
Zero-shot Findings
1. Structured reasoning ≫ label-only prediction for aligning with human decisions
2. Prompt design matters more than model size. Even small models reason better when given explicit moral structure.
November 6, 2025 at 5:47 AM
We tested 12 open-source LLMs across 4 moral-reasoning datasets, introducing a taxonomy of structured prompts that embed:
1️⃣ Value systems (e.g., Schwartz, Moral Foundations)
2️⃣ Ethical theories (e.g., Care Ethics, Utilitarianism)
3️⃣ Cognitive reasoning strategies (e.g., First-Principles)
November 6, 2025 at 5:47 AM
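As a concrete illustration of this taxonomy, here is a minimal sketch of how one structured prompt might combine the three components (a value system, an ethical theory, and a reasoning strategy) for a single moral scenario. The template wording, dictionary entries, and function name are hypothetical, not the paper's exact prompts; only the three-part structure follows the post above.

```python
# Illustrative sketch of assembling one prompt from the taxonomy above: a value system,
# an ethical theory, and a cognitive reasoning strategy, applied to a moral scenario.
# The wording and dictionary entries are hypothetical, not the paper's exact prompts.

VALUE_SYSTEMS = {
    "schwartz": ("Consider Schwartz's basic human values (e.g., benevolence, universalism, "
                 "security, self-direction) and how they trade off in this situation."),
}
ETHICAL_THEORIES = {
    "care_ethics": ("Reason from care ethics: focus on relationships, vulnerability, and the "
                    "responsibilities the people involved owe one another."),
}
REASONING_STRATEGIES = {
    "first_principles": ("Reason from first principles: identify the basic moral considerations "
                         "at stake before committing to a judgment."),
}

def build_structured_prompt(scenario: str, values: str, theory: str, strategy: str) -> str:
    """Combine the three components of the taxonomy into one value-grounded prompt."""
    return "\n\n".join([
        VALUE_SYSTEMS[values],
        ETHICAL_THEORIES[theory],
        REASONING_STRATEGIES[strategy],
        f"Scenario: {scenario}",
        "Explain your reasoning step by step, then answer ACCEPTABLE or UNACCEPTABLE.",
    ])

print(build_structured_prompt(
    "A nurse breaks hospital policy so a dying patient's family can stay overnight.",
    values="schwartz", theory="care_ethics", strategy="first_principles",
))
```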
We ask how we can inject explicit moral scaffolding to make models deliberate more like us.
November 6, 2025 at 5:47 AM
Humans reason about moral decisions through value trade-offs, ethical principles, and contextual sensitivity. However, most LLM moral responses rely on pretraining data or posttraining preferences, which can result in generic, biased, and normatively inconsistent decisions.
November 6, 2025 at 5:47 AM
If you're at #EMNLP2025, come by our poster at 10:30am on Friday, where Junghwan Kim will be presenting (he's not on Bluesky yet). This is Haotian Zhang's first paper too (he's applying for PhD programs now). You can read more at aclanthology.org/2025.emnlp-m...
Leveraging Multilingual Training for Authorship Representation: Enhancing Generalization across Languages and Domains
Junghwan Kim, Haotian Zhang, David Jurgens. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 6, 2025 at 5:42 AM
Our trained models offer state-of-the-art performance for authorship attribution in most languages, including English, and are available on Hugging Face: huggingface.co/Blablablab
Blablablab (Blablablab)
Natural Language Processing and Computational Social Science
huggingface.co
November 6, 2025 at 5:42 AM
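A minimal usage sketch, assuming the released checkpoints expose a standard sentence-embedding interface; the model id below is a placeholder, so check huggingface.co/Blablablab for the actual names.

```python
# Sketch of scoring stylistic similarity between two texts with an authorship encoder.
# The model id below is a placeholder (see huggingface.co/Blablablab for the released
# checkpoints), and the sentence-transformers packaging is an assumption.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Blablablab/multilingual-authorship-encoder")  # hypothetical id
texts = [
    "Honestly, I reckon the match was decided in the first ten minutes.",
    "In my honest opinion the game was basically over after ten minutes.",
]
emb = model.encode(texts, normalize_embeddings=True)
print("style similarity:", util.cos_sim(emb[0], emb[1]).item())
```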
Multilingual training also helps cross-lingual generalization 🌏 A single model trained jointly across languages performs better even on unseen languages (we test on 13 held-out ones) and unseen domains. This transferability result shows that some elements of authorial style are common across languages!
November 6, 2025 at 5:42 AM
We train on 4.5M authors across 36 languages and 13 domains—spanning 19 language families and 17 scripts. Our multilingual model beats monolingual baselines in 21 of 22 languages, improving Recall@8 by an average of +4.85%. Kazakh and Georgian gain up to +15.9%! 📈
November 6, 2025 at 5:42 AM
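For readers unfamiliar with the metric, here is an illustrative Recall@8 computation for authorship retrieval: a query text counts as a hit if any of its 8 nearest candidates was written by the same author. Protocol details (query/candidate splits, tie handling) are assumptions, not the paper's exact setup.

```python
# Illustrative Recall@k for authorship retrieval: each query text is a hit if any of its
# k most similar candidate texts shares its author.
import numpy as np

def recall_at_k(query_emb, query_authors, cand_emb, cand_authors, k=8):
    # Normalize so the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sims = q @ c.T                               # (n_queries, n_candidates)
    topk = np.argsort(-sims, axis=1)[:, :k]      # indices of the k most similar candidates
    hits = [query_authors[i] in {cand_authors[j] for j in topk[i]}
            for i in range(len(query_authors))]
    return float(np.mean(hits))
```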
For both, no language-specific tools are required, letting them scale easily and letting contrastive learning benefit from hard positives with few easy negatives.
November 6, 2025 at 5:42 AM
💡 Two core innovations:
1️⃣ Probabilistic Content Masking (PCM) – masks topic-heavy words, forcing the model to focus on style, not subject matter.
2️⃣ Language-Aware Batching (LAB) – groups same-language samples to avoid weak cross-lingual negatives during contrastive learning.
November 6, 2025 at 5:42 AM
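Hedged sketches of both ideas: here, "topic-heaviness" is approximated by inverse document frequency and batches simply group examples by language; the paper's actual masking probabilities and batch construction may differ.

```python
# Sketches of the two ideas: probabilistic content masking and language-aware batching.
import random
from collections import defaultdict

def probabilistic_content_mask(tokens, idf, mask_token="[MASK]", strength=1.0):
    """Mask each token with a probability that grows with its IDF (rarer = more topical),
    so the encoder has to rely on style rather than subject matter."""
    max_idf = max(idf.values())
    return [mask_token if random.random() < strength * idf.get(t.lower(), max_idf) / max_idf else t
            for t in tokens]

def language_aware_batches(examples, batch_size):
    """Yield batches whose examples all share one language, so in-batch negatives cannot
    be told apart by language identification alone."""
    by_lang = defaultdict(list)
    for ex in examples:                       # e.g. {"text": ..., "author": ..., "lang": ...}
        by_lang[ex["lang"]].append(ex)
    for items in by_lang.values():
        random.shuffle(items)
        for i in range(0, len(items) - batch_size + 1, batch_size):
            yield items[i:i + batch_size]
```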
We ask:
➡️ Can multilingual training improve authorship attribution models for low-resource languages?
➡️ Can we build style embeddings that generalize across languages and domains—even unseen ones?
Spoiler: Yes. By a lot.
November 6, 2025 at 5:42 AM
Authorship representation (AR) models learn to capture an author’s style—not their content. They’ve powered everything from authorship attribution to style transfer, text anonymization, and machine-text detection. But so far, most models are English-only 🇬🇧
November 6, 2025 at 5:42 AM
Too many good answers to type, haha. The meta answer to most is we need to be explicit about what values we're prioritizing or deprioritizing when solving hard problems so that we can have more fruitful conversations about the tradeoffs and stakeholders
November 1, 2025 at 5:00 PM
We have a recent preprint showing that models switch their moral decisions depending on which language you prompt with, and that the reasoning itself differs: arxiv.org/pdf/2509.21443.
arxiv.org
November 1, 2025 at 4:57 PM
Eleventh question: how should we think about ownership and copyright for AI-produced products?
October 29, 2025 at 9:21 PM
Tenth question: what are the opinions on AI taking over artists' jobs?
October 29, 2025 at 9:13 PM