Mourad Heddaya
@mheddaya.bsky.social
nlp phd student at uchicago cs
Analysis reveals different types of hallucinations:
- Simple factual errors
- Incorrect legal citations
- Misrepresentation of procedural history
- Mischaracterization of Court's reasoning

Fine-tuned smaller models tend to make more egregious errors than GPT-4.
May 1, 2025 at 7:25 PM
CaseSumm is a useful resource for long-context reasoning and legal research:
- Largest legal case summarization dataset
- 200+ years of Supreme Court cases
- "Ground truth" summaries written by Court attorneys and approved by Justices
- Variation in summary styles and compression rates over time
May 1, 2025 at 7:25 PM
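For anyone who wants to poke at the data, here is a minimal loading sketch using Hugging Face `datasets`. The dataset ID and column names ("opinion", "syllabus") are assumptions for illustration, not the confirmed release identifiers; check the paper or repo for the real ones.

```python
# Minimal sketch of exploring a case-summarization dataset like CaseSumm.
# The dataset ID and column names are assumptions, not the official release.
from datasets import load_dataset

ds = load_dataset("ChicagoHAI/CaseSumm", split="train")  # hypothetical dataset ID

# Rough per-case compression rate: summary length relative to opinion length.
def add_compression(example):
    opinion_len = max(len(example["opinion"].split()), 1)
    example["compression"] = len(example["syllabus"].split()) / opinion_len
    return example

ds = ds.map(add_compression)
print(ds[0]["compression"])
```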
Key findings:
1. A smaller fine-tuned LLM scores well on metrics but has more factual errors.
2. Experts prefer GPT-4 summaries—even over the “ground-truth” syllabuses.
3. ROUGE and similar metrics poorly reflect human preferences.
4. Even LLM-based evaluations still misalign with human judgment.
May 1, 2025 at 7:25 PM
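For context on finding 3, here is a minimal sketch of how ROUGE scores a candidate summary against a reference, using the `rouge-score` package. The texts are made up for illustration and are not from CaseSumm.

```python
# Minimal ROUGE scoring sketch with the `rouge-score` package
# (pip install rouge-score). Texts are illustrative, not from the dataset.
from rouge_score import rouge_scorer

reference = "The Court held that the statute violates the First Amendment."
candidate = "The Court held that the statute does not violate the First Amendment."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
# The candidate reverses the holding yet overlaps heavily with the reference,
# so n-gram metrics can reward summaries that experts would reject.
```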
🧑‍⚖️ How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate those summaries?

Excited to be in Albuquerque presenting our paper this afternoon at @naaclmeeting 2025!
May 1, 2025 at 7:25 PM
Even human annotators sometimes disagree on narrative presence, but fine-tuned LLMs mirror these natural disagreements more closely than larger models.

Our error analysis shows some mistakes arise from genuine interpretative ambiguity. Check out the last three examples here:
November 15, 2024 at 6:56 PM
Fine-tuning shines in teaching models to spot narratives, unlike in-context learning. GPT-4o struggles, often misclassifying non-narratives as narratives.
November 15, 2024 at 6:56 PM
This is a difficult hierarchical classification task with many, sometimes semantically similar, classes.

We find that smaller fine-tuned LLMs outperform larger models like GPT-4o, while also offering better scalability and cost efficiency. But they also err differently.
November 15, 2024 at 6:56 PM
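A rough sketch of the two-level structure the task implies: first decide whether a sentence contains a narrative at all, then assign fine-grained cause/effect classes. The stub model below is a keyword stand-in for a fine-tuned LLM, and every name in it is a placeholder rather than the paper's actual setup.

```python
# Two-level (hierarchical) classification sketch. StubNarrativeModel is a
# keyword-based stand-in for a fine-tuned LLM; all names are placeholders.
class StubNarrativeModel:
    def contains_narrative(self, sentence: str) -> bool:
        # Level 1: does the sentence assert a cause or effect at all?
        return any(w in sentence.lower() for w in ("because", "due to", "led to"))

    def predict_causes(self, sentence: str) -> list[str]:
        return ["supply"] if "supply" in sentence.lower() else []

    def predict_effects(self, sentence: str) -> list[str]:
        return ["purchasing-power"] if "purchasing power" in sentence.lower() else []

def classify_sentence(sentence: str, model) -> dict:
    if not model.contains_narrative(sentence):
        return {"narrative": False, "causes": [], "effects": []}
    # Level 2: multi-label assignment over many, sometimes similar, classes.
    return {"narrative": True,
            "causes": model.predict_causes(sentence),
            "effects": model.predict_effects(sentence)}

print(classify_sentence(
    "Inflation rose due to supply shocks, eroding purchasing power.",
    StubNarrativeModel(),
))
```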
We define a causal micro-narrative as a sentence-level explanation of a target subject's cause(s) and/or effect(s).

As an application, we propose an ontology for inflation's causes/effects and create a large-scale dataset classifying sentences from U.S. news articles.
November 15, 2024 at 6:56 PM
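To make the sentence-level framing concrete, here is a rough sketch of how a labeled example could be represented. The cause/effect category names are illustrative placeholders, not the actual inflation ontology from the paper.

```python
# Rough sketch of a sentence-level causal micro-narrative label. The category
# names below are placeholders, not the paper's inflation ontology.
from dataclasses import dataclass, field

CAUSES = {"demand", "supply", "monetary-policy", "expectations"}   # assumed labels
EFFECTS = {"purchasing-power", "savings", "rates", "uncertainty"}  # assumed labels

@dataclass
class MicroNarrative:
    sentence: str
    causes: set[str] = field(default_factory=set)   # subset of CAUSES
    effects: set[str] = field(default_factory=set)  # subset of EFFECTS

example = MicroNarrative(
    sentence="Prices climbed as supply chains snarled, eroding purchasing power.",
    causes={"supply"},
    effects={"purchasing-power"},
)
print(example)
```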
How do everyday narratives reveal hidden cause-and-effect patterns that shape our beliefs and behaviors?

In our paper, we propose Causal Micro-Narratives to uncover narratives from real-world data. As a case study, we characterize the narratives about inflation in news.
November 15, 2024 at 6:56 PM