Philipp Mondorf
@pmondorf.bsky.social
PhD student @MaiNLP (Munich AI & NLP lab), @LMU.
Working on reasoning in large language models.
👥 @veraneplenbroek.bsky.social, Sandro Pezzelle, @barbaraplank.bsky.social, @davidschlangen.bsky.social, Alessandro Suglia, @akskuchi.bsky.social, @ecekt.bsky.social, and @alberto-testoni.bsky.social.
📍Poster Session 2 — Hall 4/5, 11:00–12:30, Monday, July 28.

#MaiNLP #MCML #NLProc
July 18, 2025 at 10:19 AM
👥 This work is the result of a wonderful collaboration involving 20 researchers from 11 different universities.
July 18, 2025 at 10:19 AM
🔎Based on evaluations across 11 recent LLMs, we find that model judgments should be used with care, as they exhibit notable variability depending on the task and samples being evaluated. We argue that LLMs should be carefully validated against human judgments before being used as evaluators.
July 18, 2025 at 10:19 AM
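The validation step this post calls for — checking an LLM judge against human annotations before trusting it — can be sketched with a chance-corrected agreement score. A minimal Python sketch with hypothetical labels; Cohen's kappa is just one common agreement metric for categorical judgments:

```python
from collections import Counter

def cohens_kappa(human, llm):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(human)
    # Observed agreement: fraction of items where both give the same label.
    po = sum(h == l for h, l in zip(human, llm)) / n
    # Expected chance agreement from each annotator's label marginals.
    ph, pl = Counter(human), Counter(llm)
    pe = sum(ph[c] / n * pl[c] / n for c in ph.keys() | pl.keys())
    return 1.0 if pe == 1.0 else (po - pe) / (1 - pe)

# Hypothetical human vs. LLM-judge labels on five evaluation items.
human = ["good", "bad", "good", "good", "bad"]
llm   = ["good", "bad", "bad", "good", "bad"]
print(round(cohens_kappa(human, llm), 2))  # → 0.62
```

Low agreement on a given task would be exactly the signal not to substitute the LLM for human judges there.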
🔎 In this work, we study whether LLM judgments can be reliably used as proxies for human judgments. We introduce JUDGE-BENCH, an extensive collection of 20 datasets with human annotations covering a variety of NLP tasks.
July 18, 2025 at 10:19 AM
📄 [ACL 2025 main] LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks (doi.org/10.48550/arX...)
July 18, 2025 at 10:19 AM
👥 Huge thanks to my collaborators and co-authors, Sondre Wold and @barbaraplank.bsky.social!
📍Poster Session 7 — Hall 4/5, 10:30–12:00, Tuesday, July 29.
July 18, 2025 at 10:19 AM
🔎 Moreover, we show that these circuits can be reused and combined through set operations to represent more complex functional capabilities of the model. For more information, check out the paper!
July 18, 2025 at 10:19 AM
🔎 In this work, we study the relationship between transformer circuits identified for highly compositional and functionally related tasks. We find that functionally similar circuits exhibit both notable node overlap and cross-task faithfulness.
July 18, 2025 at 10:19 AM
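The node overlap and set operations mentioned in these posts can be illustrated by treating a circuit as a set of model components. A minimal Python sketch with made-up component names (the paper's actual circuits are identified via attribution on a trained transformer):

```python
# Hypothetical circuits: sets of component names ("layer.head" or MLP blocks).
circuit_a = {"L0.H1", "L2.H3", "L5.H7", "mlp.4"}
circuit_b = {"L2.H3", "L5.H7", "mlp.9"}

# Union: a candidate subnetwork for a task combining both capabilities.
combined = circuit_a | circuit_b

# Intersection-over-union as a simple node-overlap score between circuits.
iou = len(circuit_a & circuit_b) / len(circuit_a | circuit_b)

print(sorted(combined))
print(round(iou, 2))  # → 0.4
```

Cross-task faithfulness then asks whether one task's circuit, run in isolation, still recovers the model's behavior on the other task.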
📄 [ACL 2025 main] Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models (doi.org/10.48550/arX...)
July 18, 2025 at 10:19 AM