Lightnews — Scholar-powered news

Niklas Stoehr

@niklasstoehr.bsky.social

1.1K followers 220 following 6 posts

Gemini Post-Training ⚫️ Research Scientist at Google DeepMind ⚫️ PhD from ETH Zurich

Niklas Stoehr

@niklasstoehr.bsky.social

⚖️ Measuring Scalar Constructs in Social Science with LLMs

with rising (and established) stars in Computational Social Science

@haukelicht.bsky.social
@rupak-s.bsky.social
@patrickwu.bsky.social
@pranavgoel.bsky.social
@elliottash.bsky.social
@alexanderhoyle.bsky.social

arxiv.org/abs/2509.03116

November 17, 2025 at 9:29 AM

Reposted by Niklas Stoehr

Alexander Hoyle

@alexanderhoyle.bsky.social

Paper: arxiv.org/abs/2509.03116

Code: github.com/haukelicht/s...

With:
@haukelicht.bsky.social *
@rupak-s.bsky.social *
@patrickwu.bsky.social
@pranavgoel.bsky.social
@niklasstoehr.bsky.social
@elliottash.bsky.social

October 28, 2025 at 6:20 AM

Reposted by Niklas Stoehr

Alexander Hoyle

@alexanderhoyle.bsky.social

[corrected link]

LLMs are often used for text annotation in social science. In some cases, this involves placing text items on a scale: e.g., 1 for liberal and 9 for conservative

There are a few ways to handle this task. Which work best? Our new EMNLP paper has some answers🧵
arxiv.org/abs/2509.03116

A diagram illustrating pointwise scoring with a large language model (LLM). At the top is a text box containing instructions: 'You will see the text of a political advertisement about a candidate. Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view of the candidate and 9 indicates a negative view of the candidate.' Below this is a green text box containing an example ad text: 'Joe Biden is going to eat your grandchildren for dinner.' An arrow points down from this text to an illustration of a computer with 'LLM' displayed on its monitor. Finally, an arrow points from the computer down to the number '9' in large teal text, representing the LLM's scoring output. This diagram demonstrates how an LLM directly assigns a numerical score to text based on given criteria
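The pointwise setup in the diagram can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the function names (`build_prompt`, `parse_score`, `pointwise_score`) and the stand-in `fake_llm` are hypothetical, and a real run would swap `fake_llm` for a call to an actual model.

```python
import re
from typing import Callable, Optional

# Instruction text taken from the diagram description above.
INSTRUCTIONS = (
    "You will see the text of a political advertisement about a candidate. "
    "Rate it on a scale ranging from 1 to 9, where 1 indicates a positive view "
    "of the candidate and 9 indicates a negative view of the candidate."
)


def build_prompt(ad_text: str) -> str:
    """Combine the instructions and one ad into a single prompt (hypothetical format)."""
    return f"{INSTRUCTIONS}\n\nAd text: {ad_text}\n\nRating:"


def parse_score(reply: str, low: int = 1, high: int = 9) -> Optional[int]:
    """Return the first integer in the reply if it lies on the scale, else None."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    score = int(match.group())
    return score if low <= score <= high else None


def pointwise_score(ad_text: str, llm: Callable[[str], str]) -> Optional[int]:
    """Score a single text item by prompting the model once and parsing its reply."""
    return parse_score(llm(build_prompt(ad_text)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption); always answers '9'."""
    return "9"


score = pointwise_score(
    "Joe Biden is going to eat your grandchildren for dinner.", fake_llm
)
```

Keeping the parser separate from the model call makes it easy to reject off-scale or non-numeric replies instead of silently coercing them.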

October 28, 2025 at 6:23 AM

Reposted by Niklas Stoehr

Alexander Hoyle

@alexanderhoyle.bsky.social

Evaluating topic models (and document clustering methods) is hard. In fact, since our paper critiquing standard evaluation practices four years ago, there hasn't been a good replacement metric

That ends today (we hope)! Our new ACL paper introduces an LLM-based evaluation protocol 🧵

Screenshot of first page of paper. It is here: https://arxiv.org/pdf/2507.00828

Abstract: Topic model and document-clustering evaluations either use automated metrics that align poorly with human preferences or require expert labels that are intractable to scale. We design a scalable human evaluation protocol and a corresponding automated approximation that reflect practitioners' real-world usage of models. Annotators -- or an LLM-based proxy -- review text items assigned to a topic or cluster, infer a category for the group, then apply that category to other documents. Using this protocol, we collect extensive crowdworker annotations of outputs from a diverse set of topic models on two datasets. We then use these annotations to validate automated proxies, finding that the best LLM proxies are statistically indistinguishable from a human annotator and can therefore serve as a reasonable substitute in automated evaluations

July 8, 2025 at 12:40 PM

Niklas Stoehr

@niklasstoehr.bsky.social

🎓 I recently defended my PhD and moved from one dream team at ETH Zurich to another at DeepMind—a huge thank you to the many people who have supported me along the way!

June 11, 2025 at 9:39 AM

Reposted by Niklas Stoehr

Shauli Ravfogel

@shauli.bsky.social

Our paper "A Practical Method for Generating String Counterfactuals" has been accepted to the Findings of NAACL 2025! Joint work with @matan-avitan.bsky.social, @yoavgo.bsky.social, and Ryan Cotterell. We propose "Intervention Lens", a technique to explain interventions in natural language. (1/6)

February 12, 2025 at 3:19 PM

Reposted by Niklas Stoehr

Paul Röttger @ EMNLP

@paul-rottger.bsky.social

Are LLMs biased when they write about political issues?

We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before.

Long 🧵with spicy results 👇

February 13, 2025 at 2:08 PM

Reposted by Niklas Stoehr

Julian Minder

@jkminder.bsky.social

Can we understand and control how language models balance context and prior knowledge? Our latest paper shows it’s all about a 1D knob! 🎛️
arxiv.org/abs/2411.07404

Co-led with
@kevdududu.bsky.social - @niklasstoehr.bsky.social , Giovanni Monea, @wendlerc.bsky.social, Robert West & Ryan Cotterell.

November 22, 2024 at 3:49 PM

Reposted by Niklas Stoehr

Lucy Li

@lucy3.bsky.social

mech interp: bsky.app/starter-pack...
women in nlp: bsky.app/starter-pack...
nlp #1: bsky.app/starter-pack...
nlp #2: bsky.app/starter-pack...
ml/data/tech: bsky.app/starter-pack...
robotics & ai: bsky.app/starter-pack...

November 19, 2024 at 7:23 PM

Reposted by Niklas Stoehr

Sweta Karlekar

@swetakar.bsky.social

If you’re interested in mechanistic interpretability, I just found this starter pack and wanted to boost it (thanks for creating it @butanium.bsky.social !). Excited to have a mech interp community on bluesky 🎉

go.bsky.app/LisK3CP

November 19, 2024 at 12:28 AM

Reposted by Niklas Stoehr

Giuliano Formisano

@giulianoformisano.bsky.social

Just launched a Political Comm/NLP/Text-as-Data Starter Pack. 🦋🤗

Join us and/or drop a message to be added!

go.bsky.app/39MWTjg #starterpack #polsci

November 18, 2024 at 3:01 PM

Reposted by Niklas Stoehr

Vilém Zouhar #EMNLP

@zouharvi.bsky.social

Trying to bring ML/NLP/etal people from ETH Zürich together. Ping me to add you. 🙂
bsky.app/starter-pack...

November 18, 2024 at 10:51 AM
