Lightnews — Scholar-powered news

Qingcheng Zeng

@qcznlp.bsky.social

72 followers 210 following 11 posts

I do research in social computing and LLMs at Northwestern with @robvoigt.bsky.social and Kaize Ding.


Qingcheng Zeng

@qcznlp.bsky.social

4️⃣ Good Intentions Beyond ACL: Who Does NLP for Social Good, and Where?

The first jump into Science of Science! We systematically investigated the NLP4SG landscape and quantified the proportion of work addressing social good concerns both within and beyond the ACL community. Preprint coming soon!

August 20, 2025 at 8:47 PM

Qingcheng Zeng

@qcznlp.bsky.social

3️⃣ MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

By far, the most comprehensive multilingual benchmark for evaluating LLMs. Qwen 3 2507 is using this benchmark to evaluate multilingual ability!

Paper 3️⃣: arxiv.org/abs/2503.10497

MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation

Existing large language model (LLM) evaluation benchmarks primarily focus on English, while current multilingual tasks lack parallel questions that specifically assess cross-linguistic reasoning abili...

arxiv.org

August 20, 2025 at 8:47 PM

Qingcheng Zeng

@qcznlp.bsky.social

(3) Instruct models show much higher refusal rates than reasoning models. And reasoning models only show minimal accuracy in additional attempts.
(4) Thinking with images helps SO much in VLMs' calibration!

Paper1️⃣: arxiv.org/abs/2504.06564
Paper2️⃣: arxiv.org/abs/2505.20236

Thinking Out Loud: Do Reasoning Models Know When They're Right?

Large reasoning models (LRMs) have recently demonstrated impressive capabilities in complex reasoning tasks by leveraging increased test-time computation and exhibiting behaviors reminiscent of human-...

arxiv.org

August 20, 2025 at 8:47 PM

Qingcheng Zeng

@qcznlp.bsky.social

...whether reasoning models or vision language models express their confidence in a calibrated manner. Our findings are:
(1) SFT reasoning models usually lead to better calibration in in-distribution settings, and worse calibration in OOD settings.
(2) RL could help improve (recover) it a bit.
...

August 20, 2025 at 8:47 PM

Qingcheng Zeng

@qcznlp.bsky.social

Thanks for reading!! Any feedback would be greatly appreciated if you happen to have any 🫡

July 21, 2025 at 3:16 AM

Qingcheng Zeng

@qcznlp.bsky.social

Fascinating work! If you're open to talking, I’d love to chat sometime about the broader potential of LLMs in social science.

May 7, 2025 at 8:04 PM
