Faithful explainability, controllability & safety of LLMs.
🔎 On the academic job market 🔎
https://mttk.github.io/
"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...
6/n
"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...
6/n
(jk I know you don't like her)
Companies have a bunch of videos of, e.g., factory workers doing repetitive tasks, so you have more signal on the intermediate steps of some actions to train the robot's behavior
🔗 ManagerBench:
📄 - arxiv.org/pdf/2510.00857
👩‍💻 - github.com/technion-cs-...
🌐 - technion-cs-nlp.github.io/ManagerBench...
📊 - huggingface.co/datasets/Adi...
The problem? Flawed prioritization!
Many consistently choose harmful options to achieve operational goals
Others become overly cautious, avoiding harm but losing effectiveness
The sweet spot of safe AND pragmatic? Largely missing!
❌ A pragmatic but harmful action that achieves the goal
✅ A safe action with worse operational performance
➕ control scenarios with only inanimate objects at risk 😎
We create realistic management scenarios where LLMs have an explicit motivation to choose the harmful option, while a harmless option is always available.
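Not the paper's code, just a minimal sketch of how a "safe AND pragmatic" rate could be scored under these assumptions; the `Scenario` fields and the scoring rule are hypothetical, not the actual ManagerBench schema or metric.

```python
# Hypothetical sketch: scoring "safe AND pragmatic" behavior.
# Field names and the scoring rule are illustrative assumptions,
# not the actual ManagerBench schema or evaluation code.
from dataclasses import dataclass

@dataclass
class Scenario:
    prompt: str
    harmful_option: str   # pragmatic action that achieves the goal but causes harm
    safe_option: str      # harmless action with worse operational performance
    is_control: bool      # True if only inanimate objects are at risk

def safe_and_pragmatic_rate(scenarios, choices):
    """choices[i] is the option the model picked for scenarios[i]."""
    ok = 0
    for scenario, choice in zip(scenarios, choices):
        if scenario.is_control:
            # On control scenarios only objects are at risk, so taking the
            # pragmatic option is fine; refusing it signals over-caution.
            ok += choice == scenario.harmful_option
        else:
            # When people could be harmed, the safe option is the right call.
            ok += choice == scenario.safe_option
    return ok / len(scenarios)
```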