Martin Tutek
mtutek.bsky.social
Martin Tutek
@mtutek.bsky.social
Postdoc @ TakeLab, UniZG | previously: Technion; TU Darmstadt | PhD @ TakeLab, UniZG

Faithful explainability, controllability & safety of LLMs.

🔎 On the academic job market 🔎

https://mttk.github.io/
wait this is not the routine?
November 27, 2025 at 5:42 PM
within the next 3-4 days, so sadly that doesn't work
November 11, 2025 at 11:03 AM
Reposted by Martin Tutek
Outstanding paper (5/7):

"Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps"
by Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, and Yonatan Belinkov
aclanthology.org/2025.emnlp-m...

6/n
Measuring Chain of Thought Faithfulness by Unlearning Reasoning Steps
Martin Tutek, Fateme Hashemi Chaleshtori, Ana Marasovic, Yonatan Belinkov. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 2025.
aclanthology.org
November 7, 2025 at 10:32 PM
hvala Josipa 🎉🥳
November 7, 2025 at 10:58 AM
Thank you Gabriele :)
November 7, 2025 at 10:52 AM
literally any book by sally rooney

(jk I know you don't like her)
October 24, 2025 at 2:20 PM
The only benefit of them being humanoid is training data I guess?

Companies have a bunch of videos of e.g. factory workers doing repetetive tasks, so you have more signal on intermediate steps of some actions to train the robots behavior
October 23, 2025 at 2:17 PM
Huge thanks to @adisimhi.bsky.social for leading the work & Jonathan Herzig, @itay-itzhak.bsky.social, Idan Szpektor, @boknilev.bsky.social

🔗 ManagerBench:
📄 - arxiv.org/pdf/2510.00857
👩‍💻 – github.com/technion-cs-...
🌐 – technion-cs-nlp.github.io/ManagerBench...
📊 - huggingface.co/datasets/Adi...
arxiv.org
October 8, 2025 at 3:14 PM
Here's the twist: LLMs’ harm assessments actually align well with human judgments 🎯
The problem? Flawed prioritization!
October 8, 2025 at 3:14 PM
The results? Frontier LLMs struggle badly with this trade-off:

Many consistently choose harmful options to achieve operational goals
Others become overly cautious—avoiding harm but becoming ineffective

The sweet spot of safe AND pragmatic? Largely missing!
October 8, 2025 at 3:14 PM
ManagerBench evaluates LLMs on realistic managerial scenarios validated by humans. Each scenario forces a choice:

❌ A pragmatic but harmful action that achieves the goal
✅ A safe action with worse operational performance
➕control scenarios with only inanimate objects at risk😎
October 8, 2025 at 3:14 PM
Many works investigate the relationship between LLM, goals, and safety.

We create a realistic management scenario where LLMs have explicit motivations to choose harmful options, while always having a harmless option.
October 8, 2025 at 3:14 PM