nishantsubramani.github.io
Paper: aclanthology.org/2025.naacl-l...
Code (coming soon): github.com/microsoft/mi...
🧵/🧵
🎯 - significantly beats baselines on expected tool-calling utility, especially in high-risk scenarios
✅ - matches the expected calibration error of baselines
✅ - is sample-efficient
✅ - generalizes zero-shot to unseen tools
5/🧵
We develop a new metric, expected tool-calling utility 🛠️, to measure the utility of deciding whether or not to execute a tool call based on a confidence score!
4/🧵
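The paper's exact metric definition isn't reproduced here; as a rough illustrative sketch, assume executing a correct call earns a reward, executing an incorrect call incurs a cost (larger in high-risk settings), abstaining earns nothing, and a call is executed when its confidence clears a threshold. The reward/cost values and threshold below are illustrative assumptions, not from the paper.

```python
def expected_tool_calling_utility(confidences, correct, threshold=0.5,
                                  r_correct=1.0, c_incorrect=-1.0):
    """Average utility of threshold-gated tool execution (illustrative sketch).

    confidences: per-example confidence scores in [0, 1]
    correct:     per-example booleans, True if the tool call was correct
    r_correct:   assumed reward for executing a correct call
    c_incorrect: assumed cost for executing an incorrect call (more negative
                 in high-risk scenarios); abstaining contributes 0
    """
    total = 0.0
    for conf, is_correct in zip(confidences, correct):
        if conf >= threshold:  # execute only when confident enough
            total += r_correct if is_correct else c_incorrect
    return total / len(confidences)
```

Under this framing, a well-calibrated confidence score raises utility by executing the correct calls and abstaining on the incorrect ones, and raising `c_incorrect` models the high-risk regime where the gap over poorly calibrated baselines matters most.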
1️⃣ decode from each intermediate layer of an LM
2️⃣ compute similarity scores between each layer's generation and the final output
3️⃣ train a probabilistic classifier on these features
3/🧵
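The steps above can be sketched end-to-end with toy stand-ins for a real LM. Step 1 is simulated: the per-layer "generations" are just token-id lists (in practice they would come from decoding each intermediate layer's hidden states). The similarity feature (step 2) and the probabilistic classifier (step 3, a minimal logistic regression) are assumptions for illustration, not the paper's exact choices.

```python
import math

def layer_agreement(layer_tokens, final_tokens):
    # Step 2 feature: fraction of positions where an intermediate layer's
    # decode matches the final output (one assumed similarity score).
    matches = sum(a == b for a, b in zip(layer_tokens, final_tokens))
    return matches / len(final_tokens)

def fit_logistic(features, labels, lr=0.5, epochs=2000):
    # Step 3: minimal logistic regression over per-layer features,
    # trained by plain stochastic gradient descent.
    n_feat = len(features[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1 / (1 + math.exp(-z))  # predicted confidence
            g = p - y                   # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def confidence(x, w, b):
    # Calibrated-style confidence score in [0, 1] for one example.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))
```

With real model internals, `features` would hold one agreement score per layer for each generation, and `labels` would mark whether the corresponding tool call was correct; the classifier's output then serves as the confidence score that gates execution.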
2️⃣ Language models are poorly calibrated
🤔 Can we use model internals to better calibrate language models to make tool-using agents safer and more useful?
2/🧵