Aaron Roth
@aaroth.bsky.social
Professor at Penn, Amazon Scholar at AWS. Interested in machine learning, uncertainty quantification, game theory, privacy, fairness, and most of the intersections therein
My own STOC submission. Really did use elegant properties of linear regression that I didn't know about until embarrassingly recently!
November 5, 2025 at 1:32 AM
i.e. in this framework, you get the decision theoretic benefits of full calibration at an extremely low (and computationally tractable) level of this hierarchy. The paper is here: arxiv.org/abs/2510.23471 and is joint with Shayan Kiyani, Hamed Hassani, and George Pappas.
Robust Decision Making with Partially Calibrated Forecasts (arxiv.org)
October 30, 2025 at 7:02 PM
What lies in between? Maybe an infinite hierarchy of ever less conservative decision rules as we add to H. But one surprise that we find is that as soon as H contains the decision calibration tests (just one for each action), the optimal decision rule collapses to best response.
October 30, 2025 at 7:02 PM
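A minimal sketch of what per-action decision calibration tests look like for a binary outcome (my own illustration; the utility matrix and function names are hypothetical, not the paper's):

import numpy as np

# Hypothetical utility matrix for two actions and a binary outcome: u[a, y].
u = np.array([[1.0, -2.0],
              [0.2,  0.1]])

def best_response(p):
    """Action maximizing expected utility when the outcome is Bernoulli(p)."""
    expected = (1 - p) * u[:, 0] + p * u[:, 1]
    return int(expected.argmax())

def decision_test(a):
    """Decision calibration test for action a: h_a(p) = 1 iff a is the best response to p."""
    return lambda p: float(best_response(p) == a)

# If f passes the calibration test h_a for every action a, the claim in the
# post above is that the robust rule collapses to best responding to f(x).
for p in [0.1, 0.5, 0.9]:
    print(p, best_response(p), [decision_test(a)(p) for a in range(2)])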
We can interpolate between full calibration and no information: optimize for the worst distribution that is consistent with the H-calibration guarantees of f. When H is empty, we recover the minimax safety strategy. When H is all functions, we recover the best-response rule.
October 30, 2025 at 7:02 PM
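A toy sketch of this interpolation for a binary outcome, two actions, and forecasts supported on a small grid (my own construction, not the paper's algorithm; the grid, weights, utilities, test class, and eps are all hypothetical). For each deterministic decision rule, compute its worst-case expected utility over conditional label means consistent with the H-calibration constraints (a small linear program), then keep the best rule:

import itertools
import numpy as np
from scipy.optimize import linprog

V = np.array([0.1, 0.5, 0.9])          # forecast values f(x) can take
w = np.array([0.3, 0.4, 0.3])          # marginal weight of each forecast value
u = np.array([[1.0, -2.0],             # u[a, y]: utility of action a if outcome y
              [0.2,  0.1]])
eps = 0.05

# Test class H, each test evaluated on the forecast grid.
H = [np.ones_like(V),                  # constant test (average bias)
     V.copy()]                         # linear test

def worst_case_value(rule):
    """Min over conditional means m(v) in [0,1], subject to the constraints
    |sum_v w(v) h(v) (v - m(v))| <= eps for each h in H, of the expected
    utility of playing action rule[v] at forecast value v."""
    coef = np.array([w[i] * (u[a, 1] - u[a, 0]) for i, a in enumerate(rule)])
    const = sum(w[i] * u[a, 0] for i, a in enumerate(rule))
    A_ub, b_ub = [], []
    for h in H:
        c_h = float(np.sum(w * h * V))
        A_ub.append(w * h);    b_ub.append(c_h + eps)   #  sum w h m <= c_h + eps
        A_ub.append(-(w * h)); b_ub.append(eps - c_h)   # -sum w h m <= eps - c_h
    res = linprog(coef, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0.0, 1.0)] * len(V))
    return const + res.fun

# Brute-force the best deterministic rule (tiny example: 2^3 rules).
best = max(itertools.product([0, 1], repeat=len(V)), key=worst_case_value)
print("robust rule (action per forecast value):", best)

With H empty the constraints vanish and this reduces to the minimax safety strategy; as H grows the feasible set of distributions shrinks and the rule becomes less conservative.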
Full calibration is hard and so rarely satisfied. But predictions aren't useless either. Maybe the forecaster is partially calibrated in that for some class of tests H={h1,...,hk}, we know that |E[(f(x)-y)*h(f(x))]| <= eps. Most relaxations of calibration have this format.
October 30, 2025 at 7:02 PM
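A minimal sketch of checking this condition empirically (my own; the forecasts, labels, test class, and eps below are hypothetical):

import numpy as np

def h_calibration_errors(p, y, tests):
    """Empirical |E[(f(x) - y) * h(f(x))]| for each test h, where p = f(x)."""
    return {name: abs(np.mean((p - y) * h(p))) for name, h in tests.items()}

# Hypothetical data: forecasts p = f(x) in [0, 1] and binary outcomes y.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = rng.binomial(1, p)   # in this toy example f happens to be well calibrated

# A small test class H; many relaxations of calibration fix some such class.
tests = {
    "constant":     lambda q: np.ones_like(q),           # average bias
    "identity":     lambda q: q,                         # correlation-style test
    "upper_bucket": lambda q: (q > 0.5).astype(float),   # bias on high forecasts
}

eps = 0.02   # tolerance, on the order of the sampling noise at this sample size
errors = h_calibration_errors(p, y, tests)
print({name: (round(err, 4), err <= eps) for name, err in errors.items()})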
If the forecasts have no bearing on the outcome at all, then you should ignore them, and you might conservatively play your minimax strategy: argmax_a min_o u(a,o). The forecasts don't tell you how to do anything better. But generally we aren't in either of these two cases.
October 30, 2025 at 7:02 PM
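A minimal sketch of the minimax safety strategy argmax_a min_o u(a,o) on a hypothetical utility matrix (my own example, not from the thread):

import numpy as np

# Hypothetical utility matrix: u[a, o] is the payoff of action a under outcome o.
u = np.array([
    [1.0, -2.0],   # action 0: great under outcome 0, costly under outcome 1
    [0.2,  0.1],   # action 1: a "safe" action with a modest payoff either way
])

# Minimax safety strategy: argmax_a min_o u(a, o).  No forecast is consulted.
worst_case = u.min(axis=1)
minimax_action = int(worst_case.argmax())
print(minimax_action, worst_case[minimax_action])   # action 1, guaranteed 0.1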
(but that is not the same thing, I think, as having "no understanding" and only extruding text)
October 21, 2025 at 7:58 PM
I totally agree they are worse on out-of-distribution examples, and cannot learn/improve on these tasks the way people can. They seem to have modest understanding of a huge collection of things rather than deep understanding of anything, and no ability to learn.
October 21, 2025 at 7:58 PM
I do not --- but I have been using them quite a bit to draft mathematics using coding tools like Cursor and Windsurf, and I have found them useful. It is very much human in the loop --- but also very useful in my experience.
October 21, 2025 at 7:54 PM
But now that they write working code, and can be useful assistants in mathematical research (including my own), I don't see how it is defensible to say that all value/understanding comes from interpretation on the part of the human user. I'd be interested in hearing the best version of the argument.
October 21, 2025 at 3:33 PM
I have to say, because of my upbringing in computer science (and in particular TCS), I am partial to the functionalist argument. When LLMs were just chatting and writing poems, I could believe that we were reading more into them than was there because of our anthropomorphic biases.
October 21, 2025 at 3:33 PM