Jason Hartline
@jasonhartline.bsky.social
Professor at Northwestern CS. Economics, by courtesy. Study mechanism design, economics of algorithms, regulation of algorithms, AI and society. https://sites.northwestern.edu/hartline/
This is a really lovely result. Again, it's in this paper:

"Loss Minimization Yields Multicalibration for Large Neural Networks"
arxiv.org/abs/2304.09424

It seems the paper is not very well known and, hence, worth posting about.
November 12, 2025 at 11:39 PM
3. so for most sizes n, increasing to n+k+2 does not improve the loss much, and hence, by step 1, those networks must have low multicalibration bias.

4. minimizing loss with a regularizer that penalizes large network size will avoid the sizes n where there is significant improvement from making the network a little bigger (i.e., to n+k+2).
November 12, 2025 at 11:37 PM
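Step 4 can be illustrated numerically: with a per-unit size penalty lam added to the loss, any size whose loss still drops by more than lam at the next size cannot be the regularized minimizer. A toy sketch with made-up numbers (the paper compares sizes n and n+k+2; for simplicity this compares consecutive sizes):

```python
# Toy illustration of step 4: pick the size n minimizing loss[n] + lam * n.
# If growing from n to n+1 improved the loss by more than lam, size n
# could not be the minimizer, so the chosen size has a small improvement gap.

def best_regularized_size(losses, lam):
    """Size minimizing the regularized loss (synthetic example, not the paper's setup)."""
    return min(range(len(losses)), key=lambda n: losses[n] + lam * n)

# Made-up, non-increasing optimal losses as a function of network size.
losses = [1.0, 0.6, 0.55, 0.3, 0.28, 0.27, 0.1, 0.09, 0.09, 0.08]
lam = 0.05  # hypothetical per-unit size penalty

n_star = best_regularized_size(losses, lam)
print(n_star)  # → 6

# At the minimizer, the one-step improvement is at most the marginal penalty.
assert losses[n_star] - losses[n_star + 1] <= lam
```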
2. consider the sequence of optimal losses as a function of network size. there can't be too many sizes n where the networks of size n+k+2 are significantly better: the improvements telescope, so their cumulative sum over any sequence of networks is at most the maximum loss of 1.
November 12, 2025 at 11:23 PM
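The counting argument in step 2 can be checked numerically: for a non-increasing loss sequence with values in [0, 1], the gaps telescope to at most 1, so at most 1/eps sizes can improve by eps or more. A minimal sketch with synthetic losses (comparing consecutive sizes rather than n and n+k+2, which is the same argument run along each residue class):

```python
# Counting argument: losses lie in [0, 1] and are non-increasing in size,
# so the gaps losses[n] - losses[n+1] telescope to at most 1.
# Hence at most 1/eps sizes n can have a gap of eps or more.

def big_gap_sizes(losses, eps):
    """Indices n where growing the network improves the loss by at least eps."""
    return [n for n in range(len(losses) - 1) if losses[n] - losses[n + 1] >= eps]

# Synthetic non-increasing losses (illustrative only, not from the paper).
losses = [1.0, 0.6, 0.55, 0.3, 0.28, 0.27, 0.1, 0.09, 0.09, 0.08]
eps = 0.2

gaps = big_gap_sizes(losses, eps)
# Telescoping: the gaps sum to losses[0] - losses[-1] <= 1,
# so at most 1/eps = 5 indices can have a gap of at least eps.
assert len(gaps) <= 1 / eps
print(gaps)  # → [0, 2]
```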
Proof outline:
1. neural networks are rich enough that, if a size-n network has expected multicalibration bias on the points identified by a size-k network, then the two networks can be combined into a network of size n+k+2 that reduces the squared loss by the squared bias.
November 12, 2025 at 11:20 PM
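To spell out the squared-loss accounting in step 1 (notation mine, not the paper's): suppose the size-n predictor f has bias b = E[y − f(x) | g(x) = 1] on the group picked out by a size-k indicator network g, and that group has probability p. Patching f to f + b·g changes the loss as follows:

```latex
\mathbb{E}\big[(y - f(x) - b\,g(x))^2\big]
  = \mathbb{E}\big[(y - f(x))^2\big]
    - 2b\,\mathbb{E}\big[(y - f(x))\,g(x)\big]
    + b^2\,\mathbb{E}\big[g(x)^2\big].
```

With g(x) ∈ {0, 1}, we have E[(y − f(x)) g(x)] = pb and E[g(x)²] = p, so the patch reduces the squared loss by 2b·pb − b²p = pb². Presumably the few extra units needed to wire b·g(x) into f account for the +2 in n+k+2.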
Back to "Loss Minimization Yields Multicalibration for Large Neural Networks". Main Theorem:

The minimizer of "squared loss plus a regularizer term that penalizes neural network models by their size" is automatically multicalibrated with respect to small neural networks.
November 12, 2025 at 11:18 PM
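In symbols, my reading of the theorem (λ, s(f), and α are hypothetical notation, not from the post): the regularized minimizer

```latex
f^{*} \in \arg\min_{f}\; \mathbb{E}\big[(f(x) - y)^2\big] + \lambda\, s(f),
```

where s(f) denotes the size of the network f, satisfies, for every size-k network g,

```latex
\Big|\,\mathbb{E}\big[(y - f^{*}(x))\, g(x)\big]\,\Big| \le \alpha
```

for some small α depending on λ and k; see the paper for the exact quantitative statement.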
The calibration module was heavily based on the paper reading course co-taught with @yifanwu.bsky.social last year. Details of both courses are here:

- paper reading on calibration: sites.northwestern.edu/hartline/cs-...
- lecture based on data economics: sites.northwestern.edu/hartline/cs-...
November 12, 2025 at 11:13 PM
On the other hand, some esteemed scholars are finding the AI slop to be pretty good. bsky.app/profile/lanc...
ChatGPT Pulse yesterday, of its own accord, saw that I was teaching basic circuits in my class, and it gave me formatted notes on a high-level intuitive approach to the switching lemma. It even had the class date, time, and location.

Bowing to my AI overlords, I used the example in my lecture today.
November 8, 2025 at 7:33 PM