David Holzmüller
@dholzmueller.bsky.social
Postdoc in machine learning with Francis Bach &
@GaelVaroquaux: neural networks, tabular data, uncertainty, active learning, atomistic ML, learning theory.
https://dholzmueller.github.io
Thanks!
July 30, 2025 at 12:35 PM
Solution write-up with additional insights: kaggle.com/competitions...

🥈2nd place used stacking with diverse models
🥇1st place found a larger dataset
Prediction interval competition II: House price
Create a regression model for a house sale price having the narrowest overall prediction intervals
kaggle.com
July 29, 2025 at 11:10 AM
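The competition above scores models on overall prediction-interval quality. A standard metric for this width-versus-coverage trade-off (not necessarily the competition's exact scoring rule) is the Winkler interval score; a minimal sketch, where alpha = 0.1 corresponds to 90% intervals:

```python
def winkler_score(y, lower, upper, alpha=0.1):
    """Winkler interval score for a central (1 - alpha) prediction
    interval [lower, upper]: the interval width, plus a penalty of
    (2 / alpha) times the amount by which the true value y falls
    outside the interval. Lower scores are better."""
    score = upper - lower
    if y < lower:
        score += (2.0 / alpha) * (lower - y)
    elif y > upper:
        score += (2.0 / alpha) * (y - upper)
    return score

# A 90% interval [100, 200] with the true price inside it ...
print(round(winkler_score(150.0, 100.0, 200.0), 2))  # 100.0
# ... and with the true price 10 below the lower bound:
print(round(winkler_score(90.0, 100.0, 200.0), 2))   # 300.0
```

Averaging this score over a test set rewards narrow intervals only as long as they keep covering the true values.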
Is it because mathematicians think in terms of the number of assumptions that are satisfied, while physicists think in terms of the number of things that satisfy them?
July 23, 2025 at 9:04 PM
Reposted by David Holzmüller
Missed the school? We have uploaded recordings of most talks to our YouTube Channel www.youtube.com/@AutoML_org 🙌
AutoML Freiburg Hannover Tübingen
This channel features videos about automated machine learning (AutoML) from the AutoML groups at the University of Freiburg, Leibniz University Hannover and University of Tübingen. Common topics inclu...
www.youtube.com
June 20, 2025 at 3:36 PM
Links:
www.kaggle.com/competitions...
www.kaggle.com/competitions...
Link to the repo: github.com/dholzmueller...

PS: The newest pytabkit version now includes multiquantile regression for RealMLP and a few other improvements.

bsky.app/profile/dhol...
Can deep learning finally compete with boosted trees on tabular data? 🌲
In our NeurIPS 2024 paper, we introduce RealMLP, an NN with improvements in all areas and meta-learned default parameters.
Some insights about RealMLP and other models on large benchmarks (>200 datasets): 🧵
March 10, 2025 at 3:53 PM
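On the multiquantile regression mentioned in the PS above: predicting several quantiles at once is typically trained and evaluated per quantile level q with the pinball (quantile) loss. A library-agnostic sketch with made-up numbers (this does not show pytabkit's actual interface):

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball loss for quantile level q in (0, 1).
    Under-predictions are weighted by q, over-predictions by 1 - q,
    so minimizing its expectation yields the q-quantile."""
    diff = y_true - y_pred
    return q * diff if diff >= 0 else (q - 1.0) * diff

# Evaluate predictions for three quantile levels against one target:
quantiles = [0.1, 0.5, 0.9]
preds = [80.0, 100.0, 130.0]  # hypothetical per-quantile predictions
losses = [pinball_loss(110.0, p, q) for p, q in zip(preds, quantiles)]
print([round(l, 4) for l in losses])  # [3.0, 5.0, 2.0]
```

A multiquantile model simply minimizes the sum of these losses over all quantile levels, sharing one set of features and weights.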
The benchmark is limited to classification with AUC as a metric, which is one of RealMLP’s weaker points. Datasets are from the CC-18 benchmark, and the benchmark uses nested cross-validation unlike many other benchmarks. Link: arxiv.org/abs/2402.039...
Is Deep Learning finally better than Decision Trees on Tabular Data?
Tabular data is a ubiquitous data modality due to its versatility and ease of use in many real-world applications. The predominant heuristics for handling classification tasks on tabular data rely on ...
arxiv.org
March 4, 2025 at 2:25 PM
What about work on adaptive learning rates (in the sense of convergence rates, not step sizes) that studies methods using hyperparameter optimization on a holdout set, achieving optimal or near-optimal convergence rates simultaneously for different classes of functions? E.g.
projecteuclid.org/journals/ann...
February 12, 2025 at 7:31 AM
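The holdout-based adaptivity mentioned above can be illustrated with a toy sketch (my own illustration, not from the cited work): fit candidate estimators from different model classes on a training split and select the one with the smallest holdout error, without knowing in advance which class matches the target.

```python
def fit_constant(train):
    """Constant model: predict the training mean."""
    c = sum(y for _, y in train) / len(train)
    return lambda x: c

def fit_linear(train):
    """1-D ordinary least squares."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def select_on_holdout(train, holdout, fitters):
    """Fit each candidate class on the training split and return the
    name of the one with the smallest holdout MSE: the selected
    estimator adapts to whichever class the target actually lies in."""
    def mse(fit):
        model = fit(train)
        return sum((model(x) - y) ** 2 for x, y in holdout) / len(holdout)
    return min(fitters, key=lambda name: mse(fitters[name]))

# Toy target f(x) = 2x + 1: the linear class should be selected.
f = lambda x: 2 * x + 1
train = [(i / 10, f(i / 10)) for i in range(11)]
holdout = [((i + 0.5) / 10, f((i + 0.5) / 10)) for i in range(10)]
best = select_on_holdout(train, holdout,
                         {"constant": fit_constant, "linear": fit_linear})
print(best)  # linear
```

The theoretical work asks when this selection step provably inherits the best convergence rate among the candidate classes.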
By the way, I think an intercept is necessary in this case because the logistic regression model does not have one. For more realistic models that can learn an intercept themselves, an intercept for temperature scaling (TS) is probably not very important.
February 8, 2025 at 11:21 AM
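The point above can be sketched concretely: temperature scaling maps a logit z to z / T, and adding an intercept b gives z / T + b, which matters when the underlying model (like an intercept-free logistic regression) cannot shift its logits itself. A minimal sketch fitting T and b by gradient descent on the validation log loss (implementation details are my own, not from the thread):

```python
import math

def fit_temp_scaling(logits, labels, use_bias=True, lr=0.1, steps=5000):
    """Fit the calibration map z -> z / T + b on validation data by
    minimizing log loss with plain gradient descent; the temperature
    is parametrized as T = exp(s) so it stays positive."""
    s, b = 0.0, 0.0  # s = log T, start at T = 1, b = 0
    for _ in range(steps):
        gs, gb = 0.0, 0.0
        for z, y in zip(logits, labels):
            p = 1.0 / (1.0 + math.exp(-(z * math.exp(-s) + b)))
            err = p - y                      # d(logloss)/d(logit)
            gs += err * z * (-math.exp(-s))  # chain rule through z*exp(-s)
            gb += err                        # d(logit)/db = 1
        s -= lr * gs / len(logits)
        if use_bias:
            b -= lr * gb / len(logits)
    return math.exp(s), b

# Hypothetical miscalibrated, shifted logits: true logit = 0.5 * z - 1,
# i.e. the optimal correction is T = 2 and b = -1 (using soft labels).
zs = [i / 2 - 5 for i in range(21)]
probs = [1 / (1 + math.exp(-(0.5 * z - 1))) for z in zs]
T, b = fit_temp_scaling(zs, probs, use_bias=True)
print(round(T, 2), round(b, 2))  # close to T = 2.0, b = -1.0
```

With use_bias=False the fit can only rescale the logits, so the shift by -1 cannot be corrected, which is exactly the situation described for intercept-free models.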