✅ Theoretical guarantees for nonlinear meta-learning
✅ Explains when and how aggregation helps
✅ Connects RKHS regression, subspace estimation & meta-learning
Co-led with Zhu Li 🙌, with invaluable support from @arthurgretton.bsky.social and Samory Kpotufe.
Bonus: for linear kernels, our results recover known linear meta-learning rates.
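To unpack the bonus (our gloss, not the authors' wording): with a linear kernel the RKHS is just the space of linear functions on R^d, so a shared subspace of the RKHS is an ordinary linear subspace, which is exactly the setting studied in the linear meta-learning literature.

$$k(x, x') = x^\top x' \;\Longrightarrow\; \mathcal{H} = \{\, x \mapsto w^\top x : w \in \mathbb{R}^d \,\} \cong \mathbb{R}^d, \qquad \text{so a shared subspace of } \mathcal{H} \text{ is a shared linear subspace of } \mathbb{R}^d.$$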
Key idea💡: Instead of trying to learn each individual task well, under-regularise the per-task estimators so that, together, they better estimate the shared subspace in the RKHS.
Even though each per-task estimate is noisy, their span reveals the structure we care about.
Bias-variance tradeoff in action: we accept extra variance in each per-task estimate in exchange for less bias in the shared subspace.
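Here is a minimal sketch of that recipe (our toy illustration, not the authors' code): random Fourier features stand in for the RKHS, each task is fit by deliberately under-regularised ridge regression, the top singular directions of the stacked per-task weights give the estimated shared subspace, and a new task is then fit inside that low-dimensional subspace. All names, hyperparameters, and the synthetic data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier features approximating a Gaussian kernel (a stand-in for the RKHS).
d, D, s = 5, 200, 3                                    # input dim, feature dim, shared-subspace dim
W_rff = rng.normal(scale=1.0 / np.sqrt(d), size=(D, d))
b_rff = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(X):
    """Feature map R^d -> R^D."""
    return np.sqrt(2.0 / D) * np.cos(X @ W_rff.T + b_rff)

# Synthetic tasks whose regression functions share an s-dimensional subspace in feature space.
H_true = np.linalg.qr(rng.normal(size=(D, s)))[0]      # shared basis, unknown to the learner

def make_task(n, beta=None):
    X = rng.normal(size=(n, d))
    if beta is None:
        beta = rng.normal(size=s)                      # task-specific coefficients
    y = phi(X) @ H_true @ beta + 0.1 * rng.normal(size=n)
    return X, y, beta

# Step 1: under-regularised ridge regression on each training task.
T, n, lam = 60, 50, 1e-6                               # lam deliberately tiny: under-regularise
W_hat = []
for _ in range(T):
    X, y, _ = make_task(n)
    P = phi(X)
    W_hat.append(np.linalg.solve(P.T @ P + lam * np.eye(D), P.T @ y))
W_hat = np.stack(W_hat, axis=1)                        # (D, T), one noisy estimate per task

# Step 2: the span of the noisy estimates reveals the shared subspace (top-s singular directions).
U, _, _ = np.linalg.svd(W_hat, full_matrices=False)
H_hat = U[:, :s]

# Step 3: learn a new task fast, restricted to the estimated s-dimensional subspace.
X_new, y_new, beta_new = make_task(15)                 # only a few samples for the new task
beta_hat = np.linalg.lstsq(phi(X_new) @ H_hat, y_new, rcond=None)[0]

# For comparison: fit the new task directly in the full feature space from the same 15 points.
w_direct = np.linalg.solve(phi(X_new).T @ phi(X_new) + 1e-3 * np.eye(D), phi(X_new).T @ y_new)

X_test, y_test, _ = make_task(500, beta_new)           # fresh data from the same new task
print("test MSE, subspace fit:", np.mean((phi(X_test) @ H_hat @ beta_hat - y_test) ** 2))
print("test MSE, direct fit:  ", np.mean((phi(X_test) @ w_direct - y_test) ** 2))
```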
Can we still estimate this shared representation efficiently — and learn new tasks fast?
If tasks share a low-dimensional linear representation, then we can show improved learning rates as the number of tasks increases.
But reality is nonlinear. What then?
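To make the contrast concrete (our notation, not lifted from the thread): in the linear story each task's regression function lies in a shared low-dimensional linear subspace, and the nonlinear version replaces that with a shared low-dimensional subspace of an RKHS.

$$\text{linear: } f_t(x) = \beta_t^\top B^\top x,\ \ B \in \mathbb{R}^{d \times s} \text{ shared across tasks}; \qquad \text{nonlinear: } f_t = \sum_{j=1}^{s} \beta_{t,j}\, h_j,\ \ h_1, \dots, h_s \in \mathcal{H} \text{ shared across tasks}.$$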
In practice (e.g. with neural nets), meta-learning usually means learning a shared representation across tasks — so we can train quickly on unseen ones.
But: what’s the theory behind this? 🤔
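For concreteness, the usual practical pattern looks roughly like this (a generic PyTorch sketch, not the paper's setup): a trunk shared across tasks plus a tiny task-specific head, so adapting to a new task only requires fitting the head.

```python
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Representation shared across all tasks (trained jointly on many tasks, not shown)."""
    def __init__(self, d_in=5, d_rep=3):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_rep))

    def forward(self, x):
        return self.net(x)

trunk = SharedTrunk()
head = nn.Linear(3, 1)                                   # small task-specific part

# Adapting to a new task: keep the trunk frozen, fit only the head's few parameters.
opt = torch.optim.Adam(head.parameters(), lr=1e-2)
x_new, y_new = torch.randn(20, 5), torch.randn(20, 1)    # toy data for the new task
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(head(trunk(x_new).detach()), y_new)
    loss.backward()
    opt.step()
```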