jobs.ashbyhq.com/cohere/3f797...
#ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social
Vision experiments, more discussion, and visuals coming soon in the camera-ready!
Can we train for flat minima with less catastrophic OOD forgetting? We propose Trust Region Aware Minimization for smoothness in parameters + representations (rough sketch below).
TL;DR: representations matter just as much!
arxiv.org/abs/2310.03646 w/
@nsaphra.bsky.social Pradeep Dasigi + Hao Peng
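A minimal sketch of the idea as I read it from the post, not the authors' implementation: a SAM-style two-step update plus a penalty that keeps the perturbed model's representations close to the unperturbed ones. The `(logits, features)` model interface, `rho`, and `lam` are illustrative assumptions.

```python
import torch

def tram_like_step(model, x, y, loss_fn, optimizer, rho=0.05, lam=0.1):
    """Sketch of a sharpness-aware step with a representation-smoothness penalty.

    Assumes `model(x)` returns (logits, features); hyperparameters are illustrative.
    """
    # --- pass 1: gradient at the current weights, cache clean features ---
    logits, feats_clean = model(x)
    loss = loss_fn(logits, y)
    loss.backward()

    # --- SAM-style ascent: perturb weights along the normalized gradient ---
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append(e)
    optimizer.zero_grad()

    # --- pass 2: loss at perturbed weights + representation trust-region proxy ---
    logits_pert, feats_pert = model(x)
    penalty = (feats_pert - feats_clean.detach()).pow(2).mean()
    (loss_fn(logits_pert, y) + lam * penalty).backward()

    # --- undo the perturbation, then apply the actual optimizer step ---
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The L2 penalty on features here is only a stand-in for whatever trust-region measure the paper actually uses; see arxiv.org/abs/2310.03646 for the real method.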
To appear at EMNLP 2023: arxiv.org/abs/2310.07715