Tom Sherborne
@tomsherborne.bsky.social
MTS @ Cohere on code. Views not my employer’s.
We are hiring at @cohere.com for an Agent Infrastructure Engineer! If you want to work on building the next generation of agent models for #RAG, #ToolUse, #Code, #Reasoning, and more, then apply here. DM me if you have any Qs.

jobs.ashbyhq.com/cohere/3f797...
Member of Technical Staff, Agent Infrastructure Engineer
At Cohere, we have one of the highest compute-to-engineers ratios in the world. We do not delineate strongly between engineering and research: everyone contributes to writing production code and condu...
February 21, 2025 at 11:31 AM
I’ll be at @neuripsconf.bsky.social all next week! Find me mostly at the @cohere.com booth / DM me to talk code / post-training / life at Cohere 🇨🇦
December 3, 2024 at 3:17 PM
My PhD thesis "Modelling Cross-lingual Transfer For Semantic Parsing" is finally submitted! 🎉🎉🎉
January 31, 2024 at 9:14 PM
TRAM is accepted to #ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social

Vision experiments, more discussion, and visuals coming soon to the camera-ready!
🚨 new paper 🚨

Can we train for flat minima with less catastrophic OOD forgetting? 

We propose Trust Region Aware Minimization for smoothness in parameters + representations.

TL;DR: representations matter just as much!
arxiv.org/abs/2310.03646 w/ @nsaphra.bsky.social, Pradeep Dasigi + Hao Peng
January 16, 2024 at 3:36 PM
Reposted by Tom Sherborne
Really excited about this one and had such a blast working with @siree.sh @abertsch.bsky.social @davidthewid.bsky.social @strubell.bsky.social! Please read our paper and reach out with any questions, we'd love to chat! See y'all in Singapore :)
We all know that “recently large language models have”, “large language models are”, and “large language models can.” But *why* LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP 2023: arxiv.org/abs/2310.07715
October 12, 2023 at 3:38 PM
🚨 new paper 🚨

Can we train for flat minima with less catastrophic OOD forgetting? 

We propose Trust Region Aware Minimization for smoothness in parameters + representations.

TL;DR: representations matter just as much!
arxiv.org/abs/2310.03646 w/ @nsaphra.bsky.social, Pradeep Dasigi + Hao Peng
October 11, 2023 at 9:31 AM
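For intuition on the flat-minima idea in the post above, here is a minimal, hedged sketch of a generic SAM-style ascent/descent training step. It is not the exact TRAM procedure from arxiv.org/abs/2310.03646 (which also enforces smoothness in representations through a trust-region view); the names model, loss_fn, batch, base_optimizer, and rho are assumed placeholders for illustration.

```python
# Hedged sketch: a generic SAM-style flat-minima update (ascent to a nearby
# high-loss point in weight space, then descent from there). TRAM itself adds a
# trust-region treatment of representations as well as parameters; see the paper.
# `model`, `loss_fn`, `batch`, `base_optimizer`, and `rho` are assumed names.
import torch

def flat_minima_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    inputs, targets = batch

    # 1) Ascent: find the worst-case weight perturbation within an L2 ball of radius rho.
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2) + 1e-12
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)                      # move weights to the sharp nearby point
            perturbations.append(e)
    model.zero_grad()

    # 2) Descent: compute gradients at the perturbed point, restore weights, then step.
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)                  # undo the perturbation
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

In a training loop this replaces the usual single backward() + step(): call flat_minima_step(model, loss_fn, batch, base_optimizer) once per batch, at the cost of a second forward/backward pass.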