Tom Sherborne
@tomsherborne.bsky.social
MTS @ Cohere on code. Views not my employer’s.
We are hiring at @cohere.com for an Agent Infrastructure Engineer! If you want to work on building the next generation of agent models for #RAG, #ToolUse, #Code, #Reasoning, and more, then apply here. DM me if you have any Qs.

jobs.ashbyhq.com/cohere/3f797...
Member of Technical Staff, Agent Infrastructure Engineer
At Cohere, we have one of the highest compute-to-engineers ratios in the world. We do not delineate strongly between engineering and research: everyone contributes to writing production code and condu...
February 21, 2025 at 11:31 AM
I’ll be at @neuripsconf.bsky.social all next week! Find me mostly at the @cohere.com booth / DM me to talk code / post-training / life at Cohere 🇨🇦
December 3, 2024 at 3:17 PM
My PhD thesis "Modelling Cross-lingual Transfer For Semantic Parsing" is finally submitted! 🎉🎉🎉
January 31, 2024 at 9:14 PM
TRAM is accepted to #ICLR2024 as a Spotlight! See you in Vienna 🇦🇹! Thanks to @nsaphra.bsky.social, Pradeep Dasigi, Hao Peng and @ai2.bsky.social

Vision experiments, more discussion, and visuals coming soon to the camera-ready!
🚨 new paper 🚨

Can we train for flat minima with less catastrophic OOD forgetting? 

We propose Trust Region Aware Minimization for smoothness in parameters + representations.

TL;DR: representations matter just as much!
arxiv.org/abs/2310.03646 w/ @nsaphra.bsky.social, Pradeep Dasigi + Hao Peng
January 16, 2024 at 3:36 PM
Reposted by Tom Sherborne
Really excited about this one and had such a blast working with @siree.sh @abertsch.bsky.social @davidthewid.bsky.social @strubell.bsky.social! Please read our paper and reach out with any questions, we'd love to chat! See y'all in Singapore :)
We all know that “recently large language models have”, “large language models are”, and “large language models can.” But *why* LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP 2023: arxiv.org/abs/2310.07715
October 12, 2023 at 3:38 PM
🚨 new paper 🚨

Can we train for flat minima with less catastrophic OOD forgetting? 

We propose Trust Region Aware Minimization for smoothness in parameters + representations.

TL;DR: representations matter just as much!
arxiv.org/abs/2310.03646 w/ @nsaphra.bsky.social, Pradeep Dasigi + Hao Peng
October 11, 2023 at 9:31 AM
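For intuition on the flat-minima idea in the post above, here is a minimal, hedged sketch of a generic SAM-style ascent/descent training step. It is not the exact TRAM procedure from arxiv.org/abs/2310.03646 (which also enforces smoothness in representations through a trust-region view); the names model, loss_fn, batch, base_optimizer, and rho are assumed placeholders for illustration.

```python
# Hedged sketch: a generic SAM-style flat-minima update (ascent to a nearby
# high-loss point in weight space, then descent from there). TRAM itself adds a
# trust-region treatment of representations as well as parameters; see the paper.
# `model`, `loss_fn`, `batch`, `base_optimizer`, and `rho` are assumed names.
import torch

def flat_minima_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    inputs, targets = batch

    # 1) Ascent: find the worst-case weight perturbation within an L2 ball of radius rho.
    loss_fn(model(inputs), targets).backward()
    with torch.no_grad():
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2) + 1e-12
        perturbations = []
        for p in model.parameters():
            if p.grad is None:
                perturbations.append(None)
                continue
            e = (rho / grad_norm) * p.grad
            p.add_(e)                      # move weights to the sharp nearby point
            perturbations.append(e)
    model.zero_grad()

    # 2) Descent: compute gradients at the perturbed point, restore weights, then step.
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbations):
            if e is not None:
                p.sub_(e)                  # undo the perturbation
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

In a training loop this replaces the usual single backward() + step(): call flat_minima_step(model, loss_fn, batch, base_optimizer) once per batch, at the cost of a second forward/backward pass.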