Jonathan Hayase
@jon.jon.ke
5th-year PhD student at UW CSE, working on Security and Privacy for ML
Tokenizers govern the allocation of computation. It's a waste to spend a whole token of compute predicting the "way" in "By the way". SuperBPE redirects that compute to predict more difficult tokens, leading to wins on downstream tasks!
We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words.

When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
March 21, 2025 at 6:31 PM
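The post sketches the intuition; here is a self-contained toy (not the released SuperBPE trainer) showing one way the two-stage idea could work: stage 1 learns ordinary BPE merges that respect whitespace, then stage 2 lifts that constraint so merges can span word boundaries. The `superword_start` knob, the character-level setup, and the greedy merge application are all illustrative assumptions of this sketch.

```python
from collections import Counter

def train_superword_bpe(text, num_merges, superword_start=None):
    """Toy two-stage BPE. Steps before `superword_start` (a hypothetical
    knob, not SuperBPE's real config) only merge pairs inside a
    whitespace-delimited word; later steps may merge across spaces,
    producing multi-word "superword" tokens."""
    seq = list(text)  # start from single characters
    merges = []
    for step in range(num_merges):
        stage1 = superword_start is not None and step < superword_start
        pairs = Counter(
            (a, b) for a, b in zip(seq, seq[1:])
            if not (stage1 and (" " in a or " " in b))  # stage 1: stay inside words
        )
        if not pairs:
            if stage1:
                continue  # within-word merges exhausted; wait for stage 2
            break
        (a, b), _ = pairs.most_common(1)[0]  # most frequent adjacent pair
        merges.append((a, b))
        out, i = [], 0  # apply the new merge greedily, left to right
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out
    return merges, seq

if __name__ == "__main__":
    _, toks = train_superword_bpe("by the way " * 3, num_merges=10, superword_start=6)
    print(toks)  # later merges fuse "by the way" into a single superword token
```

The real tokenizer works on bytes with the usual BPE machinery; the one conceptual change this sketch mirrors is the switch from subword-only merges to merges that may cross word boundaries, which is what lets a phrase like "By the way" cost one token instead of three.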
Reposted by Jonathan Hayase
excited to be at #NeurIPS2024! I'll be presenting our data mixture inference attack 🗓️ Thu 4:30pm w/ @jon.jon.ke — stop by to learn what trained tokenizers reveal about LLM development (‼️) and chat about all things tokenizers.

🔗 arxiv.org/abs/2407.16607
December 11, 2024 at 10:08 PM
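The reposted attack exploits a simple invariant of BPE training: at the moment each merge is learned, its pair is the most frequent one in the training mixture, so a tokenizer's ordered merge list yields linear constraints on the mixture weights. Below is a minimal feasibility sketch of that idea under toy assumptions (character-level merges, exact pair counts from small candidate corpora, no sampling noise); `infer_mixture` and `apply_merges` are hypothetical names for illustration, not the paper's code, and the real attack must work at scale from samples.

```python
import numpy as np
from collections import Counter
from scipy.optimize import linprog

def apply_merges(seq, merges):
    """Greedily apply an ordered list of BPE merges to a token sequence."""
    for a, b in merges:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out
    return seq

def infer_mixture(merge_list, corpora):
    """Feasibility LP over mixture weights alpha: at every step i, the
    chosen merge pair must be at least as frequent in the alpha-weighted
    mix as every rival pair. Each (step, rival) is one constraint."""
    m = len(corpora)
    rows = []
    for i, chosen in enumerate(merge_list):
        counts, rivals = [], set()
        for c in corpora:
            seq = apply_merges(list(c), merge_list[:i])
            pc = Counter(zip(seq, seq[1:]))
            counts.append(pc)
            rivals |= set(pc)
        rivals.discard(chosen)
        for p in rivals:
            # sum_j alpha_j * (count_j(p) - count_j(chosen)) <= 0
            rows.append([counts[j][p] - counts[j][chosen] for j in range(m)])
    res = linprog(
        c=np.zeros(m),                      # pure feasibility: no objective
        A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
        A_eq=np.ones((1, m)), b_eq=[1.0],   # weights form a distribution
        bounds=[(0.0, 1.0)] * m,
    )
    return res.x if res.success else None

if __name__ == "__main__":
    corpora = ["aaab aaab aaab", "abbb abbb abbb"]
    # An observed first merge of ('a','a') is only consistent with
    # corpus 0 getting at least as much weight as corpus 1.
    print(infer_mixture([("a", "a")], corpora))
```

Any returned `alpha` is a mixture consistent with the observed merge order; as more merge steps are added, the feasible set shrinks, which is why a trained tokenizer ends up revealing so much about its training data.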