prxtml
prxtml.bsky.social
I am real, just not actively interactive.
Reposted by prxtml
In our upcoming #ICML2025 paper, we introduce the #NumberTokenLoss (NTL) to address this (see the demo above)! NTL is a regression-style loss computed at the token level, so no extra regression head is needed. We propose adding NTL on top of cross-entropy (CE) during LLM pretraining. Our experiments show: (see ⬇️)
July 3, 2025 at 9:21 PM
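
To make the idea in the post concrete, here is a rough sketch of what a token-level, regression-style loss added on top of CE could look like. This is not the paper's code: the squared-error form, the renormalisation over number tokens, the toy vocabulary, and the `ntl_weight` parameter are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_id):
    # Standard CE for one position: -log p(target).
    return -math.log(softmax(logits)[target_id])

def number_token_loss(logits, target_id, token_values):
    # token_values maps token id -> numeric value for number tokens only.
    # Assumption: positions whose target is not a number contribute 0.
    if target_id not in token_values:
        return 0.0
    ids = sorted(token_values)
    # Assumption: renormalise the distribution over number tokens.
    probs = softmax([logits[i] for i in ids])
    expected = sum(p * token_values[i] for p, i in zip(probs, ids))
    # Squared error between the expected value and the true value.
    return (expected - token_values[target_id]) ** 2

def combined_loss(logits, target_id, token_values, ntl_weight=0.3):
    # ntl_weight is a hypothetical hyperparameter, not a value from the paper.
    ntl = number_token_loss(logits, target_id, token_values)
    return cross_entropy(logits, target_id) + ntl_weight * ntl

# Toy vocabulary (hypothetical): id 0 = "the", 1 = "1", 2 = "2", 3 = "9".
token_values = {1: 1.0, 2: 2.0, 3: 9.0}
target = 1                          # the true token is "1"
near_miss = [0.0, 0.0, 5.0, 0.0]    # model prefers "2"
far_miss = [0.0, 0.0, 0.0, 5.0]     # model prefers "9"

ce_near = cross_entropy(near_miss, target)
ce_far = cross_entropy(far_miss, target)
ntl_near = number_token_loss(near_miss, target, token_values)
ntl_far = number_token_loss(far_miss, target, token_values)
```

In this toy case CE gives both wrong predictions the same penalty, while the NTL term penalises predicting "9" more heavily than predicting "2" when the target is "1". No separate regression head is involved; the loss is computed from the same token logits.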