Tian Jin
tjin.bsky.social
PhD student @MIT CSAIL
Consider two parameter count vs. training step curves w/ an equivalent area under the curve, i.e., training FLOPs. Solid line = dense pre-training, dashed line = sparse pre-training w/ gradual pruning. While they differ in final param count, they match in average param count. 3/N
April 21, 2025 at 7:15 AM
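The equal-area claim above can be checked with a toy calculation: training FLOPs scale with the parameter count active at each step, so two schedules with the same average param count over the run match in total FLOPs even if their final sizes differ. A minimal sketch (the schedules and numbers below are illustrative, not from the paper):

```python
# Toy check: training FLOPs ~ sum over steps of active parameter count,
# i.e., the area under the param-count vs. training-step curve.

STEPS = 1000

def dense_params(step, size=100):
    # Dense pre-training: constant parameter count (solid line).
    return size

def sparse_params(step, start=150, end=50):
    # Sparse pre-training w/ gradual pruning: linear schedule from
    # `start` down to `end` (dashed line). Its average is (start+end)/2.
    return start + (end - start) * step / (STEPS - 1)

dense_area = sum(dense_params(t) for t in range(STEPS))
sparse_area = sum(sparse_params(t) for t in range(STEPS))

# Average param counts match (100), so areas (training FLOPs) match,
# even though final param counts differ (100 vs. 50).
print(dense_area, sparse_area)
```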
The quality-speedup trade-off keeps improving with more training - showing no signs of saturation! We took 4 snapshots at different points of preference optimization (10% Round 1, 100% R1, 10% R2, 60% R2). As we train more, this trade-off improves toward the optimal top-right corner. 11/N
February 27, 2025 at 12:38 AM
We show that PASTA Pareto-dominates all existing async decoding methods! We achieve geometric mean speedups ranging from 1.21× to 1.93× with corresponding quality changes of +2.2% to -7.1%, measured by length-controlled win rates against sequential decoding baseline. 10/N
February 27, 2025 at 12:38 AM
Excited to share our work with friends from MIT/Google on Learned Asynchronous Decoding! LLM responses often contain chunks of tokens that are semantically independent. What if we could train LLMs to identify such chunks and decode them in parallel, thereby speeding up inference? 1/N
February 27, 2025 at 12:38 AM
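The idea in 1/N can be sketched abstractly: once the model has flagged which chunks of a response are mutually independent, those chunks can be generated concurrently rather than strictly token-by-token. A toy illustration, with a stand-in `decode_chunk` in place of a real LLM:

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(chunk_prompt):
    # Stand-in for autoregressively generating one chunk; in learned
    # async decoding the model itself marks chunks as independent.
    return chunk_prompt.upper()  # placeholder "decoding"

independent_chunks = ["list item one", "list item two", "list item three"]

# Sequential baseline: chunks decoded one after another.
sequential = [decode_chunk(c) for c in independent_chunks]

# Async variant: independent chunks decoded in parallel workers;
# Executor.map preserves input order when stitching results back.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(decode_chunk, independent_chunks))

assert sequential == parallel  # same output, produced concurrently
```

The speedup comes from overlapping the chunks' decode latencies; quality hinges on the model only flagging chunks that really are semantically independent.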