Shantanu Acharya
shantanuacharya.bsky.social
Researcher at NVIDIA - Working on Long Context LLMs
Reposted by Shantanu Acharya
Star Attention

Star Attention is a new way to make large language models process very long texts much faster while maintaining accuracy.

Author @shantanuacharya.bsky.social is on alphaXiv this week to answer your questions on his paper!
December 2, 2024 at 6:39 PM
🚀 Introducing Star Attention - a novel inference method that combines local and global attention to perform LLM inference over long sequences.

✅ Speeds up inference by up to 11x while preserving 95-100% accuracy
✅ Integrates with any Transformer-based LLM without finetuning
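The local-plus-global split described above can be illustrated with a toy sketch: context blocks first attend only within themselves (so they can be processed independently), and query tokens then attend over the full context. This is a simplified illustration, not the paper's actual algorithm - Star Attention additionally prepends an anchor block to each context block and aggregates distributed softmax statistics, which are omitted here. All function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def two_phase_attention_sketch(context, query, block_size):
    """Toy two-phase scheme inspired by the local/global split.

    Phase 1: each context block attends only to itself (blockwise-local,
             so blocks could run in parallel on separate hosts).
    Phase 2: query tokens attend globally to every context token.
    """
    blocks = [context[i:i + block_size]
              for i in range(0, len(context), block_size)]
    local_out = np.concatenate([attention(b, b, b) for b in blocks])
    query_out = attention(query, context, context)
    return local_out, query_out
```

Because Phase 1 never forms the full context-by-context score matrix, its cost grows with `block_size * len(context)` rather than quadratically in the context length, which is where the speedup over dense self-attention comes from.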

Paper: arxiv.org/abs/2411.17116
Star Attention: Efficient LLM Inference over Long Sequences
Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a ...
arxiv.org
November 27, 2024 at 2:09 AM