We show that attention in LLMs can be accelerated with analog in-memory computing based on Gain Cell circuits. Simulating a 1.5B-parameter model, we achieve up to 70,000× lower energy consumption and a 100× speedup compared with GPUs.