Did it really take two years for someone to point out that RoPE will (obviously?) not decay attention scores with relative distance for random query-key pairs?
From: arxiv.org/pdf/2410.06205
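A quick numerical check of the point. This is a sketch under stated assumptions: the RoFormer frequency schedule theta_i = 10000^(-2i/d), i.i.d. standard Gaussian queries and keys, and a hypothetical helper `rope_score` written only for this illustration. For random q and k, every cross term q_j k_j' has zero mean, so the expected RoPE score is zero at every distance and there is nothing to decay. For q = k = all-ones, the score starts at d and shrinks (with oscillation) as distance grows.

```python
import numpy as np

def rope_score(q, k, rel_dist, base=10000.0):
    """Hypothetical helper: RoPE logit q^T R(rel_dist) k.

    Coordinate pairs (2i, 2i+1) are treated as complex numbers and rotated
    by rel_dist * theta_i, with theta_i = base**(-2i/d) (assumed schedule).
    """
    d = q.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # shape (d/2,)
    zq = q[..., 0::2] + 1j * q[..., 1::2]
    zk = k[..., 0::2] + 1j * k[..., 1::2]
    return np.real(np.sum(zq * np.conj(zk) * np.exp(1j * rel_dist * theta), axis=-1))

rng = np.random.default_rng(0)
d, n_pairs = 64, 20000
distances = [0, 1, 4, 16, 64, 256, 1024]

# Case 1: i.i.d. Gaussian queries/keys (assumption). Mean score is ~0 at all distances.
q = rng.standard_normal((n_pairs, d))
k = rng.standard_normal((n_pairs, d))
random_means = [rope_score(q, k, r).mean() for r in distances]

# Case 2: q = k = all-ones. Score equals d at distance 0 and is smaller afterwards.
ones = np.ones(d)
ones_scores = [rope_score(ones, ones, r) for r in distances]

for r, m_rand, s_ones in zip(distances, random_means, ones_scores):
    print(f"distance {r:5d}: random mean {m_rand:+.3f}   all-ones {s_ones:+.2f}")
```

The complex-number form is just a compact way to write the 2x2 block rotations; rotating q by m*theta and k by n*theta leaves only the relative angle (m - n)*theta in the product.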