We found embeddings like RoPE aid training but bottleneck long-sequence generalization. Our solution’s simple: treat them as a temporary training scaffold, not a permanent necessity.
We found that if you simply delete them after pretraining and recalibrate for <1% of the original budget, you unlock massive context windows. Smarter, not harder.
arxiv.org/abs/2512.12167
pub.sakana.ai/DroPE
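A minimal PyTorch sketch of the recipe as the thread describes it: pretrain with RoPE, delete the rotary embedding afterward so attention becomes position-free, then run a brief recalibration fine-tune. The module and function names here (`Attention`, `drop_rope`, `recalibrate`) are hypothetical stand-ins, not Sakana's actual implementation; the <1% token budget is the figure quoted above.

```python
# Sketch of "RoPE as a temporary training scaffold": the rotary embedding is an
# optional, removable component of attention rather than a permanent fixture.
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class Attention(nn.Module):
    """Simplified causal self-attention whose rotary embedding can be removed."""

    def __init__(self, dim: int, n_heads: int, rope: Optional[nn.Module]):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.rope = rope  # set to None to drop positional encoding entirely

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        if self.rope is not None:  # pretraining path: RoPE applied to q and k
            q, k = self.rope(q), self.rope(k)
        # with rope=None, attention is position-free (NoPE-style)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(b, t, d))


def drop_rope(model: nn.Module) -> None:
    """Delete the rotary embedding from every attention block post-pretraining."""
    for module in model.modules():
        if isinstance(module, Attention):
            module.rope = None


def recalibrate(model: nn.Module, long_context_batches, steps: int,
                lr: float = 1e-5) -> None:
    """Short fine-tune (a tiny fraction of the pretraining budget) so the
    network adapts to the now position-free attention. Assumes `model` is a
    language model returning next-token logits."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _, (inputs, targets) in zip(range(steps), long_context_batches):
        logits = model(inputs)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        opt.step()
        opt.zero_grad()
```

Keeping `rope` as a swappable attribute is what makes the "scaffold" framing concrete: the same checkpoint runs with or without positional encoding, and only a short recalibration pass separates the two.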
- 1st iteration: Intel created ZLUDA as a drop-in replacement for CUDA on non-NVIDIA GPUs.
- 2nd iteration: AMD took over development after Intel dropped support.