davidgrangier.bsky.social
3/3

Mixture of experts on high-latency networks with No Need to Talk: iclr.cc/virtual/2025... (Thu Apr 24, 3 pm).

Joint work with @matpagliardini.bsky.social, Anastasiia Filippova, @pierreablin.bsky.social, Simin Fan, Skyler Seto, Angelos Katharopoulos, Ronan Collobert
ICLR Poster: No Need to Talk: Asynchronous Mixture of Language Models (ICLR 2025)
April 21, 2025 at 11:55 PM
2/3

Importance sampling for a better pretraining distribution with CRISP: iclr.cc/virtual/2025... (Sat Apr 26, 10 am).
ICLR Poster: Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling (ICLR 2025)