[1] arxiv.org/abs/2110.03742
[2] arxiv.org/abs/2206.04674
[3] arxiv.org/abs/2205.12701
[4] arxiv.org/abs/2405.11157
On the other hand, I think what you're looking for in your use case is training a task-level MoE rather than a token-level one. For example, both [1] and [2] find that a task-MoE outperforms a token-MoE on language-related tasks.
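To make the distinction concrete, here's a minimal sketch of the two routing schemes (not the papers' exact setups; all sizes and names are illustrative): a token-level MoE gates each token on its own hidden state, while a task-level MoE picks the expert mixture once per sequence from a task ID.

```python
# Minimal sketch: token-level vs task-level routing in a single MoE layer.
# Dense expert mixing for clarity; real MoEs use sparse top-k dispatch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=4, n_tasks=3, task_level=False):
        super().__init__()
        self.task_level = task_level
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )
        if task_level:
            # Route on the task ID: one expert distribution per task.
            self.router = nn.Embedding(n_tasks, n_experts)
        else:
            # Route on each token's representation.
            self.router = nn.Linear(d_model, n_experts)

    def forward(self, x, task_id=None):
        # x: (batch, seq, d_model); task_id: (batch,) for task-level routing
        if self.task_level:
            logits = self.router(task_id).unsqueeze(1)  # (batch, 1, n_experts)
        else:
            logits = self.router(x)                     # (batch, seq, n_experts)
        gates = F.softmax(logits, dim=-1)
        # (batch, seq, d_model, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)
        return (expert_out * gates.unsqueeze(2)).sum(-1)

x = torch.randn(2, 5, 64)
token_moe = MoELayer(task_level=False)
task_moe = MoELayer(task_level=True)
print(token_moe(x).shape, task_moe(x, task_id=torch.tensor([0, 2])).shape)
```

One practical upshot of task-level routing is that you know ahead of serving time which experts a given task needs, so you can load only those.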
I'm no expert when it comes to MoEs (no pun intended), but I believe what you're referring to is the specialization of experts without any explicit domain conditioning.
- Nix shells are great if you use them outside of Python (cf. my 1st point). I use them with direnv and really like the DX; see the sketch after this list
- I use an eGPU, but a dedicated server is cool too
- you really want to use Docker on top of NixOS: Python there is a disaster because packages are not always available, up to date, or working; inside containers, use traditional pip/uv (see the Dockerfile sketch below)
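For the Nix-shell-plus-direnv combo, here's a minimal sketch, assuming direnv's stock `use nix` helper; the package names are illustrative:

```
# .envrc — direnv loads/unloads the shell automatically on cd
use nix
```

```nix
# shell.nix — a throwaway dev shell
{ pkgs ? import <nixpkgs> {} }:
pkgs.mkShell {
  packages = [ pkgs.git pkgs.ripgrep ];
}
```

Run `direnv allow` once in the directory; the environment is then applied on entry and dropped on exit.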
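And a minimal sketch of the "traditional pip inside a container" approach, with an illustrative base image and file names:

```dockerfile
# Dockerfile — sidestep nixpkgs' Python entirely: upstream image + plain pip
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
```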
github.com/pypi/support...