Taha Yassine
@tahayassine.me
Independent researcher working on NLP/LLMs · PhD in AI & Wireless Comms

[3] is perhaps the most thorough work I could find exploring this setup for learning multiple tasks; they also investigate soft routing. [4] seems interesting too: they train LoRAs on the same base for different tasks and train a router to select the correct LoRA for a given input.
December 16, 2024 at 9:33 PM
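To make the [4]-style idea concrete (this is not their actual code, just a rough sketch with invented names, shapes, and rank): a frozen base linear layer carries one LoRA adapter per task, and a small router picks which adapter to apply for each input. When task labels are available, the router logits can additionally get a cross-entropy term so it learns to select the correct LoRA.

```python
import torch
import torch.nn as nn

class RoutedLoRALinear(nn.Module):
    """Frozen base linear layer + one LoRA adapter per task + a router that picks the adapter."""
    def __init__(self, base: nn.Linear, n_tasks: int, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the adapters and the router are trained
        d_in, d_out = base.in_features, base.out_features
        # one (A, B) pair per task; B starts at zero so each delta_W starts at zero
        self.A = nn.Parameter(torch.randn(n_tasks, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_tasks, d_out, rank))
        self.router = nn.Linear(d_in, n_tasks)

    def forward(self, x: torch.Tensor):
        # x: (batch, d_in)
        logits = self.router(x)               # (batch, n_tasks)
        task = logits.argmax(dim=-1)          # hard selection of one LoRA per input
        down = torch.einsum("bi,bri->br", x, self.A[task])      # (batch, rank)
        delta = torch.einsum("br,bor->bo", down, self.B[task])  # (batch, d_out)
        # router logits are returned so a routing loss can be applied when task labels are known
        return self.base(x) + delta, logits
```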

On the other hand, I think what you're looking for in your use case is a task-level MoE rather than a token-level one. For example, both [1] and [2] find that a task-MoE is better than a token-MoE for language-related tasks.
December 16, 2024 at 9:33 PM
In the case of Mixtral they don't mention any special auxiliary loss to incentivize the router to push experts to specialize. In general, an auxiliary term may be added to encourage an even assignment of tokens across experts for better load balancing.
December 16, 2024 at 9:33 PM
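For reference, one common form of that auxiliary term is the Switch-Transformer-style load-balancing loss: the fraction of tokens routed to each expert times the mean router probability for that expert, summed over experts, which is minimized when both are spread evenly. The sketch below is a generic version, not Mixtral's code; the tensor shapes and top_k value are assumptions.

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int = 2) -> torch.Tensor:
    """Generic Switch-style auxiliary loss; router_logits has shape (num_tokens, num_experts)."""
    probs = F.softmax(router_logits, dim=-1)                    # (tokens, experts)
    # f_i: fraction of tokens routed to expert i under top-k assignment
    topk_idx = probs.topk(top_k, dim=-1).indices                # (tokens, top_k)
    mask = F.one_hot(topk_idx, num_experts).sum(dim=1).float()  # (tokens, experts), 0/1 entries
    tokens_per_expert = mask.mean(dim=0)                        # (experts,)
    # P_i: mean router probability assigned to expert i
    prob_per_expert = probs.mean(dim=0)                         # (experts,)
    # minimized when tokens and probability mass are spread evenly across experts
    return num_experts * torch.sum(tokens_per_expert * prob_per_expert)
```

In practice this gets scaled by a small coefficient and added to the main language-modeling loss.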
Sorry I'm only responding now.
I'm no expert when it comes to MoEs (no pun intended), but I believe what you're referring to is the specialization of experts under no explicit domain conditioning.
December 16, 2024 at 9:33 PM
Maybe you could train an MoE? Your aux model would be the router and part of the main model, and you'd train it with a corresponding loss term to route to the correct expert at training time. This obviously means you'd have as many experts as you have modes in your data dist if you do hard routing.
December 14, 2024 at 9:54 AM
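A minimal sketch of that suggestion, assuming the mode label is known at training time (all names and sizes are invented): the router gets its own cross-entropy term against the mode label, routing is teacher-forced during training, and at inference each input is hard-routed through the argmax expert.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HardRoutedMoE(nn.Module):
    """One expert per mode of the data distribution; the router plays the role of the aux model."""
    def __init__(self, d_model: int, n_modes: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_modes)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_modes)
        )

    def forward(self, x: torch.Tensor, mode_labels=None):
        # x: (batch, d_model); mode_labels: (batch,) long tensor, available at training time
        logits = self.router(x)
        if mode_labels is not None:
            routing_loss = F.cross_entropy(logits, mode_labels)  # push the router to the correct expert
            chosen = mode_labels                                  # teacher-forced hard routing
        else:
            routing_loss = None
            chosen = logits.argmax(dim=-1)                        # hard routing at inference
        # per-sample dispatch, kept naive for readability
        out = torch.stack([self.experts[i](xi) for i, xi in zip(chosen.tolist(), x)])
        return out, routing_loss
```

The routing loss would then be added to the main objective with some weight, e.g. loss = task_loss + lambda * routing_loss.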
These madlads also made a tool that allows you to create a colormap and shows you advanced metrics to help you
December 3, 2024 at 10:07 PM
"network graph" seems to work as a workaround
December 1, 2024 at 1:28 PM
Wow, TIL. Now it's gonna sound weird when I use it in French.
December 1, 2024 at 1:17 PM
Lofi for reading papers and synthwave for coding
November 30, 2024 at 4:38 AM
Nice to know, will give it a try
November 29, 2024 at 3:36 PM
Have you considered using an eGPU?
November 29, 2024 at 3:21 PM
Any reason you went this route rather than using something like Ansible?
November 27, 2024 at 8:13 AM
- vscode works really well with the remote extension, so no need to use the browser client imo
- nix shells are great if you use them outside of Python, cf my 1st point. I use them with direnv and really like the dx
November 25, 2024 at 8:36 PM
This is almost my current setup but here are a few points:
- I use an eGPU but a dedicated server is cool too
- you really want to use Docker on top of NixOS; Python on NixOS is a disaster because packages are not always available/up to date/working. Inside containers, use traditional pip/uv
November 25, 2024 at 8:36 PM
I'm sure there was a spike of downloads at some point around January of this year, and they were all me
November 25, 2024 at 4:08 AM