Taha Yassine
@tahayassine.me
Independent researcher working on NLP/LLMs · PhD in AI & Wireless Comms

On the other hand, I think for your use case what you're looking for is training a task-level MoE rather than a token-level one. For example, both [1] and [2] find that a task-MoE is better than a token-MoE for language-related tasks.
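To make the distinction concrete, here's a minimal sketch in PyTorch (illustrative only; the module and variable names are mine, not the setups from [1] or [2]): a token-level MoE routes every token independently through a learned gate, while a task-level MoE picks one expert per task id and sends the whole sequence through it.

```python
# Minimal sketch: token-level vs task-level MoE routing.
# Illustrative names throughout; not the architecture from [1] or [2].
import torch
import torch.nn as nn


def make_expert(d_model: int) -> nn.Module:
    # A simple feed-forward expert.
    return nn.Sequential(
        nn.Linear(d_model, 4 * d_model),
        nn.ReLU(),
        nn.Linear(4 * d_model, d_model),
    )


class TokenMoE(nn.Module):
    """Token-level: a learned gate mixes experts independently per token."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(make_expert(d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d)
        weights = self.gate(x).softmax(dim=-1)            # (batch, seq, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (b, s, d, E)
        return (expert_out * weights.unsqueeze(2)).sum(dim=-1)


class TaskMoE(nn.Module):
    """Task-level: every sequence of a task uses the expert assigned to that task."""

    def __init__(self, d_model: int, n_experts: int, n_tasks: int):
        super().__init__()
        self.experts = nn.ModuleList(make_expert(d_model) for _ in range(n_experts))
        # Fixed task -> expert assignment; could also be learned.
        self.register_buffer("task_to_expert", torch.randint(n_experts, (n_tasks,)))

    def forward(self, x: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # task_id: (batch,) — routing is decided once per sequence, not per
        # token, which makes serving cheaper and routing more stable.
        out = torch.empty_like(x)
        for b in range(x.size(0)):
            out[b] = self.experts[int(self.task_to_expert[task_id[b]])](x[b])
        return out


x = torch.randn(2, 5, 16)                                # (batch=2, seq=5, d=16)
print(TokenMoE(16, 4)(x).shape)                          # torch.Size([2, 5, 16])
print(TaskMoE(16, 4, n_tasks=3)(x, torch.tensor([0, 2])).shape)
```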
December 16, 2024 at 9:33 PM
These madlads also made a tool that allows you to create a colormap and shows you advanced metrics to help you
December 3, 2024 at 10:07 PM
The developer changing his mind while writing the docs and letting us know
December 2, 2024 at 7:54 PM
TIL you can do print(f"{big_number:,}") to display a comma-separated number. It's so much easier to read this way.
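A quick demo of the format spec (standard Python, nothing exotic):

```python
big_number = 1_234_567_890

print(f"{big_number:,}")        # 1,234,567,890  — comma as thousands separator
print(f"{big_number:_}")        # 1_234_567_890  — underscore variant
print(f"{1234567.891:,.2f}")    # 1,234,567.89   — combines with float precision
```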
November 26, 2024 at 12:30 AM
My curse is wanting to spend a week trying to optimize my GPU utilization when the job could finish by the morning if I let it run
November 24, 2024 at 4:01 AM
LLMs in the terminal are one killer use case, but I'm wary of how it atrophies my CLI warrior skills as I find myself abusing it for even the most basic commands
November 12, 2024 at 6:31 PM