pranav
pranav.bsky.social
Research Scientist at Google DeepMind. Kannadiga.
Past: Researchoor, Algorithms team at OpenAI & with Juergen Schmidhuber.
what’s the mfu like
December 11, 2024 at 1:50 AM
Personally I’m even more primitive and know basic calculus only. So the significance of this is totally lost on me. But at the same time I don’t want to do a depth-first search and take 5 years to grok all this either.
December 9, 2024 at 12:16 AM
Does exploration
December 4, 2024 at 8:10 AM
Falsifiable prediction = respect
December 3, 2024 at 8:23 PM
Similar to how “Threads should not be a library”
December 3, 2024 at 8:20 PM
That’s not even the first one. Just the first good one that didn’t use Hidden Markov Models
December 3, 2024 at 1:50 PM
Ah that explains your knowledge of dosas finally
November 29, 2024 at 1:13 PM
Good water supply
November 29, 2024 at 3:34 AM
hmm what a coincidence this suddenly popped up on the other site
November 26, 2024 at 8:10 PM
There are papers pipelining along the token dimension.
Agree it’s a little too good to be true, too basic to be new
November 26, 2024 at 7:07 AM
I read it twice and still don’t understand what the insight is. Might have to read the paper
November 26, 2024 at 6:59 AM
I now hit cmd + s every breath due to trauma from this
November 26, 2024 at 2:32 AM
delete this
November 25, 2024 at 2:21 PM
There’s also BPE dropout
November 25, 2024 at 4:20 AM
btw training a 5e25 flops model at 50% MFU would take 10k H100s for 100 days. anything more than that is surplus territory.

in any case pretty impressive operation!
November 25, 2024 at 4:15 AM
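Back-of-envelope check for the 5e25 post above, as a rough sketch. It assumes an H100 SXM dense BF16 peak of about 989 TFLOP/s (no sparsity) and treats MFU as achieved FLOP/s divided by peak FLOP/s. The function names are made up for illustration. Under those assumptions, 10k H100s at 50% MFU deliver about 4.3e25 FLOPs in 100 days, and 5e25 takes about 117 days. That is the same ballpark as "~100 days".

```python
# Back-of-envelope training-time estimate.
# Assumption: H100 SXM dense BF16 peak of ~989 TFLOP/s per GPU (no sparsity).
H100_PEAK_FLOPS = 989e12  # FLOP/s per GPU (assumed)
SECONDS_PER_DAY = 86_400


def training_days(total_flops: float, num_gpus: int, mfu: float,
                  peak_flops_per_gpu: float = H100_PEAK_FLOPS) -> float:
    """Wall-clock days needed to execute total_flops at the given MFU."""
    sustained = num_gpus * peak_flops_per_gpu * mfu  # achieved cluster FLOP/s
    return total_flops / sustained / SECONDS_PER_DAY


def cluster_flops(num_gpus: int, mfu: float, days: float,
                  peak_flops_per_gpu: float = H100_PEAK_FLOPS) -> float:
    """Total FLOPs a cluster executes in `days` at the given MFU."""
    return num_gpus * peak_flops_per_gpu * mfu * days * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{training_days(5e25, 10_000, 0.5):.0f} days")  # prints "117 days"
    print(f"{cluster_flops(10_000, 0.5, 100):.2e} FLOPs")  # prints "4.27e+25 FLOPs"
```

Swapping in a different peak (e.g. a lower one for H100 PCIe) or a different MFU only changes the constant. The estimate scales linearly with each factor.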