Lightnews — Scholar-powered news

pranav

@pranav.bsky.social

1.7K followers 250 following 120 posts

Research Scientist at Google DeepMind. Kannadiga.
Past: Researchoor, Algorithms team at OpenAI & with Juergen Schmidhuber.


pranav

@pranav.bsky.social

what’s the mfu like

December 11, 2024 at 1:50 AM

pranav

@pranav.bsky.social

Personally I’m even more primitive and know basic calculus only. So the significance of this is totally lost on me. But at the same time I don’t want to do a depth first search and take 5 years to grok all this either

December 9, 2024 at 12:16 AM

pranav

@pranav.bsky.social

Does exploration

December 4, 2024 at 8:10 AM

pranav

@pranav.bsky.social

Falsifiable prediction = respect

December 3, 2024 at 8:23 PM

pranav

@pranav.bsky.social

Similar to how “Threads should not be a library”

December 3, 2024 at 8:20 PM

pranav

@pranav.bsky.social

That’s not even the first one. Just the first good one that didn’t use Hidden Markov Models

December 3, 2024 at 1:50 PM

pranav

@pranav.bsky.social

scholar.google.co.uk/citations?vi...

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

A Graves, S Fernández, F Gomez, J Schmidhuber, Proceedings of the 23rd international conference on Machine learning, 2006 - Cited by 7,222

scholar.google.co.uk

December 3, 2024 at 1:49 PM

pranav

@pranav.bsky.social

Ah that explains your knowledge of dosas finally

November 29, 2024 at 1:13 PM

pranav

@pranav.bsky.social

Good water supply

November 29, 2024 at 3:34 AM

pranav

@pranav.bsky.social

hmm what a coincidence this suddenly popped up on the other site

November 26, 2024 at 8:10 PM

pranav

@pranav.bsky.social

There are papers pipelining along the token dimension.
Agree it’s a little too good to be true, too basic to be new

November 26, 2024 at 7:07 AM

pranav

@pranav.bsky.social

I read it twice and still don’t understand what the insight is. Might have to read the paper

November 26, 2024 at 6:59 AM

pranav

@pranav.bsky.social

I now hit cmd + s every breath due to trauma from this

November 26, 2024 at 2:32 AM

pranav

@pranav.bsky.social

delete this

November 25, 2024 at 2:21 PM

pranav

@pranav.bsky.social

There’s also BPE dropout

November 25, 2024 at 4:20 AM

pranav

@pranav.bsky.social

btw training a 5e25 flops model at 50% MFU would take 10k H100s for 100 days. anything more than that is surplus territory.

in any case pretty impressive operation!

November 25, 2024 at 4:15 AM
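
The back-of-the-envelope estimate in the post above (5e25 FLOPs, 50% MFU, 10k H100s) can be checked with a short calculation. This is a minimal sketch, not the author's own working: the per-GPU peak of roughly 989 TFLOP/s (dense BF16, H100 SXM) is an assumption taken from vendor spec sheets and does not appear in the post, and `training_days` is a hypothetical helper name.

```python
# Assumption (not from the post): H100 SXM dense BF16 peak of ~989 TFLOP/s.
H100_PEAK_FLOPS = 989e12

SECONDS_PER_DAY = 86_400


def training_days(total_flops, num_gpus, mfu, peak_flops_per_gpu=H100_PEAK_FLOPS):
    """Wall-clock days needed to spend `total_flops` of training compute.

    MFU (model FLOPs utilization) is the fraction of theoretical peak
    throughput that the training run actually achieves.
    """
    effective_flops_per_second = num_gpus * peak_flops_per_gpu * mfu
    return total_flops / effective_flops_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    days = training_days(total_flops=5e25, num_gpus=10_000, mfu=0.5)
    print(f"{days:.0f} days")
```

Under that assumed peak the result comes out a little above 100 days, so the post's figure reads as a rounded order-of-magnitude estimate rather than an exact one. A different assumed peak or MFU shifts the answer proportionally.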
