Horace He
@chhillee.bsky.social
@PyTorch "My learning style is Horace twitter threads" - @typedfemale
Yep that's right! A very common use-case is for "document masking" (i.e. variable length sequences), and that requires recomputing the mask on every iteration (which isn't "free", but is on the order of microseconds to milliseconds and not seconds).
February 6, 2025 at 10:22 PM
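Roughly what that looks like with FlexAttention's block-mask API (a minimal sketch assuming PyTorch 2.5+; the sequence length, head count, and two-document split below are made up for illustration, not taken from the post):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, SEQ_LEN, HEAD_DIM = 1, 8, 4096, 64
q = torch.randn(B, H, SEQ_LEN, HEAD_DIM, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Packed "documents": token i belongs to document doc_ids[i].
# With variable-length sequences, doc_ids changes every batch, so the
# block mask has to be rebuilt each iteration (microseconds to milliseconds,
# per the post above).
doc_ids = torch.zeros(SEQ_LEN, dtype=torch.int32, device="cuda")
doc_ids[1024:] = 1  # e.g. two packed documents of lengths 1024 and 3072

def document_causal(b, h, q_idx, kv_idx):
    # Attend causally, and only within the same document.
    return (doc_ids[q_idx] == doc_ids[kv_idx]) & (q_idx >= kv_idx)

block_mask = create_block_mask(document_causal, B=None, H=None,
                               Q_LEN=SEQ_LEN, KV_LEN=SEQ_LEN)
out = flex_attention(q, k, v, block_mask=block_mask)
```

Since `doc_ids` changes with every packed batch, `create_block_mask` is the part that gets re-run each iteration; the `mask_mod` itself stays the same.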
What does "there" mean in this case :)
January 2, 2025 at 1:13 PM
I’ll count it!
December 3, 2024 at 8:23 AM
Reposted by Horace He
Getting different attention masks working for AstroPT (a proto-foundation model for astronomy github.com/Smith42/astr...), so much nicer to do it with Flex Attention vs custom CUDA kernels -- thank you for releasing it to the world 🫡
GitHub - Smith42/astroPT: Transformer for galaxy images (and general astronomy) (github.com)
December 2, 2024 at 9:30 AM
Kinda interesting to me that the books I obsessively read as an elementary schooler are still some of the most popular series today.
December 1, 2024 at 11:51 PM
I think torch-xla is definitely usable if you don’t want to train anything particularly weird or use unusual parallelism schemes. See this tweet from Saining Xie’s lab on evaluating torch-xla vs. JAX for their use case: x.com/tongpetersb/...
December 1, 2024 at 8:35 AM
The other nice part about TPUs is that Google gives far more of them away for free compared to GPUs. Arguably this reflects how much people want to use them, but I think it's been a great boon for the academic labs willing to go through the effort.
December 1, 2024 at 2:24 AM
! What were you using it for?
December 1, 2024 at 1:49 AM
A lot of PyTorch is about dealing with this stuff nowadays!
December 1, 2024 at 1:47 AM
Out of curiosity, what kind of shapes are you typically looking at?
December 1, 2024 at 1:46 AM
Are they actually using FlexAttention here? I didn't see it in the repo
December 1, 2024 at 1:44 AM