@leonardoai_. (now at @canva) trying to feel the magic.
www.ethansmith2000.com
For a reasonable N of 2048, these are the computed frequencies prior to cos(x) & sin(x), with fp32 above and bf16 below.
Given how short the period of simple trig functions is, this difference is catastrophic for large values.
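A rough way to see the effect without the plots is the sketch below. It assumes the "frequencies" are angles of the form position × inverse frequency, as in sinusoidal or rotary embeddings, and looks only at the highest-frequency channel (inverse frequency 1, so the angle equals the position). The helper names `to_bf16` and `max_angle_error` are made up for illustration, and bf16 is emulated in pure Python by rounding the fp32 bit pattern rather than using any real ML library.

```python
import math
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bf16 keeps the top 16 bits of the fp32 bit pattern, so this rounds
    away the low 16 bits. Hypothetical helper, not a library function.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 bits that get dropped.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def max_angle_error(n: int) -> float:
    """Largest |bf16(pos) - pos| over positions 0..n-1.

    Assumes the highest-frequency channel, where angle == position.
    """
    return max(abs(to_bf16(float(pos)) - float(pos)) for pos in range(n))


N = 2048
last = float(N - 1)
print("fp32 angle:", last, "bf16 angle:", to_bf16(last))
print("cos in fp32:", math.cos(last), "cos in bf16:", math.cos(to_bf16(last)))
print("max angle error (radians) for N =", N, ":", max_angle_error(N))
```

bf16 has only 8 bits of significand, so integers above 256 start to collapse onto a coarse grid. Near position 2048 the grid spacing is 8, which is larger than the full 2π period of sin and cos.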
github.com/ethansmith20...
If ML experimentation software had been as widely accessible as it is now, I'd bet generative diffusion models would have been discovered a while ago.
dev-discuss.pytorch.org/t/fsdp-cudac...