https://open.substack.com/pub/michalpitr
I spend more time building the intuition behind some of the optimizations.
lnkd.in/dRPZmZyM
* How to optimize GEMM
Nice step-by-step introduction by BLIS contributors.
lnkd.in/df6FdX8S
* BLISlab
Goes into much more depth than most introductory sources. Source of the illustration below.
lnkd.in/dHG6akFN
Packing copies sub-matrix data into a contiguous buffer so the kernel can read it sequentially. This helps performance through better spatial locality and can reduce cache conflicts with large matrices. I cover this more in my article.
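To make the packing idea concrete, here is a minimal sketch in plain Python. It is not how any particular BLAS library does it. The function name `pack_a_panel` and the parameters `lda`, `mr`, and `kc` are assumptions chosen for illustration. It copies an `mr x kc` panel of a row-major matrix into a flat buffer, ordered so that the `mr` values needed at each step of the inner product sit next to each other.

```python
def pack_a_panel(a, lda, row0, col0, mr, kc):
    """Copy an mr x kc panel of a row-major matrix into a contiguous list.

    a    : flat list holding the full matrix in row-major order
    lda  : leading dimension (number of columns of the full matrix)
    The output is ordered column by column, so the mr values used at
    step p of the inner product are adjacent in memory.
    """
    packed = []
    for p in range(kc):          # walk along the shared (k) dimension
        for i in range(mr):      # gather one column of the panel
            packed.append(a[(row0 + i) * lda + col0 + p])
    return packed


# 4x4 matrix stored row-major: element (i, j) lives at index i * 4 + j
a = list(range(16))
panel = pack_a_panel(a, lda=4, row0=0, col0=1, mr=2, kc=3)
```

In the original layout the two rows of the panel are 4 elements apart. After packing, every read the kernel makes is to the next element of `panel`.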
A fast micro-kernel doesn't help much if it needs to wait for data to be loaded from RAM. Tiling breaks down large matrices into smaller, cache-friendly blocks. The goal is to maximize data reuse from caches.
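The tiling loop structure can be sketched as follows. This is plain Python meant to show the loop order only, not a tuned implementation. The names `matmul_tiled`, `matmul_naive`, and the `tile` parameter are assumptions for illustration. Real libraries choose block sizes per cache level.

```python
def matmul_naive(a, b, n):
    """Reference C = A @ B for n x n row-major flat lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for p in range(n):
            for j in range(n):
                c[i * n + j] += a[i * n + p] * b[p * n + j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same product, computed one tile x tile block at a time."""
    c = [0.0] * (n * n)
    # Outer three loops pick a block of A, B and C.
    for i0 in range(0, n, tile):
        for p0 in range(0, n, tile):
            for j0 in range(0, n, tile):
                # Inner three loops stay inside the block, so the same
                # small pieces of A and B are reused while they are hot.
                for i in range(i0, min(i0 + tile, n)):
                    for p in range(p0, min(p0 + tile, n)):
                        a_ip = a[i * n + p]
                        for j in range(j0, min(j0 + tile, n)):
                            c[i * n + j] += a_ip * b[p * n + j]
    return c


n = 5
a = [float(i) for i in range(n * n)]
b = [float(i % 7) for i in range(n * n)]
c_tiled = matmul_tiled(a, b, n, tile=2)
```

The arithmetic is identical to the naive version. Only the order in which memory is visited changes, which is the whole point of tiling.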
BLAS libs often use small kernels (e.g., 8x4) optimized for specific CPUs using SIMD intrinsics and software prefetching. Many use handwritten assembly for optimal register allocation.
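For intuition, an 8x4 micro-kernel can be sketched as a sequence of rank-1 updates on a small accumulator block. This plain Python version only mirrors the structure. A real kernel keeps the accumulator in SIMD registers and is written with intrinsics or assembly. The name `micro_kernel` and the packed-panel layout are assumptions for illustration.

```python
MR, NR = 8, 4  # micro-tile size, matching the 8x4 example above


def micro_kernel(kc, a_panel, b_panel):
    """Compute an MR x NR block of C from packed panels.

    a_panel : kc groups of MR values (one column of the A panel per step)
    b_panel : kc groups of NR values (one row of the B panel per step)
    Returns the MR x NR result as a row-major flat list.
    """
    acc = [0.0] * (MR * NR)  # stands in for the register-resident block
    for p in range(kc):
        a_col = a_panel[p * MR:(p + 1) * MR]
        b_row = b_panel[p * NR:(p + 1) * NR]
        # Rank-1 update: acc += outer(a_col, b_row)
        for i in range(MR):
            for j in range(NR):
                acc[i * NR + j] += a_col[i] * b_row[j]
    return acc


a_panel = [1.0] * MR + [2.0] * MR          # kc = 2 steps
b_panel = [1.0, 2.0, 3.0, 4.0] + [1.0] * NR
block = micro_kernel(2, a_panel, b_panel)
```

Both panels are read strictly front to back, which is why kernels like this are usually paired with packed, contiguous inputs.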
It's always fun to see how people approach explaining technical topics. I really like Kelsey's concise style, but you might have to Google around for the initial infra setup.
💡 Linux networking is probably the toughest part, but it can be pretty rewarding to debug and understand.
open.substack.com/pub/michalpitr
#promosky #promotionsky