Awesome High-Performance AI Compute
Introduction
High-Performance AI Computing
1. Parallel Computing
2. CUDA Programming
  2.1. CUDA Concepts
    2.1.1. Thread Coarsening
    2.1.2. Reduction
  2.2. CUDA Kernels
    2.2.1. Attention
    2.2.2. Encoder
    2.2.3. LayerNorm
    2.2.4. Matrix Multiplication (MatMul)
    2.2.5. Softmax
    2.2.6. Triangular Matrix Multiplication (TriMat)