Awesome High-Performance AI Compute
Introduction
High-Performance AI Computing
1. Parallel Computing
2. CUDA Programming
2.1. CUDA Concepts
2.1.1. Thread Coarsening
2.1.2. Reduction
2.2. CUDA Kernels
2.2.1. Attention
2.2.2. Encoder
2.2.3. LayerNorm
2.2.4. Matrix Multiplication (MatMul)
2.2.5. Softmax
2.2.6. Triangular Matrix Multiplication (TriMat)
AI Pocket Reference: High-Performance AI Computing
Kernels for positional encoder forward pass in GPT-2
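As a point of reference for the kernels on this page, the GPT-2 positional encoder forward pass adds a token embedding to a position embedding for every element of the output. The sketch below is a minimal CPU reference in C++, not the kernel from this book. The function name `encoder_forward_ref`, the parameter names (`inp`, `wte`, `wpe`, `B`, `T`, `C`), and the row-major layouts are assumptions chosen for illustration. The loop body is written per flat output index, which is roughly the work a one-thread-per-element GPU kernel might perform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for the GPT-2 positional encoder forward pass:
//   out[b, t, c] = wte[inp[b, t], c] + wpe[t, c]
// Assumed row-major shapes: inp (B, T), wte (V, C), wpe (at least T, C), out (B, T, C).
std::vector<float> encoder_forward_ref(const std::vector<int>& inp,
                                       const std::vector<float>& wte,
                                       const std::vector<float>& wpe,
                                       std::size_t B, std::size_t T, std::size_t C) {
    assert(inp.size() == B * T);   // one token id per (b, t) position
    assert(wpe.size() >= T * C);   // position table must cover the sequence length

    std::vector<float> out(B * T * C);
    // Each flat index stands in for the work one GPU thread could do.
    for (std::size_t idx = 0; idx < B * T * C; ++idx) {
        std::size_t bt = idx / C;  // flattened (b, t) position
        std::size_t t = bt % T;    // position within the sequence
        std::size_t c = idx % C;   // channel within the embedding
        std::size_t token = static_cast<std::size_t>(inp[bt]);
        assert((token + 1) * C <= wte.size());  // token id must be inside the table
        out[idx] = wte[token * C + c] + wpe[t * C + c];
    }
    return out;
}
```

A reference like this is mainly useful for checking a GPU kernel's output element by element. Because each output element depends only on its own indices, the computation has no cross-thread communication, so a GPU version would be expected to be limited by memory bandwidth rather than arithmetic.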