AI Pocket Reference: High-Performance AI Computing

Kernels for the positional encoder forward pass in GPT-2
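As a rough illustration of what such a kernel computes, here is a minimal CPU reference sketch in C. It assumes the standard GPT-2 embedding step, where each output element is the sum of a token embedding and a learned position embedding. The function name `encoder_forward_ref`, the parameter names, and the row-major layouts are assumptions made for this example, not taken from this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for the encoder forward pass (a sketch,
 * not this page's actual kernel).
 * Assumed row-major layouts:
 *   out: (B, T, C)  output activations
 *   inp: (B, T)     token ids in [0, V)
 *   wte: (V, C)     token embedding table
 *   wpe: (>= T, C)  position embedding table
 */
void encoder_forward_ref(float *out, const int *inp,
                         const float *wte, const float *wpe,
                         int B, int T, int C, int V) {
    size_t N = (size_t)B * T * C;
    /* One iteration per output element. A CUDA kernel could instead
     * assign each value of idx to its own thread. */
    for (size_t idx = 0; idx < N; idx++) {
        size_t bt = idx / C;      /* flattened (batch, position) index */
        int t = (int)(bt % T);    /* position within the sequence */
        int c = (int)(idx % C);   /* channel index */
        int ix = inp[bt];         /* token id at (batch, position) */
        assert(ix >= 0 && ix < V);
        out[idx] = wte[(size_t)ix * C + c] + wpe[(size_t)t * C + c];
    }
}
```

Every output element depends only on its own flat index, so the loop body maps naturally onto one GPU thread per element. A tuned kernel might go further and have each thread handle several adjacent channels with vectorized loads.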

Contributors: VectorInstitute