CUDA-Learn-Notes

Kernel Trace issue

Open DefTruth opened this issue 1 year ago • 0 comments

TODO

  • [ ] swish kernel
  • [ ] gelu kernel
  • [ ] RoPE kernel
  • [x] pack elementwise_add
  • [x] pack sigmoid
  • [x] pack relu
  • [x] histogram
  • [x] warp/block reduce
  • [x] softmax
  • [x] pack safe_softmax
  • [x] pack layer-norm
  • [x] pack rms-norm
  • [x] flash-attn-1 f32
  • [ ] flash-attn-2 f32
  • [ ] flash-attn-2 f16
  • [x] MMA(Tensor Cores) flash-attn-2 f16
  • [x] warp sgemv
  • [x] warp hgemv
  • [x] bank conflicts reduce sgemm
  • [x] pipelining sgemm
  • [ ] split_k sgemm
  • [x] pack LDST hgemm
  • [x] bank conflicts reduce hgemm
  • [x] pipelining hgemm
  • [ ] split_k hgemm
  • [x] cp.async hgemm
  • [x] cp.async sgemm
  • [ ] stage3+cp.async/cp.async.reduce.bulk hgemm
  • [ ] WMMA API(Tensor Cores) hgemm
  • [ ] MMA PTX(Tensor Cores) hgemm
  • [ ] pack online_safe_softmax
  • [ ] cp.async.reduce.bulk block_all_reduce
  • [ ] ...

DefTruth, Sep 26 '24 08:09