Donglin Zhuang
I tried several Triton-generated GEMM kernels with different shapes, using an implementation adapted from the [Triton GEMM Tutorial](https://triton-lang.org/master/getting-started/tutorials/03-matrix-multiplication.html). I noticed that the tutorial's loads from the A and B matrices are unguarded by...
I am trying to make a small enhancement to the GEMM implementation from the [Triton GEMM Tutorial](https://triton-lang.org/master/getting-started/tutorials/03-matrix-multiplication.html) by supporting K that is not a multiple of 32. To do so, I...
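For readers unfamiliar with the issue, here is a minimal pure-Python sketch of what a guarded tile load means when K is not a multiple of the block size. It is not the code from the issue or the tutorial. The names `load_tile` and `tiled_matmul` and the block sizes are hypothetical, chosen only for illustration. In a Triton kernel the analogous mechanism is passing `mask=` and `other=0.0` to `tl.load`, so that out-of-bounds elements contribute zero to the accumulator.

```python
def load_tile(mat, row0, col0, tile_rows, tile_cols):
    """Load a tile, substituting 0 for any out-of-bounds element (the guard)."""
    n_rows, n_cols = len(mat), len(mat[0])
    return [
        [mat[r][c] if r < n_rows and c < n_cols else 0
         for c in range(col0, col0 + tile_cols)]
        for r in range(row0, row0 + tile_rows)
    ]


def tiled_matmul(a, b, block_m=2, block_n=2, block_k=4):
    """Blocked GEMM that stays correct when M, N, K are not block multiples."""
    m, k, n = len(a), len(a[0]), len(b[0])
    c = [[0] * n for _ in range(m)]
    for m0 in range(0, m, block_m):
        for n0 in range(0, n, block_n):
            acc = [[0] * block_n for _ in range(block_m)]
            for k0 in range(0, k, block_k):
                # The last K tile may run past the matrix edge; the guard pads it with zeros.
                a_tile = load_tile(a, m0, k0, block_m, block_k)
                b_tile = load_tile(b, k0, n0, block_k, block_n)
                for i in range(block_m):
                    for j in range(block_n):
                        acc[i][j] += sum(
                            a_tile[i][p] * b_tile[p][j] for p in range(block_k)
                        )
            # Guarded store: write back only the in-bounds part of the tile.
            for i in range(block_m):
                for j in range(block_n):
                    if m0 + i < m and n0 + j < n:
                        c[m0 + i][n0 + j] = acc[i][j]
    return c


def naive_matmul(a, b):
    """Reference result used to check the blocked version."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

Zero padding works here because extra zero terms do not change a dot product; without the guard, the final K tile would read past the end of each row or column.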
Hi, I am trying the Stable Diffusion example at https://github.com/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion, but I get the following error when compiling the model with `python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN`: ``` File "examples/05_stable_diffusion/compile.py", line 379,...
**Describe the bug** `./54_hopper_fp8_warp_specialized_gemm` cannot run with small values of m (e.g. 1, 2, or 5). **Steps/Code to reproduce bug** ``` 1. build 54_hopper_fp8_warp_specialized_gemm 2. run with ./54_hopper_fp8_warp_specialized_gemm --m=2 --n=2048 --k=2048 ``` Output: ``` Got...