Donglin Zhuang issues

Results: 4 issues



Incorrect GEMM result when K is not a multiple of 32

I tried several Triton-generated GEMM kernels with different shapes, using an implementation adapted from the [Triton GEMM Tutorial](https://triton-lang.org/master/getting-started/tutorials/03-matrix-multiplication.html). I noticed that the tutorial's loads from the A and B matrices are unguarded by...
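The failure mode can be illustrated without a GPU. The following is a pure-Python simulation, not Triton code, and its names (`BLOCK_K`, `row_major`, `unguarded_dot`, `reference_dot`) are hypothetical. It assumes row-major A (M x K) and B (K x N) and a K loop that always loads a full 32-wide tile. When K is not a multiple of 32, the lanes past K in the last tile read either the next row of A or memory beyond the end of B.

```python
# Pure-Python simulation (not Triton): why unguarded K-tile loads break
# when K is not a multiple of the tile size.
BLOCK_K = 32  # tile size along K (assumed, matching "multiple of 32")


def row_major(rows, cols, fill, junk):
    """Flat row-major buffer plus trailing junk standing in for whatever
    memory follows the allocation (out-of-bounds reads land there)."""
    buf = [fill(r, c) for r in range(rows) for c in range(cols)]
    return buf + [junk] * (BLOCK_K * cols)


def unguarded_dot(a, b, m, n, K, N):
    """C[m, n] computed tile by tile, loading every lane of every tile."""
    acc = 0.0
    for k0 in range(0, K, BLOCK_K):        # ceil(K / BLOCK_K) iterations
        for k in range(k0, k0 + BLOCK_K):  # no check against K
            acc += a[m * K + k] * b[k * N + n]
    return acc


def reference_dot(a, b, m, n, K, N):
    """Exact C[m, n]: only k < K contributes."""
    return sum(a[m * K + k] * b[k * N + n] for k in range(K))


M, N, K = 2, 3, 40  # K = 40 is not a multiple of 32
a = row_major(M, K, lambda r, c: 1.0, junk=7.0)
b = row_major(K, N, lambda r, c: 1.0, junk=7.0)

print(reference_dot(a, b, 0, 0, K, N))  # 40.0
print(unguarded_dot(a, b, 0, 0, K, N))  # 208.0: tail lanes read A's next row and junk past B
```

With K = 64 the two functions agree, since every tile is full; the mismatch only appears when K % 32 != 0.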

help wanted

IndexError: map::at when calculating offset in tl.arange type

I am trying to add a small enhancement to the GEMM implementation from the [Triton GEMM Tutorial](https://triton-lang.org/master/getting-started/tutorials/03-matrix-multiplication.html) by supporting K that is not a multiple of 32. To do so, I...
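One common way to support such K is to iterate over ceil(K / 32) tiles, build each tile's offsets as `k * BLOCK_K + tl.arange(0, BLOCK_K)`, and mask lanes with `offs_k < K`; in Triton the mask would be passed to `tl.load(..., mask=..., other=0.0)`. The sketch below is a pure-Python mirror of that offset and mask arithmetic, not kernel code, and not necessarily what the issue author implemented. The helper names (`cdiv`, `k_tile`, `guarded_dot`) are hypothetical.

```python
# Pure-Python mirror of the per-tile K offsets and mask a Triton kernel
# might build with k * BLOCK_K + tl.arange(0, BLOCK_K) and offs_k < K.
BLOCK_K = 32  # assumed tile size along K


def cdiv(x, y):
    """Ceiling division, like triton.cdiv."""
    return (x + y - 1) // y


def k_tile(t, K):
    """Offsets and validity mask for K-tile number t."""
    offs_k = [t * BLOCK_K + j for j in range(BLOCK_K)]  # ~ t * BLOCK_K + tl.arange(0, BLOCK_K)
    mask = [k < K for k in offs_k]                      # ~ offs_k < K
    return offs_k, mask


def guarded_dot(a_row, b_col, K):
    """Dot product over K tiles; masked lanes contribute 0 (like other=0.0)."""
    acc = 0.0
    for t in range(cdiv(K, BLOCK_K)):
        offs_k, mask = k_tile(t, K)
        for k, valid in zip(offs_k, mask):
            a_val = a_row[k] if valid else 0.0  # never index past K
            b_val = b_col[k] if valid else 0.0
            acc += a_val * b_val
    return acc


K = 70  # not a multiple of 32 -> 3 tiles, the last one partial
a_row = [float(i) for i in range(K)]
b_col = [1.0] * K
print(cdiv(K, BLOCK_K))              # 3
print(sum(k_tile(2, K)[1]))          # 6 valid lanes in the last tile
print(guarded_dot(a_row, b_col, K))  # 2415.0 == sum(range(70))
```

Masking keeps the loop shape identical for every tile, so only the last, partial tile differs, and it simply contributes zeros for the lanes past K.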

bug

Cannot compile Stable Diffusion

Hi, I am trying the Stable Diffusion example at https://github.com/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion but I get the following error when compiling the model with `python3 examples/05_stable_diffusion/compile.py --token ACCESS_TOKEN`:
```
File "examples/05_stable_diffusion/compile.py", line 379,...
```

[BUG] FP8 warp-specialized GEMM fails when m is small

**Describe the bug**
`./54_hopper_fp8_warp_specialized_gemm` cannot run with small m (e.g. 1/2/5).

**Steps/Code to reproduce bug**
```
1. build 54_hopper_fp8_warp_specialized_gemm
2. run with ./54_hopper_fp8_warp_specialized_gemm --m=2 --n=2048 --k=2048
```
Output:
```
Got...
```

bug

? - Needs Triage