ANIKET SHIVAM
Hi @masahi, you can try to extend the CpAsyncWarpSpecialized Hopper kernels (such as [sm90_gemm_warpspecialized_pingpong.hpp](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp)) to FP8 Grouped GEMM by using components from the TMA Hopper Grouped GEMM kernel. I believe...
These are small-k cases, and the current NoSmem epilogues are not optimized for them. We plan to add TMA-based epilogue support for Grouped GEMM soon, which should improve these cases.
[BUG][QST] Hopper Grouped GEMM Fails When Workspace not aligned at 64, but MinWorkspaceAlignment =16
@ankutalev, will take a look this week
[BUG][QST] Hopper Grouped GEMM Fails When Workspace not aligned at 64, but MinWorkspaceAlignment =16
Hi @ankutalev, for Ptr-Array and Grouped GEMMs, the workspace needs to be 64B-aligned (as you saw in your experiment), since we use the workspace to keep the tensormaps which need...
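As a rough illustration of the 64B requirement mentioned above, here is a minimal sketch of how a caller might carve a 64-byte-aligned workspace out of a larger allocation. The helper names (`align_up`, `padded_size`, `is_aligned`) and the constant `kWorkspaceAlignment` are hypothetical and not part of CUTLASS; only the 64-byte figure comes from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption taken from the comment above: the workspace must be 64-byte aligned.
constexpr std::size_t kWorkspaceAlignment = 64;

// Hypothetical helper: round an address up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline std::uintptr_t align_up(std::uintptr_t addr,
                               std::size_t alignment = kWorkspaceAlignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (addr + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
}

// Hypothetical helper: number of bytes to allocate so that an aligned
// sub-buffer of `bytes` bytes always fits, whatever the base address is.
inline std::size_t padded_size(std::size_t bytes,
                               std::size_t alignment = kWorkspaceAlignment) {
  return bytes + alignment - 1;
}

// Hypothetical helper: check whether a pointer already meets the alignment.
inline bool is_aligned(const void* p,
                       std::size_t alignment = kWorkspaceAlignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

One possible usage: allocate `padded_size(workspace_bytes)` bytes, pass the `align_up`-adjusted address as the workspace pointer, and keep the original pointer for freeing. Allocations returned by `cudaMalloc` are typically aligned far beyond 64 bytes, so this mostly matters when the workspace is a sub-range of a larger pool.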
@thefacetakt did you build CUTLASS in `Debug` mode using CMake? I want to confirm which build setting(s) trigger this, because those debugging checks were only intended to...
For now, in your use case, you can just build with `NDEBUG` defined to skip the check, and everything should work fine.
Closing this now.
@jwfromm Is the invalid type comparison happening because this line is currently not behind `constexpr`: `is_zero_ = params.ptr_row[0] == ElementInput(0);`? For the Grouped GEMM case, there is a type mismatch,...
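To make the question above concrete, here is a minimal sketch of guarding such a comparison with `if constexpr`. It assumes (this is not confirmed by the thread) that in the grouped case `ptr_row` is a pointer to per-group pointers, so `ptr_row[0]` is itself a pointer and cannot be compared with `ElementInput(0)`. `Params` and `first_is_zero` are hypothetical stand-ins, not CUTLASS types.

```cpp
#include <type_traits>

// Hypothetical stand-in for the epilogue params; not the CUTLASS type.
template <class PtrRow>
struct Params {
  PtrRow ptr_row;
};

template <class ElementInput, class PtrRow>
bool first_is_zero(Params<PtrRow> const& params) {
  // Type obtained by dereferencing ptr_row once, with cv-qualifiers removed.
  using Pointee = std::remove_cv_t<std::remove_pointer_t<PtrRow>>;
  if constexpr (std::is_pointer_v<Pointee>) {
    // Assumed grouped case: ptr_row[0] is a per-group pointer, so the value
    // comparison below would not compile. Returning false is a placeholder;
    // the right behavior here depends on the kernel.
    return false;
  } else {
    // Non-grouped case: ptr_row[0] is an element, so the comparison is valid.
    return params.ptr_row[0] == ElementInput(0);
  }
}
```

Because the condition depends on a template parameter, the branch that is not taken is never instantiated, so the mismatched comparison is not compiled for the pointer-to-pointer case.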
yes, please. thanks.