ANIKET SHIVAM comments

[FEA] FP8 grouped gemm kernel without TMA

Hi @masahi, you can try to extend the CpAsyncWarpSpecialized Hopper kernels (such as [sm90_gemm_warpspecialized_pingpong.hpp](https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/gemm/kernel/sm90_gemm_warpspecialized_pingpong.hpp)) to FP8 Grouped GEMM by using components from the TMA Hopper Grouped GEMM kernel. I believe...

[QST] The performance of Hopper group gemm is not meeting expectation in some cases

These are small-k cases. The current NoSmem epilogues are not optimized for them. We plan to add TMA-based epilogue support for Grouped GEMM soon, which should improve these cases.

[BUG][QST] Hopper Grouped GEMM Fails When Workspace not aligned at 64, but MinWorkspaceAlignment =16

@ankutalev, will take a look this week

[BUG][QST] Hopper Grouped GEMM Fails When Workspace not aligned at 64, but MinWorkspaceAlignment =16

Hi @ankutalev, for Ptr-Array and Grouped GEMMs, workspace alignment needs to be 64B (as you see in your experiment), since we use the workspace to keep the tensormaps which need...
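A minimal sketch of the alignment requirement described above, in plain C++ rather than CUTLASS code. The names `kWorkspaceAlignment`, `align_up`, `is_aligned`, and `aligned_workspace` are hypothetical and introduced only for illustration; the 64-byte value is taken from the comment. One option, assuming the caller controls the allocation, is to over-allocate by 63 bytes and round the pointer up before handing it over as the workspace:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64 bytes, as stated in the comment for Ptr-Array and Grouped GEMM workspaces.
constexpr std::size_t kWorkspaceAlignment = 64;

// Round an address up to the next multiple of `alignment` (a power of two).
inline std::uintptr_t align_up(std::uintptr_t addr, std::size_t alignment) {
  assert((alignment & (alignment - 1)) == 0 && "alignment must be a power of two");
  return (addr + alignment - 1) & ~static_cast<std::uintptr_t>(alignment - 1);
}

// True if `ptr` is a multiple of `alignment` bytes.
inline bool is_aligned(const void* ptr, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}

// Return the first suitably aligned address at or after `raw`.
// The caller must have allocated at least (alignment - 1) extra bytes.
inline void* aligned_workspace(void* raw, std::size_t alignment = kWorkspaceAlignment) {
  return reinterpret_cast<void*>(align_up(reinterpret_cast<std::uintptr_t>(raw), alignment));
}
```

Checking `is_aligned(workspace, 64)` before launching is a cheap way to catch a misaligned buffer early.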

[BUG] failing assert in 57_hopper_grouped_gemm example

@thefacetakt did you build CUTLASS in `Debug` mode using CMake? I want to confirm which build setting(s) this gets triggered under, because those debugging checks were only intended to...

[BUG] failing assert in 57_hopper_grouped_gemm example

For now, in your use case, you can just build with `NDEBUG` defined to skip the assert, and everything should work out fine.
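As a generic illustration of why defining `NDEBUG` makes the assert go away (this is standard C++ behavior, not CUTLASS-specific code): the `assert` macro from `<cassert>` expands to nothing when `NDEBUG` is defined before the header is included. The functions `debug_checks_enabled` and `checked_divide` below are hypothetical stand-ins for a debug-only check.

```cpp
#include <cassert>

// Reports whether assert() is active in this translation unit.
// Defining NDEBUG (for example with -DNDEBUG on the compiler command line)
// turns every assert() into a no-op.
constexpr bool debug_checks_enabled() {
#ifdef NDEBUG
  return false;
#else
  return true;
#endif
}

// Hypothetical function with a debug-only precondition check.
inline int checked_divide(int a, int b) {
  assert(b != 0 && "debug-only check; compiled out under NDEBUG");
  return b != 0 ? a / b : 0;  // safe fallback when the check is compiled out
}
```

With GCC and Clang, CMake's default `Release` flags typically include `-DNDEBUG`, so a `Release` build is usually enough to disable such checks.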

[BUG] failing assert in 57_hopper_grouped_gemm example

Closing this now.

[EVT] Fix Row/Col broadcast with array arguments

@jwfromm Is the invalid type comparison happening because this line is currently not behind `constexpr`: `is_zero_ = params.ptr_row[0] == ElementInput(0);` For the Grouped GEMM case, there is a type mismatch,...
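A sketch of how a compile-time branch can avoid this kind of mismatch, under the assumption that the grouped case passes an array of per-group pointers while the regular case passes a single pointer. This is not the CUTLASS implementation or the actual fix; `first_element_is_zero` is a hypothetical helper written only to show the `if constexpr` pattern.

```cpp
#include <type_traits>

// PtrT is assumed to be either `Element const*` (single problem)
// or `Element const* const*` (one pointer per group).
template <class Element, class PtrT>
bool first_element_is_zero(PtrT ptr_row) {
  using Pointee = std::remove_cv_t<std::remove_pointer_t<PtrT>>;
  if constexpr (std::is_pointer_v<Pointee>) {
    // Array-of-pointers case: ptr_row[0] is itself a pointer, so comparing it
    // to Element(0) would not type-check. Dereference once more instead.
    return ptr_row[0][0] == Element(0);
  } else {
    // Single-pointer case: ptr_row[0] is an Element and compares directly.
    return ptr_row[0] == Element(0);
  }
}
```

Because the discarded branch of `if constexpr` is not instantiated inside a template, the ill-typed comparison never reaches the compiler for the pointer-array case.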

[EVT] Fix Row/Col broadcast with array arguments

Yes, please. Thanks.