Nihal Potdar

Results: 4 comments by Nihal Potdar

@LucasWilkinson Ran some benchmarks of the kernels on Llama 3.1 70B weight shapes and am seeing that at low batch sizes, the bandwidth utilization is as low as 20%. Do you...
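
For context on where a figure like 20% comes from, here is a rough sketch of how bandwidth utilization can be estimated for a memory-bound, weight-only-quantized GEMM. The shape, latency, and peak-bandwidth numbers below are illustrative assumptions, not values from the benchmark referenced above.

```cuda
// Rough bandwidth-utilization estimate for a skinny mixed-dtype GEMM.
// All numbers are illustrative placeholders, not measured values.
#include <cstdio>

int main() {
  // Hypothetical 70B-style projection: M = batch size, N x K weight matrix.
  const double M = 16, N = 8192, K = 8192;

  // Bytes moved: uint4 weights (0.5 B/elt), fp16 activations and output (2 B/elt).
  const double bytes_moved = N * K * 0.5   // quantized weights
                           + M * K * 2.0   // fp16 activations
                           + M * N * 2.0;  // fp16 output

  const double latency_s = 50e-6;    // assumed measured kernel time (50 us)
  const double peak_bw   = 3.35e12;  // assumed peak HBM bandwidth (H100 SXM), B/s

  const double achieved_bw = bytes_moved / latency_s;
  std::printf("achieved %.1f GB/s, %.1f%% of peak\n",
              achieved_bw / 1e9, 100.0 * achieved_bw / peak_bw);
  return 0;
}
```

At small M the weight traffic dominates the data movement, so achieved bandwidth (rather than FLOPS) is the right lens for these shapes.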

@jackkosaian Curious how long the separate-reduction fix is expected to take, and whether there are any suggested workarounds? My understanding is that for small GEMM shapes with a large K dimension, separate reduction would...
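
For readers unfamiliar with the term: "separate reduction" here refers to split-K style execution, where the K dimension is partitioned across CTAs, each CTA writes a partial result to a workspace, and a second kernel reduces the partials. Below is a minimal, naive CUDA sketch of the concept, assuming row-major fp32 operands; this is not CUTLASS's implementation, and the kernel names and workspace layout are made up for illustration.

```cuda
// Naive split-K sketch: not CUTLASS's implementation, just the concept.
#include <cuda_runtime.h>

// Each block in the z dimension computes a partial GEMM over its own K-slice.
__global__ void partial_gemm(const float* A, const float* B, float* partials,
                             int M, int N, int K, int splits) {
  int row   = blockIdx.y * blockDim.y + threadIdx.y;
  int col   = blockIdx.x * blockDim.x + threadIdx.x;
  int split = blockIdx.z;
  if (row >= M || col >= N) return;

  int k_per_split = (K + splits - 1) / splits;
  int k_begin = split * k_per_split;
  int k_end   = min(k_begin + k_per_split, K);

  float acc = 0.f;
  for (int k = k_begin; k < k_end; ++k)
    acc += A[row * K + k] * B[k * N + col];

  // Each split writes to its own slice of the workspace.
  partials[(size_t)split * M * N + row * N + col] = acc;
}

// Separate reduction kernel: sum the per-split partials into C.
__global__ void reduce_partials(const float* partials, float* C,
                                int M, int N, int splits) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx >= M * N) return;
  float acc = 0.f;
  for (int s = 0; s < splits; ++s)
    acc += partials[(size_t)s * M * N + idx];
  C[idx] = acc;
}
```

The benefit for small-M, large-K shapes is that many more CTAs are launched and the large K loop is shared across SMs, at the cost of the extra workspace and the reduction pass.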

@azhurkevich Yes, it is. I was running some matmuls and benchmarking their performance using [this code](https://github.com/NVIDIA/cutlass/blob/main/examples/55_hopper_mixed_dtype_gemm/55_hopper_mixed_dtype_gemm.cu) for the uint4 and fp16 data types. However, when the K dimension is large and the...
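
For anyone following along, the "mixed dtype" part means the B operand is stored as packed 4-bit integers and dequantized to fp16 on the fly before the tensor-core MMA. Here is a conceptual device-side sketch of that unpack step, assuming two values packed per byte with a per-group scale and zero point; this is illustrative only, and the actual example uses CUTLASS's vectorized numeric converters and tensor-core-friendly layouts.

```cuda
// Conceptual uint4 -> fp16 dequantization, two values packed per byte.
// Illustrative only; requires sm_53+ for half arithmetic intrinsics.
#include <cuda_fp16.h>
#include <cstdint>

__device__ inline void dequant_uint4_pair(uint8_t packed, __half scale,
                                          __half zero, __half& lo, __half& hi) {
  // Low nibble holds the first element, high nibble the second.
  __half q0 = __int2half_rn(packed & 0x0F);
  __half q1 = __int2half_rn((packed >> 4) & 0x0F);
  lo = __hmul(__hsub(q0, zero), scale);
  hi = __hmul(__hsub(q1, zero), scale);
}
```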

@azhurkevich Sure. I was working with this [example code](https://github.com/NVIDIA/cutlass/blob/main/examples/55_hopper_mixed_dtype_gemm/55_hopper_mixed_dtype_gemm.cu), setting the MmaType to float16 (cutlass::half_t) and the QuantType to uint4 (cutlass::uint4b_t). For the problem size, M=16, N=2560,...
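
To make concrete why a shape like M=16 is problematic without split-K: the number of output tiles, and therefore CTAs, is tiny compared to the SM count, so most of the GPU idles unless K is also partitioned. A back-of-the-envelope sketch follows; the CTA tile shape, the K value, and the SM count are assumptions for illustration (the K in the original comment is truncated above).

```cuda
// Back-of-the-envelope CTA-count estimate for a skinny GEMM.
// Tile shape, K, and SM count are illustrative assumptions.
#include <cstdio>

int main() {
  const int M = 16, N = 2560, K = 8192;  // K is an assumed placeholder
  const int tile_m = 64, tile_n = 128;   // assumed CTA tile shape
  const int num_sms = 132;               // H100 SXM SM count

  int tiles = ((M + tile_m - 1) / tile_m) * ((N + tile_n - 1) / tile_n);
  std::printf("output tiles (CTAs) without split-K: %d vs %d SMs\n", tiles, num_sms);

  // With split-K, the K loop is partitioned so more CTAs can run concurrently.
  const int split_options[] = {2, 4, 8};
  for (int splits : split_options) {
    std::printf("split-K = %d -> %d CTAs (+ separate reduction pass)\n",
                splits, tiles * splits);
  }
  return 0;
}
```

Under these assumed tile sizes that is roughly 20 CTAs on a 132-SM part, which is why separate reduction (or stream-K) matters so much for shapes like this.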