Bob Jin


I am currently testing [examples/cute/tutorial/hopper/wgmma_sm90.cu](https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/hopper/wgmma_sm90.cu) on an H800. The throughput is quite low, about 240 TFLOPS, compared with the roughly 970 TFLOPS reported in the [paper](https://arxiv.org/abs/2402.13499) for the same case. So, I used Nsight Compute to profile...
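For context on how such figures are usually derived: GEMM throughput is conventionally reported as 2·M·N·K floating-point operations (one multiply and one add per inner-product term) divided by kernel time. The Python sketch below shows that arithmetic. The function name `gemm_tflops`, the 8192³ problem size, and the 4.58 ms kernel time are hypothetical values chosen only to land near 240 TFLOPS. They are not measurements from this issue or the sizes used by the tutorial.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return achieved TFLOPS for one M x N x K GEMM that took `seconds`."""
    # Each of the M*N outputs needs K multiply-adds, i.e. 2*K floating-point ops.
    flops = 2.0 * m * n * k
    return flops / seconds / 1e12


# Hypothetical problem size and kernel time, for illustration only.
m = n = k = 8192
elapsed_s = 4.58e-3
print(f"{gemm_tflops(m, n, k, elapsed_s):.0f} TFLOPS")

# Ratio between the throughput reported in the paper and the one measured here.
print(f"gap: {970 / 240:.1f}x")
```

When comparing against a published number, it is worth confirming that the problem size, data type, and timing method (e.g. whether launch overhead or warm-up runs are included) match before attributing the gap to the kernel itself.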
