Bob Jin
I am currently testing [examples/cute/tutorial/hopper/wgmma_sm90.cu](https://github.com/NVIDIA/cutlass/blob/main/examples/cute/tutorial/hopper/wgmma_sm90.cu) on an H800. The throughput is quite low, ~240 TFLOPS, compared with the ~970 TFLOPS reported in the [paper](https://arxiv.org/abs/2402.13499) for the same case. So, I used Nsight Compute to profile...
question