Hongbo Xu

Results 33 comments of Hongbo Xu

hi @jackkosaian I observed in Nsight that using `streamK` introduces a `Memset` op, which results in a lot of gaps between GEMM kernels in `CUDA graph` mode. Could...

I guess it's related to the code [here](https://github.com/NVIDIA/cutlass/blob/833f6990e031b48b4cd2fcf55e0849c51ef6bac2/include/cutlass/gemm/kernel/tile_scheduler_params.h#L1485-L1486)

```c++
// struct PersistentTileSchedulerSm90StreamKParams
if (barrier_workspace_size > 0) {
  if (workspace == nullptr) {
    return Status::kErrorWorkspaceNull;
  }

  // Only the barrier workspace needs...
```

@ywdblog There are two ways to compute the scale:

- One is `scale = |max| / 127`, in which case `Q = R / scale`
- The other is `scale = 127 / |max|`, in which case `Q = R · scale`

I hope that answers your question.
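To make the two conventions concrete, here is a minimal sketch assuming symmetric int8 quantization with round-to-nearest and clamping to [-127, 127]. The helper names (`abs_max`, `round_clamp`, `quantize_div`, `quantize_mul`) are hypothetical and not taken from any library; the rounding and clamping steps are my assumption, since the comment above only gives the scale formulas.

```c++
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Largest absolute value in the tensor (the |max| in the formulas above).
inline float abs_max(const std::vector<float>& r) {
    float m = 0.0f;
    for (float v : r) m = std::max(m, std::fabs(v));
    return m;
}

// Assumption: round to nearest, then clamp to the symmetric range [-127, 127].
inline std::int8_t round_clamp(float x) {
    float q = std::nearbyint(x);
    q = std::min(127.0f, std::max(-127.0f, q));
    return static_cast<std::int8_t>(q);
}

// Convention 1: scale = |max| / 127, Q = R / scale.
inline std::int8_t quantize_div(float r, float amax) {
    assert(amax > 0.0f);  // an all-zero tensor needs separate handling
    const float scale = amax / 127.0f;
    return round_clamp(r / scale);
}

// Convention 2: scale = 127 / |max|, Q = R * scale.
inline std::int8_t quantize_mul(float r, float amax) {
    assert(amax > 0.0f);  // an all-zero tensor needs separate handling
    const float scale = 127.0f / amax;
    return round_clamp(r * scale);
}
```

The two scales are reciprocals of each other, so both functions should map a value to the same integer except possibly at exact rounding ties, where floating-point error can differ. What matters is that dequantization uses the matching inverse: `R ≈ Q * scale` for the first convention and `R ≈ Q / scale` for the second.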