Jianfeng Yan
@SimonSongg Could you double check the data types and layouts are the same in cuSPARSELt and cuBLAS?
Many kernels are launched by cusparseLtMatmulSearch(); setting matmul_search=false disables this routine. For small problem sizes like 320 x 320 x 640 you probably observe much speedup against dense...
@SimonSongg cusparseLtMatmulSearch() is the auto-tuning API. Sorry I mean for very small sizes you **won't** observe much speedup.
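The usage pattern behind the two comments above can be sketched roughly as follows. This is a non-compilable fragment, not a complete program: it assumes a handle, plan, and device buffers were already created through the usual cuSPARSELt init/prune/compress sequence, and the variable names (dA_compressed, d_workspace, etc.) are hypothetical.

```cpp
// Sketch only. cusparseLtMatmulSearch() runs many candidate kernels (the
// extra launches visible in the profiler) and records the best algorithm in
// the plan; subsequent cusparseLtMatmul() calls reuse that choice.
float alpha = 1.0f, beta = 0.0f;
if (matmul_search) {
    // Auto-tune once, typically during warm-up.
    cusparseLtMatmulSearch(&handle, &plan, &alpha, dA_compressed, dB,
                           &beta, dC, dD, d_workspace, streams, num_streams);
}
// Steady-state calls use the (possibly tuned) plan with no extra launches.
cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB,
                 &beta, dC, dD, d_workspace, streams, num_streams);
```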
@Septend-fun Is it possible to make a reproducer?
Hi @Zor-X-L In order to use fp4 you have to specify the scale modes of A/B/output matrices and the corresponding scale pointers; see https://docs.nvidia.com/cuda/cusparselt/types.html#cusparseltmatmuldescattribute-t. Could you give it a try?...
Hi @Zor-X-L 1. Could you try half of CompressedSize() for compressed_size? This is actually a bug for fp4 and will be fixed in the next release. 2. Yes, it's the right way...
@Zor-X-L 1. You are right. Half of the CompressedSize() is not correct because it also halves the amount of metadata. 2. Yes please try batched. I can't think of any...
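For reference, the safe pattern is to let the library report the sizes rather than scaling them by hand. A rough, non-compilable sketch (assumes an initialized handle/plan and a dense device matrix dA; names are hypothetical):

```cpp
// Sketch only. The compressed buffer holds both the packed nonzeros AND the
// 2:4 sparsity metadata, which is why simply halving the queried size for
// fp4 under-allocates the metadata portion.
size_t compressed_size = 0, compressed_buffer_size = 0;
cusparseLtSpMMACompressedSize(&handle, &plan,
                              &compressed_size, &compressed_buffer_size);
void *dA_compressed = nullptr, *dA_compressed_buffer = nullptr;
cudaMalloc(&dA_compressed, compressed_size);
cudaMalloc(&dA_compressed_buffer, compressed_buffer_size);
cusparseLtSpMMACompress(&handle, &plan, dA, dA_compressed,
                        dA_compressed_buffer, stream);
```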
@zhoeujei 1. In [the documentation](https://docs.nvidia.com/cuda/cusparselt/#cusparselt-a-high-performance-cuda-library-for-sparse-matrix-matrix-multiplication), "NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which **at least** one operand is a sparse matrix" 2. Currently only...
@zhoeujei Just to follow up. Does the above reply resolve your issue? If yes, let's close.
@JanuszL I think we can close.