CUDALibrarySamples
cusparseLt questions
Hi, experts. I ran into a problem while using cusparseLt for model inference. In my case, the weights are the sparse operand of the matmul, and I would like the initialization, pruning, compression, and Init calls to run only once. My question: if different weights and inputs have the same shape, can I run the following code only once?
// matrix descriptor initialization
CHECK_CUSPARSE( cusparseLtStructuredDescriptorInit(
    &handle, &matA, num_A_rows,
    num_A_cols, lda, alignment,
    type_AB, order,
    CUSPARSELT_SPARSITY_50_PERCENT) )
CHECK_CUSPARSE( cusparseLtDenseDescriptorInit(
    &handle, &matB, num_B_rows,
    num_B_cols, ldb, alignment,
    type_AB, order) )
CHECK_CUSPARSE( cusparseLtDenseDescriptorInit(
    &handle, &matC, num_C_rows,
    num_C_cols, ldc, alignment,
    type_C, order) )
// matmul, algorithm selection, and plan initialization
CHECK_CUSPARSE( cusparseLtMatmulDescriptorInit(
    &handle, &matmul, opA, opB,
    &matA, &matB, &matC, &matC,
    compute_type) )
CHECK_CUSPARSE( cusparseLtMatmulAlgSelectionInit(
    &handle, &alg_sel, &matmul,
    CUSPARSELT_MATMUL_ALG_DEFAULT) )
CHECK_CUSPARSE( cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel))
CHECK_CUSPARSE(cusparseLtMatmulDescSetAttribute(&handle,
    &matmul,
    CUSPARSELT_MATMUL_SPARSE_MAT_POINTER,
    &dA,
    sizeof(dA)));
if (matmul_search) {
    CHECK_CUSPARSE( cusparseLtMatmulSearch(&handle, &plan, &alpha,
        dA_compressed, dB, &beta,
        dC, dD, nullptr,
        streams, num_streams) )
    // dC accumulates so reset dC for correctness check
    CHECK_CUDA( cudaMemcpy(dC, hC, C_size, cudaMemcpyHostToDevice) )
} else {
    // otherwise, it is possible to set it directly:
    int alg = 0;
    CHECK_CUSPARSE( cusparseLtMatmulAlgSetAttribute(
        &handle, &alg_sel,
        CUSPARSELT_MATMUL_ALG_CONFIG_ID,
        &alg, sizeof(alg)))
}
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
size_t workspace_size;
CHECK_CUSPARSE( cusparseLtMatmulPlanInit(&handle, &plan, &matmul, &alg_sel))
CHECK_CUSPARSE( cusparseLtMatmulGetWorkspace(&handle, &plan,
    &workspace_size))
void* d_workspace;
CHECK_CUDA( cudaMalloc((void**) &d_workspace, workspace_size) )
In fact, I tried it, but I got this error:
CUSPARSE API failed at line 330 with error : internal error (7)
Line 330 is:
CHECK_CUSPARSE( cusparseLtMatmul(&handle, &plan, &alpha, dA_compressed, dB,
    &beta, dC, dD, d_workspace, streams,
    num_streams) )
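To make the question concrete, the split being asked about can be sketched on the host side as below. This is only a minimal sketch with hypothetical names: `ShapeKey`, `Plan`, `CompressedWeight`, and `SparseMatmulCache` are illustrative stand-ins, not cuSPARSELt types. It assumes that the descriptors and the plan depend only on shapes, types, and layout, while pruning and compression depend on the weight values. That assumption is not confirmed in this thread.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT cuSPARSELt types.
struct ShapeKey {
    int m, n, k;
    bool operator<(const ShapeKey& o) const {
        return std::tie(m, n, k) < std::tie(o.m, o.n, o.k);
    }
};

struct Plan { int id; };  // would wrap the descriptors and the matmul plan
struct CompressedWeight { std::vector<float> data; };

struct SparseMatmulCache {
    std::map<ShapeKey, Plan> plans;
    int plans_built = 0;
    int weights_compressed = 0;

    // Shape-dependent setup: built once per distinct (m, n, k), then reused.
    const Plan& plan_for(const ShapeKey& key) {
        auto it = plans.find(key);
        if (it == plans.end()) {
            it = plans.emplace(key, Plan{plans_built++}).first;
        }
        return it->second;
    }

    // Data-dependent setup: repeated for every distinct weight matrix.
    CompressedWeight compress(const Plan&, const std::vector<float>& w) {
        assert(!w.empty());
        ++weights_compressed;
        return CompressedWeight{w};  // placeholder for prune + compress
    }
};
```

Under that assumption, two weight matrices of the same shape would share one `Plan` but each get their own `compress` call, so only the data-dependent step is repeated per weight.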
@Septend-fun Is it possible to make a reproducer?
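When putting a reproducer together, it may also help to confirm on the host that every weight matrix passed to compression really has the expected structure. The sketch below checks the 2:4 pattern (at most two non-zeros in each group of four consecutive values), which is my understanding of what `CUSPARSELT_SPARSITY_50_PERCENT` means for half-precision inputs. The function name `is_2to4_sparse` is hypothetical, and row-major `float` storage with a column count divisible by 4 is an assumption. cuSPARSELt also ships its own device-side check, `cusparseLtSpMMAPruneCheck`, which would be the authoritative test.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if every group of four consecutive elements in each row
// holds at most two non-zero values (2:4 structured sparsity).
// Assumes row-major storage and cols divisible by 4.
bool is_2to4_sparse(const std::vector<float>& a,
                    std::size_t rows, std::size_t cols) {
    assert(cols % 4 == 0 && a.size() == rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; c += 4) {
            int nonzeros = 0;
            for (std::size_t i = 0; i < 4; ++i) {
                if (a[r * cols + c + i] != 0.0f) ++nonzeros;
            }
            if (nonzeros > 2) return false;  // group violates 2:4
        }
    }
    return true;
}
```

Running a check like this on each new set of weights before compression would show whether a second, unpruned weight matrix is being fed through a path that expects pruned data.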