Akira Naruse
Thank you for re-running it. I've checked the config and the results. It looks good to me!
> @anaruse Have you ever heard of any cases like [this](https://github.com/cupy/cupy/pull/7052#issuecomment-1275512137) regarding the performance of cusparseSpSM? (I'm afraid I don't have a plain C/C++ reproducer at hand.) It is quite slow. Please...
Just to double-check, I took a profile with profiler.time_range (NVTX) and confirmed that the cusparseSpSM_analysis() time is indeed very long.
I built CuPy (v11) with the following combinations and tested `cupy_tests/core_tests/test_userkernel.py` and `cupy_tests/cuda_tests/test_texture.py`. The error occurred only with the combination of CUDA 11.5.0 and CUDA-python 11.7.1. For CUDA 11.5, is...
The CUDA Python team's investigation has revealed that this is not directly due to CUDA Python, but is essentially due to backward-compatibility issues with the texture-related ABIs in...
Your question is whether we can choose between rank-based optimization and distance-based optimization when optimizing the initial KNN graph to create a search graph. The answer is...
You can get better matrix-multiply performance with Tensor Cores, but fp16 complex numbers are not supported by CUDA libraries except [cublasLtMatmul()](https://docs.nvidia.com/cuda/cublas/index.html#cublasLtMatmul) in cuBLASLt. Even with cublasLtMatmul(), all matrices must...
On Kepler-generation GPUs, adding `__restrict__` improves performance, but on modern GPUs it rarely does.
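For readers unfamiliar with the qualifier, here is a minimal host-side sketch of what `__restrict__` promises: that the annotated pointers never alias. The function names `scaled_add` and `scaled_add_copy` are hypothetical and not taken from CuPy; the sketch assumes a compiler that accepts the `__restrict__` extension (GCC, Clang, NVCC). In a CUDA kernel the same qualifier would go on the kernel's pointer parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: y[i] += a * x[i].
// `__restrict__` is a compiler extension (not standard C++). It tells the
// compiler that x and y do not overlap, so loads from x can be scheduled
// freely around the stores to y.
void scaled_add(float a, const float* __restrict__ x,
                float* __restrict__ y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Hypothetical wrapper: takes y by value, so the two buffers are distinct
// and the no-alias promise made to scaled_add actually holds.
std::vector<float> scaled_add_copy(float a, const std::vector<float>& x,
                                   std::vector<float> y) {
    assert(x.size() == y.size());
    scaled_add(a, x.data(), y.data(), y.size());
    return y;
}
```

Passing overlapping buffers to `scaled_add` would break the promise and is undefined behavior, which is why the wrapper copies `y` first. Whether the qualifier yields any measurable speedup depends on the compiler and the hardware.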
Hmm, it might have something to do with the issue below. https://github.com/cupy/cupy/issues/3935
Thank you for reporting the problem. Regarding the issue of recall decreasing as the number of dimensions (n_cols) increases, CAGRA has a search parameter called `itopk_size`. Could you try increasing...