
cuBLAS single precision issue

Open ww5862 opened this issue 1 year ago • 2 comments

hello

I'm using cublasSgemm to compute a single-precision GEMM with dimensions 1024x1024x1024. When I compare the cublasSgemm output against a CUTLASS single-precision GEMM kernel, validation fails. However, comparing the CUTLASS single-precision GEMM kernel against CPU code passes validation. My setup is an RTX 3090 with nvcc 12.4. Can cuBLAS not compute the correct result when using a modern nvcc with an RTX 3090?

ww5862, Apr 16 '24 06:04

Hello @ww5862.

Thanks for the report. A few questions:

  1. Can you please post the output of a test run after setting the environment variable CUBLASLT_LOG_MASK=64 (e.g. export CUBLASLT_LOG_MASK=64 in bash)? (documentation)
  2. How do you compare the results? By default, cuBLAS uses several optimizations that change the order of operations, which affects the result, but it should remain close in most cases.
  3. Do you still see the difference if you call cublasSetMathMode(handle, CUBLAS_PEDANTIC_MATH)? (documentation)
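
To illustrate question 2, here is a minimal host-only sketch of why an exact comparison can fail even when both results are fine, and what a tolerance-based check might look like. The helper names (`sum_in_order`, `all_close`) and the tolerance defaults are hypothetical choices for illustration; they are not part of cuBLAS or CUTLASS, and suitable tolerances depend on the input data and the reduction length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: accumulates in float, strictly left to right.
// Floating-point addition is not associative, so reordering the same
// terms (as an optimized GEMM may do) can change the rounded result.
float sum_in_order(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) acc += x;
    return acc;
}

// Hypothetical helper: true when every element of `got` satisfies
// |got - ref| <= abs_tol + rel_tol * |ref|. The default tolerances are
// assumptions, not values recommended by the cuBLAS documentation.
bool all_close(const std::vector<float>& got, const std::vector<float>& ref,
               float rel_tol = 1e-4f, float abs_tol = 1e-5f) {
    if (got.size() != ref.size()) return false;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const float diff = std::fabs(got[i] - ref[i]);
        // Written as !(diff <= bound) so that a NaN anywhere fails the check.
        if (!(diff <= abs_tol + rel_tol * std::fabs(ref[i]))) return false;
    }
    return true;
}
```

For example, `sum_in_order` over `{1e8f, 1.0f, -1e8f}` and over `{1e8f, -1e8f, 1.0f}` gives different values under IEEE single-precision arithmetic, although the terms are the same. If the validation uses `==` on each element, replacing it with something like `all_close(cublas_result, reference)` is one way to check whether the two GEMM outputs are merely rounded differently.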

rsdubtso, Apr 16 '24 06:04

Thank you, I will do it!

ww5862, Apr 16 '24 10:04