cuBERT
Further accelerate CPU GEMM with MKL compact BLAS
MKL's compact BLAS routines can be faster than the standard routines for batches of small matrices, so we might need to support compact GEMM in the future. Reference: https://software.intel.com/en-us/mkl-developer-reference-c-mkl-gemm-compact
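To make the idea concrete, here is a minimal sketch of the interleaved ("compact") layout in plain C++. It does not call MKL and is not how cuBERT or MKL implement it; `pack_interleaved` and `gemm_interleaved` are hypothetical names used only for illustration. The assumption is that all matrices in the batch share the same shape and are stored column-major, and that the same element of every matrix is placed contiguously so the innermost loop runs over the batch with unit stride.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an MKL API): interleave a batch of equally sized
// column-major matrices so that element e of every matrix is contiguous:
//   packed[e * nm + t] = mats[t][e]
std::vector<float> pack_interleaved(const std::vector<std::vector<float>>& mats,
                                    std::size_t rows, std::size_t cols) {
    const std::size_t nm = mats.size();
    std::vector<float> packed(rows * cols * nm);
    for (std::size_t t = 0; t < nm; ++t)
        for (std::size_t e = 0; e < rows * cols; ++e)
            packed[e * nm + t] = mats[t][e];
    return packed;
}

// Hypothetical helper (not an MKL API): C_t = A_t * B_t for every matrix t in
// the interleaved batch. A_t is m x k, B_t is k x n, C_t is m x n, all
// column-major. The innermost loop walks the batch index with unit stride,
// which is the access pattern a compiler can vectorize across matrices.
std::vector<float> gemm_interleaved(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    std::size_t m, std::size_t n,
                                    std::size_t k, std::size_t nm) {
    std::vector<float> c(m * n * nm, 0.0f);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t i = 0; i < m; ++i)
                for (std::size_t t = 0; t < nm; ++t)
                    c[(j * m + i) * nm + t] +=
                        a[(p * m + i) * nm + t] * b[(j * k + p) * nm + t];
    return c;
}
```

With real MKL support, the packing and multiplication would presumably be delegated to the library's own compact routines rather than hand-written loops like these; the sketch only shows why the layout suits many small, same-shaped products.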