
dgemmt but not dgemm?

Open mohawk2 opened this issue 3 years ago • 3 comments

I had a very quick look in src and was surprised that there's a dgemmt but no dgemm. There's no mention of reasoning in coverage.md - is this simply something that's not been done yet?

mohawk2, Feb 20 '22 16:02

dgemm is part of BLAS, not LAPACK. ReLAPACK must be linked with a BLAS implementation, which will provide dgemm.

dgemmt is not part of BLAS but is needed by a LAPACK algorithm, so ReLAPACK provides a recursive implementation of it.

elmar-peise, Feb 20 '22 20:02
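For readers unfamiliar with the routine: a gemmt-style operation computes C := alpha*A*B + beta*C but writes only one triangle of C. The following is a minimal pure-Python sketch of how such an update could be done recursively. It is not ReLAPACK's actual code. The names `gemm` and `gemmt_lower`, the `crossover` parameter, and the naive loops standing in for BLAS calls are all assumptions made for illustration.

```python
def gemm(alpha, A, B, beta, C, r0, r1, c0, c1):
    """Hypothetical stand-in for a BLAS dgemm call.

    Updates the full block C[r0:r1, c0:c1] in place with
    alpha * (A @ B) + beta * C, using naive loops.
    """
    k = len(B)
    for i in range(r0, r1):
        for j in range(c0, c1):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]


def gemmt_lower(alpha, A, B, beta, C, lo=0, hi=None, crossover=2):
    """Update only the lower triangle of C[lo:hi, lo:hi] in place.

    A is n x k, B is k x n, C is n x n (lists of lists).
    Entries strictly above the diagonal are never written.
    """
    if hi is None:
        hi = len(C)
    crossover = max(crossover, 1)  # guard: recursion needs a base size >= 1
    n = hi - lo
    if n <= crossover:
        # Small diagonal block: touch entries on and below the diagonal only.
        k = len(B)
        for i in range(lo, hi):
            for j in range(lo, i + 1):
                acc = sum(A[i][p] * B[p][j] for p in range(k))
                C[i][j] = alpha * acc + beta * C[i][j]
        return
    mid = lo + n // 2
    # Top-left diagonal block: still triangular, so recurse.
    gemmt_lower(alpha, A, B, beta, C, lo, mid, crossover)
    # Bottom-left block: lies entirely below the diagonal, so a full gemm applies.
    gemm(alpha, A, B, beta, C, mid, hi, lo, mid)
    # Bottom-right diagonal block: triangular again, so recurse.
    gemmt_lower(alpha, A, B, beta, C, mid, hi, crossover)
```

In this sketch the off-diagonal block is handed to an ordinary gemm, so in a real setting most of the arithmetic would end up in whatever dgemm the linked BLAS provides.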

Thank you for your rapid reply! My assumption was that the performance gains from the recursive algorithm over the tuned block algorithms would be equally large in dgemm as in dgemmt. I appreciate that other BLASes provide dgemm, but feeling "greedy" I wondered if ReLAPACK could provide a high-performance dgemm as well? (A superficial reading of the source for your dgemmt made it look as though it wouldn't be a lot of work to make a dgemm as well)

mohawk2, Feb 20 '22 20:02

A ReLAPACK-style recursive dgemm implementation would outperform the reference BLAS, but would almost certainly not reach the performance of high-performance BLAS implementations. Such libraries are tuned for specific CPU architectures and cache sizes, and often contain hand-written assembly. Overall, ReLAPACK is built on the assumption that it is linked to an optimized BLAS, which typically performs best for large matrices: recursion calls BLAS with large sub-problems, which can give better performance than a blocked algorithm's calls with much smaller, fixed panel sizes.

(Yes, it should be simple to implement a recursive dgemm to test this.)

elmar-peise, Feb 22 '22 06:02
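To make the idea of a recursive dgemm concrete, here is a minimal pure-Python sketch of one way to structure it: halve the largest of the three dimensions until the sub-problem is small, then hand it to a kernel. This is an illustration, not code from ReLAPACK. The names `kernel_gemm`, `recursive_gemm`, `matmul_recursive`, and the `crossover` threshold are hypothetical, and the naive kernel stands in for the tuned code a real implementation would need at the base case.

```python
def kernel_gemm(A, B, C, i0, i1, j0, j1, p0, p1):
    """Hypothetical base-case kernel (stand-in for tuned code).

    Accumulates A[i0:i1, p0:p1] @ B[p0:p1, j0:j1] into C[i0:i1, j0:j1].
    """
    for i in range(i0, i1):
        for j in range(j0, j1):
            acc = 0.0
            for p in range(p0, p1):
                acc += A[i][p] * B[p][j]
            C[i][j] += acc


def recursive_gemm(A, B, C, i0, i1, j0, j1, p0, p1, crossover=2):
    """C += A @ B on the given index ranges, splitting the largest dimension."""
    crossover = max(crossover, 1)  # guard: recursion needs a base size >= 1
    m, n, k = i1 - i0, j1 - j0, p1 - p0
    if max(m, n, k) <= crossover:
        kernel_gemm(A, B, C, i0, i1, j0, j1, p0, p1)
        return
    if m >= n and m >= k:
        # Split the rows of A and C.
        mid = i0 + m // 2
        recursive_gemm(A, B, C, i0, mid, j0, j1, p0, p1, crossover)
        recursive_gemm(A, B, C, mid, i1, j0, j1, p0, p1, crossover)
    elif n >= k:
        # Split the columns of B and C.
        mid = j0 + n // 2
        recursive_gemm(A, B, C, i0, i1, j0, mid, p0, p1, crossover)
        recursive_gemm(A, B, C, i0, i1, mid, j1, p0, p1, crossover)
    else:
        # Split the inner dimension: both halves accumulate into the same block of C.
        mid = p0 + k // 2
        recursive_gemm(A, B, C, i0, i1, j0, j1, p0, mid, crossover)
        recursive_gemm(A, B, C, i0, i1, j0, j1, mid, p1, crossover)


def matmul_recursive(A, B, crossover=2):
    """Convenience wrapper: return A @ B as a new list of lists."""
    m, k = len(A), len(B)
    n = len(B[0]) if k else 0
    C = [[0.0] * n for _ in range(m)]
    recursive_gemm(A, B, C, 0, m, 0, n, 0, k, crossover)
    return C
```

The recursion only decides how the work is partitioned. As the reply above points out, the speed of a real dgemm comes from the architecture-specific kernel, which is where optimized BLAS libraries invest their tuning.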