libflame
Support CONJUGATE_NO_TRANSPOSE for general matrices.
Details:
- Follow the CPU logic to support CONJUGATE_NO_TRANSPOSE: allocate an intermediate buffer, copy the matrix into it, and conjugate each complex element (negate its imaginary part).
- Integrate this into Gemv/Gemm/Apply_Q/Bidiag_apply[U,V].
- While there, also apply the optimized strided and batched logic to the FLA_Scal HIP operator.
- Optimize the regular case for the FLA_Copy HIP operator.
- Fix an inverted <.
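
For illustration only, the copy-and-conjugate step described above could look roughly like the host-side C sketch below. It is not the code from this change: `copy_conjugated` is a hypothetical helper name, and the column-major layout with a leading dimension `lda` and a densely packed output buffer are assumptions made for the example.

```c
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not a libflame function): copy an m-by-n
 * column-major complex matrix with leading dimension lda into a newly
 * allocated, densely packed buffer, conjugating every element on the way.
 * Returns NULL if the allocation fails; the caller frees the result. */
double complex *copy_conjugated(const double complex *a,
                                size_t m, size_t n, size_t lda)
{
    double complex *buf = malloc(m * n * sizeof *buf);
    if (buf == NULL)
        return NULL;

    for (size_t j = 0; j < n; ++j)          /* walk columns */
        for (size_t i = 0; i < m; ++i)      /* walk rows within a column */
            buf[j * m + i] = conj(a[j * lda + i]);  /* negate imaginary part */

    return buf;
}
```

With a buffer like this, the subsequent operation can be invoked as plain NO_TRANSPOSE on the conjugated copy, and the buffer is released afterwards.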