Performance of mixed-type (matrix-) multiplication

Open · trahflow opened this issue 3 years ago · 1 comment

Hi,

I noticed that mixed-type multiplication (e.g. *(::SMatrix, ::Matrix)) is slow. I searched the issues but didn't find one that mentions this. For example:

```julia
julia> using StaticArrays, BenchmarkTools

julia> SA = @SMatrix randn(3,3);

julia> A = Matrix(SA);

julia> B = randn(3, 1_000);

julia> @btime $SA * $B;
  14.455 μs (5 allocations: 54.23 KiB)

julia> @btime $A * $B;
  2.121 μs (2 allocations: 23.48 KiB)
```

This is because *(::SMatrix, ::Matrix) doesn't hit BLAS and instead goes through LinearAlgebra.generic_matmul!(). This case doesn't seem to be covered by the benchmarks (unless I missed it), so I'm wondering whether addressing it would be in scope for this package. If not, I think it should at least be documented somewhere. Note that converting to a standard Array and then doing a Matrix-Matrix multiplication is much faster than the SMatrix-Matrix multiplication (because it hits BLAS), and it allocates less:

```julia
julia> @btime Matrix($SA) * $B;
  2.062 μs (3 allocations: 23.61 KiB)
```
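Until something like this is handled in the package, the conversion workaround can be wrapped in a small helper. This is a minimal sketch: the name blas_mul is hypothetical and not part of StaticArrays.jl. It copies the static operand into a Matrix so that * can dispatch to BLAS for BLAS-compatible element types such as Float64. Using a separately named function, instead of adding a new * method for (SMatrix, Matrix) in user code, avoids type piracy on StaticArrays.jl's own methods.

```julia
using StaticArrays, LinearAlgebra

# Hypothetical helper (not a StaticArrays.jl API): copy the small static matrix
# into a heap-allocated Matrix so that Matrix * Matrix can use BLAS gemm for
# BLAS-compatible element types (Float32, Float64, ComplexF32, ComplexF64).
blas_mul(SA::StaticMatrix, B::AbstractMatrix) = Matrix(SA) * B

SA = @SMatrix randn(3, 3)
B = randn(3, 1_000)
C = blas_mul(SA, B)   # a plain Matrix{Float64} of size 3×1000
```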

Mar 24 '23 11:03 trahflow

Yes, I think dispatching mixed-type multiplication to BLAS would be a good idea, and it fits within the scope of StaticArrays.jl. I can review a PR that adds this. Mixed calls could even return a HybridArray (from HybridArrays.jl), but that would be out of scope for StaticArrays.jl, I think.
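To make the suggestion concrete, here is a hedged sketch of what routing a mixed static-times-dense product to BLAS could look like. The function name static_dense_mul is hypothetical, and this is not necessarily how a PR to StaticArrays.jl would do it. The fast path is restricted to BLAS element types (LinearAlgebra.BlasFloat) and strided dense right operands. It copies the small static matrix into a Matrix once and calls LinearAlgebra.BLAS.gemm! directly. Other element types fall back to the existing generic *.

```julia
using StaticArrays, LinearAlgebra
using LinearAlgebra: BlasFloat

# Hypothetical sketch, not an existing StaticArrays.jl API.
# Fast path: static left operand, strided dense right operand, BLAS element type.
function static_dense_mul(A::StaticMatrix{M,K,T}, B::StridedMatrix{T}) where {M,K,T<:BlasFloat}
    size(B, 1) == K ||
        throw(DimensionMismatch("A has $K columns but B has $(size(B, 1)) rows"))
    C = Matrix{T}(undef, M, size(B, 2))
    # BLAS needs a strided, heap-backed operand, so copy the static matrix once.
    # With beta = 0, gemm! overwrites C without reading its uninitialized contents.
    LinearAlgebra.BLAS.gemm!('N', 'N', one(T), Matrix(A), B, zero(T), C)
    return C
end

# Fallback for element types BLAS cannot handle (e.g. Int, BigFloat):
# defer to the existing generic multiplication.
static_dense_mul(A::StaticMatrix, B::AbstractMatrix) = A * B

SA = @SMatrix randn(3, 3)
B = randn(3, 1_000)
C = static_dense_mul(SA, B)   # 3×1000 Matrix{Float64}
```

Inside the package itself, this would presumably be written as a method of * (or mul!) rather than a separate function, and it would need to be checked for method ambiguities against the existing static-times-abstract methods.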

Mar 24 '23 11:03 mateuszbaran