Support multiplication of a vector against one lane (broadcasted) of another vector

Open bjacob opened this issue 5 years ago • 0 comments

Suppose you have two vectors u and v, and you want to multiply all elements of the vector u by a single lane of the vector v, e.g. v[0]. This is a very common thing to do, particularly in float matrix multiplication kernels.

Example.

This should be available for all multiplication instructions, float and integer, including any multiply-add instructions if they are added to the spec. It will map directly to the corresponding instructions on ARM, and will be implemented on x86 by broadcasting the selected lane into a temporary vector and then multiplying.
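To make the intended semantics concrete, here is a minimal sketch in portable scalar C. The names `f32x4`, `f32x4_mul_lane`, and `f32x4_mul_lane_fallback` are hypothetical and used only for illustration; they are not taken from the SIMD proposal. The first function models the proposed operation (which ARM can express as a single by-element multiply), and the second models the broadcast-then-multiply fallback described above for x86.

```c
/* Scalar model of a 4-lane float vector; a stand-in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Ordinary lane-wise multiply: r[k] = a[k] * b[k]. */
static f32x4 f32x4_mul(f32x4 a, f32x4 b) {
    f32x4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = a.lane[k] * b.lane[k];
    return r;
}

/* Broadcast lane i of v into every lane of the result. */
static f32x4 f32x4_broadcast_lane(f32x4 v, int i) {
    f32x4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = v.lane[i];
    return r;
}

/* Hypothetical proposed operation: r[k] = u[k] * v[i]. */
static f32x4 f32x4_mul_lane(f32x4 u, f32x4 v, int i) {
    f32x4 r;
    for (int k = 0; k < 4; ++k) r.lane[k] = u.lane[k] * v.lane[i];
    return r;
}

/* Fallback for targets without a by-lane multiply:
   broadcast into a temporary vector, then do a plain multiply. */
static f32x4 f32x4_mul_lane_fallback(f32x4 u, f32x4 v, int i) {
    f32x4 tmp = f32x4_broadcast_lane(v, i);
    return f32x4_mul(u, tmp);
}
```

Both functions compute the same result; the difference is only in how many instructions a code generator would need to emit for each target.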

Rationale for this programming model in WebAsm SIMD:

- It is more expressive with respect to what many applications need to do.
- The fallback is efficient, provided the instructions in the generated code are well ordered. By contrast, the current lack of this instruction forces the WebAsm source to use separate broadcast instructions, which makes it essentially impossible for the generated code to be efficient.

See ARM benchmarks in this spreadsheet. Row 30, NEON_64bit_GEMM_Float32_WithVectorDuplicatingScalar, is the float kernel that one can write without such instructions. Row 31, NEON_64bit_GEMM_Float32_WithScalar, is the faster float kernel that one can write with such instructions.
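For readers unfamiliar with why this pattern dominates float GEMM kernels, the following is a rough sketch of a 4x4 inner kernel written in portable scalar C. It is not the benchmarked NEON code; the names `f32x4`, `f32x4_mla_lane`, and `gemm_4x4_kernel`, as well as the data layout, are assumptions made for illustration. Each step of the depth loop loads one vector from each operand and then performs four multiply-accumulates, each against a different lane of the same right-hand-side vector.

```c
/* Scalar model of a 4-lane float vector; a stand-in for a 128-bit SIMD value. */
typedef struct { float lane[4]; } f32x4;

/* Hypothetical multiply-accumulate by lane: acc[k] += u[k] * v[i]. */
static f32x4 f32x4_mla_lane(f32x4 acc, f32x4 u, f32x4 v, int i) {
    for (int k = 0; k < 4; ++k) acc.lane[k] += u.lane[k] * v.lane[i];
    return acc;
}

/* Accumulates C += A * B for a 4x4 output block.
   Assumed layout (illustrative only):
     lhs: A stored column by column, 4 floats per depth step
     rhs: B stored row by row, 4 floats per depth step
     c[j]: column j of the 4x4 accumulator */
static void gemm_4x4_kernel(const float *lhs, const float *rhs,
                            int depth, f32x4 c[4]) {
    for (int k = 0; k < depth; ++k) {
        f32x4 a, b;
        for (int r = 0; r < 4; ++r) {
            a.lane[r] = lhs[4 * k + r];  /* column k of A */
            b.lane[r] = rhs[4 * k + r];  /* row k of B */
        }
        /* One by-lane multiply-accumulate per output column.
           Without a by-lane form, each of these needs a separate broadcast of b[j]. */
        for (int j = 0; j < 4; ++j) {
            c[j] = f32x4_mla_lane(c[j], a, b, j);
        }
    }
}
```

With a by-lane instruction, the inner loop body is four multiply-accumulates per pair of loads; with explicit broadcasts, the same loop carries four additional instructions and four additional temporaries per depth step.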

May 11 '20 20:05 bjacob