Bill Budge

Results: 10 comments by Bill Budge

ARM v7/v8 have:
- Vector Count Leading Sign Bits (VCLS)
- Vector Count Leading Zeros (VCLZ)
- Vector Count Set Bits (VCNT)
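The lane-wise semantics of these instructions can be sketched in scalar code. This is an illustrative model for 8-bit lanes, not the actual hardware or Wasm definition; the helper names are made up here:

```python
def clz8(x):
    """Count leading zero bits of an 8-bit value (VCLZ-style)."""
    n = 0
    for i in range(7, -1, -1):
        if x & (1 << i):
            break
        n += 1
    return n

def cls8(x):
    """Count consecutive bits matching the sign bit, excluding the
    sign bit itself (VCLS-style)."""
    sign = (x >> 7) & 1
    n = 0
    for i in range(6, -1, -1):
        if ((x >> i) & 1) != sign:
            break
        n += 1
    return n

def popcount8(x):
    """Count set bits (VCNT-style)."""
    return bin(x & 0xFF).count("1")

def lanewise(op, vec):
    """Apply a scalar op independently to each lane of a vector."""
    return [op(lane) for lane in vec]
```

For example, `lanewise(clz8, [0x01, 0x80, 0x00])` yields `[7, 0, 8]`.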

I'm in favor of this. We've implemented it in the V8 prototype.

These are pairwise additions, so for a 4-lane vector type, two source operands would form a single destination vector like this: [ src0[0] + src0[1], src0[2] + src0[3], src1[0] + src1[1], src1[2] + src1[3] ]
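The lane arrangement described above can be modeled with a small sketch (the `pairwise_add` helper is hypothetical, not the actual opcode name):

```python
def pairwise_add(src0, src1):
    """Pairwise add: sums of adjacent pairs of src0 fill the low lanes
    of the result; sums of adjacent pairs of src1 fill the high lanes."""
    low = [src0[i] + src0[i + 1] for i in range(0, len(src0), 2)]
    high = [src1[i] + src1[i + 1] for i in range(0, len(src1), 2)]
    return low + high
```

For a 4-lane type, `pairwise_add([1, 2, 3, 4], [5, 6, 7, 8])` yields `[3, 7, 11, 15]`.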

These can be composed to do the full reductions. The advantage of keeping them primitive (pairwise) is that a compiler will have more opportunity to schedule the instructions. If we...

Starting with a vector [x0, x1, x2, x3], a pairwise reduction with itself gives [x0 + x1, x2 + x3, x0 + x1, x2 + x3]; another pairwise reduction with itself leaves x0 + x1 + x2 + x3 in every lane.
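The two-step reduction above can be sketched as follows, using a hypothetical `pairwise_add` helper to model the pairwise semantics (adjacent pairs of the first operand fill the low lanes, adjacent pairs of the second fill the high lanes):

```python
def pairwise_add(src0, src1):
    # Low lanes: sums of adjacent pairs of src0; high lanes: same for src1.
    low = [src0[i] + src0[i + 1] for i in range(0, len(src0), 2)]
    high = [src1[i] + src1[i + 1] for i in range(0, len(src1), 2)]
    return low + high

v = [1.0, 2.0, 3.0, 4.0]            # [x0, x1, x2, x3]
step1 = pairwise_add(v, v)          # [x0+x1, x2+x3, x0+x1, x2+x3]
step2 = pairwise_add(step1, step1)  # every lane holds x0+x1+x2+x3
```

Here `step2` is `[10.0, 10.0, 10.0, 10.0]`: the full horizontal sum is composed from two primitive pairwise operations, each of which the compiler can schedule independently.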

OK, I see what you're saying. You're correct that floating-point operations are not associative. The general intent of SIMD is to give performance improvements for vector operations. If you...

The V8 value-type tests for float32x4 (link posted above) changed: https://codereview.chromium.org/1219943002/diff/230001/test/mjsunit/harmony/simd.js

Intel at least appears to have these: https://software.intel.com/en-us/node/583113

A concern I have is that f64x2 is not well supported on platforms other than Intel. Implementations would be forced to lower these opcodes to the equivalent scalar ops, probably...
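The lowering in question is straightforward but loses the parallelism; a minimal sketch of what lowering an f64x2 add to scalar ops amounts to (illustrative only, not V8's actual lowering):

```python
def f64x2_add(a, b):
    """Scalar lowering of a 2-lane f64 vector add: one scalar add per lane,
    as a platform without native f64x2 support would have to do."""
    return [a[0] + b[0], a[1] + b[1]]
```

With only two lanes, each vector op becomes just two scalar ops plus packing overhead, which is why the performance win is doubtful.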

Hi JF. Yes, f64x2 in general. We think it will be hard enough to get clear performance wins with f32x4, so we're somewhat skeptical about f64x2, especially on platforms like ARM.