John Platts

Results: 67 comments by John Platts

A more efficient implementation of the set-before-first operation for masks is possible on SSE4/AVX2/AVX3/RVV/PPC10. Here is how the SetBeforeFirst operation could be implemented for masks of 128-bit or smaller vectors...
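
The comment above is truncated before its code, so the following is only a minimal scalar sketch of the idea, not the author's implementation. It assumes the mask is available as a bit pattern (bit i set means lane i is true) and that an all-false input should produce an all-true result; the helper name `SetBeforeFirstBits` and its `num_lanes` parameter are hypothetical and not part of Highway.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of a set-before-first operation on a lane mask stored as bits.
// Hypothetical helper for illustration only; not Highway's API.
// Returns a mask with bits set for every lane strictly below the first true
// lane of mask_bits. Assumption: if no lane is true, all num_lanes lanes are
// set in the result.
inline uint64_t SetBeforeFirstBits(uint64_t mask_bits, size_t num_lanes) {
  // Bits that correspond to real lanes (expects 1 <= num_lanes <= 64).
  const uint64_t valid =
      (num_lanes >= 64) ? ~uint64_t{0} : ((uint64_t{1} << num_lanes) - 1);
  mask_bits &= valid;
  // (m - 1) clears the lowest set bit and sets every bit below it; AND with
  // ~m keeps only the bits below the lowest set bit. For m == 0 the
  // subtraction wraps to all ones, which is then clipped to the valid lanes.
  return ((mask_bits - 1) & ~mask_bits) & valid;
}
```

The same subtract-and-mask trick is what makes a vectorized version plausible on targets where the mask can be moved cheaply to a scalar register; on targets with native mask instructions a different approach may be preferable.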

> @johnplatts sounds like you are proposing a new SetBeforeFirst op, unrelated to this particular MulInt code? That sounds potentially useful, would welcome a pull request with your code. But...

The SSE4.2 PCMPISTRI, PCMPISTRM, PCMPESTRI, and PCMPESTRM instructions can each perform the following operations in a single instruction: - Equal any (equivalent to the following): ``` template MFromD EqualAny(V a, V b,...
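
Because the snippet above is cut off, here is a hedged scalar reference for what an "equal any" comparison could mean, using plain arrays instead of vectors. It assumes that lane i of the result is true when `a[i]` matches any lane of `b`; the operand order in the original comment may differ. `EqualAnyScalar` is a hypothetical name, and the code does not use the PCMPxSTRx instructions themselves.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar reference for an "equal any" comparison (hypothetical helper).
// Assumption: result[i] is true if a[i] equals at least one element of b.
template <typename T, size_t N>
std::array<bool, N> EqualAnyScalar(const std::array<T, N>& a,
                                   const std::array<T, N>& b) {
  std::array<bool, N> result{};  // value-initialized to all false
  for (size_t i = 0; i < N; ++i) {
    for (size_t j = 0; j < N; ++j) {
      if (a[i] == b[j]) {
        result[i] = true;
        break;  // one match is enough for this lane
      }
    }
  }
  return result;
}
```

A reference like this is mainly useful as a test oracle: a SIMD implementation can be checked lane by lane against it.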

`Lanes(double_vector_tag)` returns the actual number of lanes in `Vec`, and the result of `Lanes(double_vector_tag)` can differ from `MaxLanes(double_vector_tag)` on targets that use scalable vectors such as SVE or RVV. Here...

Here is an example of Highway dynamic dispatch code updated to support multi-phase compilation (compiled more than once with different compiler options for the different compilation phases): ``` // Generates...

AVX3_DL is also capable of carrying out the saturated doubling multiply add using the _mm*_dpwssds_epi32 intrinsics. Here is how the vqdmla op can be implemented on AVX3_DL for I32 vectors...
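
The implementation referred to above is truncated, so the sketch below only models the per-lane arithmetic of a saturated doubling multiply-add in scalar code. It assumes semantics in the style of Arm's SQDMLAL for 16-bit inputs and a 32-bit accumulator: the doubled product is saturated to 32 bits first, then added to the accumulator with saturation. The function names are hypothetical, and the code does not use the `_mm*_dpwssds_epi32` intrinsics.

```cpp
#include <cstdint>
#include <limits>

// Clamps a 64-bit value to the int32_t range.
inline int32_t SaturateToI32(int64_t v) {
  constexpr int64_t kMin = std::numeric_limits<int32_t>::min();
  constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(v < kMin ? kMin : (v > kMax ? kMax : v));
}

// Scalar model of one lane (hypothetical helper, assumed semantics):
//   product = sat32(2 * a * b); result = sat32(acc + product).
inline int32_t SatDoublingMulAddScalar(int32_t acc, int16_t a, int16_t b) {
  // Compute in 64 bits so that doubling cannot overflow; the only input pair
  // whose doubled product exceeds int32_t is a == b == -32768.
  const int64_t doubled = 2 * (int64_t{a} * int64_t{b});
  const int32_t product = SaturateToI32(doubled);
  return SaturateToI32(int64_t{acc} + int64_t{product});
}
```

Saturating the product before the addition matters for the `a == b == -32768` case: with a negative accumulator, saturating only once at the end would give a different result.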

It is possible to bitcast SVE vectors to NEON vectors and vice versa on GCC and Clang releases that have support for the arm_neon_sve_bridge.h header, including Clang 14 and later...

Here is a link to a Compiler Explorer snippet that demonstrates the use of the ARM NEON SVE Bridge intrinsics (which are defined in the arm_neon_sve_bridge.h header) to convert between...

There were bugs in RVV F64->F32 and F32->F16 DemoteTo, which are fixed in pull request #2164. RVV Ceil and Floor have also been reimplemented in pull request #2164 to avoid...

> I understand, thanks for the follow up. FYI discussions are underway on addressing the compiler bug. I have updated hwy/detect_targets.h to mark the SVE targets as broken on macOS...