John Platts comments

Results: 67 comments of John Platts

Implement a few missing things using SSE or NEON intrinsics.

A more optimal implementation of the set-before-first operation for masks is possible on SSE4/AVX2/AVX3/RVV/PPC10. Here is how the SetBeforeFirst operation could be implemented for masks for 128-bit or smaller vectors...
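As a point of reference for the operation discussed in this comment, here is a minimal scalar sketch of what set-before-first computes on a packed bitmask. It is not the SSE4/AVX2/AVX3/RVV/PPC10 implementation proposed in the comment, which is truncated above. The helper name `SetBeforeFirstBits`, the lane-0-in-bit-0 layout, and the convention that an all-false input yields an all-true result are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference (not a Highway API): mask_bits holds one bit
// per lane, lane 0 in bit 0, for up to 64 lanes.
// Returns a bitmask in which every lane strictly before the first true lane
// is set. Assumption: if no lane is true, all num_lanes lanes are set.
inline uint64_t SetBeforeFirstBits(uint64_t mask_bits, unsigned num_lanes) {
  // Bits that correspond to real lanes.
  const uint64_t valid =
      (num_lanes >= 64) ? ~uint64_t{0} : ((uint64_t{1} << num_lanes) - 1);
  const uint64_t m = mask_bits & valid;
  // ~m & (m - 1) keeps exactly the bits below the lowest set bit of m.
  // For m == 0, unsigned wraparound makes (m - 1) all ones, so every valid
  // lane ends up set.
  return (~m & (m - 1)) & valid;
}
```

For a 4-lane mask with only lane 2 set (`0x4`), the sketch returns `0x3`, meaning lanes 0 and 1. The same bit trick is what a vector implementation would typically apply after moving the mask into a general-purpose register, though the comment above may take a different route.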

Implement a few missing things using SSE or NEON intrinsics.

> @johnplatts sounds like you are proposing a new SetBeforeFirst op, unrelated to this particular MulInt code? That sounds potentially useful, would welcome a pull request with your code. But...

Implement a few missing things using SSE or NEON intrinsics.

The SSE4.2 PCMPISTRI, PCMPISTRM, PCMPESTRI, and PCMPESTRM can do the following operations using a single instruction: - Equal any (equivalent to the following): ``` template MFromD EqualAny(V a, V b,...

Make `Lanes` `constexpr` on `arm-sve` target

`Lanes(double_vector_tag)` returns the actual number of lanes in `Vec`, and the result of `Lanes(double_vector_tag)` can differ from `MaxLanes(double_vector_tag)` on targets that use scalable vectors such as SVE or RVV. Here...

Improving dynamic dispatch for multiple targets for x86-64/AArch64/PPC64

Here is an example of Highway dynamic dispatch code updated to support multi-phase compilation (compiled more than once with different compiler options for the different compilation phases): ``` // Generates...

Support for saturating doubling multiply add

AVX3_DL is also capable of carrying out the saturated doubling multiply add using the _mm*_dpwssds_epi32 intrinsics. Here is how the vqdmla op can be implemented on AVX3_DL for I32 vectors...
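To make the arithmetic concrete, the following is a scalar sketch of a saturating doubling multiply-add for a single lane. It assumes the ARM SQDMLAL-style definition: 16-bit inputs, a 32-bit accumulator, the doubled product saturated to 32 bits, then a saturating add. It is not the AVX3_DL code from the comment, which is truncated above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Clamps a 64-bit value into the int32_t range.
inline int32_t SaturateToI32(int64_t v) {
  const int64_t lo = std::numeric_limits<int32_t>::min();
  const int64_t hi = std::numeric_limits<int32_t>::max();
  if (v < lo) return static_cast<int32_t>(lo);
  if (v > hi) return static_cast<int32_t>(hi);
  return static_cast<int32_t>(v);
}

// Hypothetical scalar reference for one lane:
//   result = sat32(acc + sat32(2 * a * b))
// The intermediate math is done in 64 bits so nothing overflows before the
// explicit saturation steps.
inline int32_t SatDoublingMulAdd(int32_t acc, int16_t a, int16_t b) {
  const int64_t doubled = 2 * static_cast<int64_t>(a) * static_cast<int64_t>(b);
  const int32_t product = SaturateToI32(doubled);
  return SaturateToI32(static_cast<int64_t>(acc) + product);
}
```

The doubled product saturates for only one input pair, `a == b == -32768`, where `2 * a * b` equals 2^31. A reference like this can be useful for checking a vector implementation lane by lane.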

Choosing NEON over SVE when fixed size vectors are used where possible

It is possible to bitcast SVE vectors to NEON vectors and vice versa on GCC and Clang releases that have support for the arm_neon_sve_bridge.h header, including Clang 14 and later...

Choosing NEON over SVE when fixed size vectors are used where possible

Here is a link to a Compiler Explorer snippet that demonstrates the use of the ARM NEON SVE Bridge intrinsics (which are defined in the arm_neon_sve_bridge.h header) to convert between...

Different test results using Clang when enabling Debug or not on target RVV

There were bugs in RVV F64->F32 and F32->F16 DemoteTo, which are fixed in pull request #2164. RVV Ceil and Floor have also been reimplemented in pull request #2164 to avoid...

Fail to compile vqsort with clang 16 on Darwin M1

> I understand, thanks for the follow up. FYI discussions are underway on addressing the compiler bug. I have updated hwy/detect_targets.h to mark the SVE targets as broken on macOS...