John Platts

67 comments by John Platts

> Nice find, thank you @johnplatts ! Would you like to send this code as a pull request, with a comment mentioning the intel.com forum discussion link? I have made...

Windows on AArch64 also has the IsProcessorFeaturePresent function that can check for the presence of some of the AArch64 instruction set extensions (including the SDOT/UDOT instructions), and the IsProcessorFeaturePresent function...
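A minimal sketch of the runtime check described above. The helper name `HasDotProdOnWindowsArm64` is hypothetical and not part of Highway; the sketch assumes the installed Windows SDK defines `PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE`, and it conservatively reports `false` on any other platform or with an older SDK.

```cpp
#if defined(_WIN32)
#include <windows.h>
#endif

// Hypothetical helper (not a Highway API): returns true only when running on
// Windows on AArch64 and the OS reports the Armv8.2 dot-product (SDOT/UDOT)
// instructions as available.
bool HasDotProdOnWindowsArm64() {
#if defined(_WIN32) && defined(_M_ARM64) && \
    defined(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE)
  // IsProcessorFeaturePresent returns a nonzero BOOL if the feature is present.
  return IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE) != 0;
#else
  // Not Windows on AArch64, or the SDK predates this feature constant
  // (assumption: in that case we simply report the feature as unavailable).
  return false;
#endif
}
```

A caller would typically evaluate this once at startup and cache the result before selecting a code path that uses SDOT/UDOT.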

> > Windows on AArch64 also has the IsProcessorFeaturePresent function that can check for the presence of some of the AArch64 instruction set extensions
>
> Unfortunately, that doesn't cover...

> A quick follow-up: TableLookupBytes has the quirk of staying within 128-bit blocks. But the AVX-512 operations here support full permutes across all vector lanes, same as SVE and RVV....
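To illustrate the distinction drawn in the quoted comment, here is a scalar model of the two behaviors. It is only a sketch: `BlockwiseLookup` and `FullLookup` are hypothetical names, not Highway functions, and the blockwise version assumes every index is in the range [0, 16).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a byte shuffle that stays within 128-bit (16-byte) blocks:
// each output byte can only come from the block it belongs to.
// Assumption: all indices are in [0, 16).
std::vector<uint8_t> BlockwiseLookup(const std::vector<uint8_t>& bytes,
                                     const std::vector<uint8_t>& idx) {
  assert(bytes.size() == idx.size() && bytes.size() % 16 == 0);
  std::vector<uint8_t> out(bytes.size());
  for (size_t i = 0; i < bytes.size(); ++i) {
    assert(idx[i] < 16);
    const size_t block_start = i - (i % 16);  // first byte of this block
    out[i] = bytes[block_start + idx[i]];
  }
  return out;
}

// Scalar model of a full permute: any output byte may come from any input
// byte of the whole vector. Assumption: all indices are < bytes.size().
std::vector<uint8_t> FullLookup(const std::vector<uint8_t>& bytes,
                                const std::vector<uint8_t>& idx) {
  assert(bytes.size() == idx.size());
  std::vector<uint8_t> out(bytes.size());
  for (size_t i = 0; i < bytes.size(); ++i) {
    assert(idx[i] < bytes.size());
    out[i] = bytes[idx[i]];
  }
  return out;
}
```

For a 32-byte input holding 0..31 with every index set to 0, the blockwise model yields 0 for the first 16 bytes and 16 for the next 16, whereas the full permute yields 0 everywhere.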

> I see that CI prevented the merge here: there's a clang internal compiler error for the Zen4 target. BF16 compiler support has really been half-baked :/ > > ```...

> Thank you for continuing to debug this :) Yes, unfortunately we still have an issue: > > ``` > fatal error: error in backend: Cannot select: 0x340bb7ffb5b0: v32bf16 =...

> Our internal clang is quite close to trunk. Unfortunately it does not come with an updated version number. Would it help to use our HWY_COMPILER_CLANG to deduce the major...

What is still causing the CI checks to fail in this pull request?

Here is how the vector version of the MulInt operation above can be implemented in Highway for an int32_t vector: ``` template HWY_INLINE V MulInt(V a, V b) { const...

> Hi @ibogosavljevic , it's always interesting to hear how the documentation works for new users and what can be improved. I'm curious what the missing pieces were? > >...