xsimd
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)
It would be very useful to be able to use a `batch_bool` mask directly to increment or decrement an integer `batch`, using the integer overflow trick. This allows updating a table of indices...
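The trick mentioned in this request can be sketched without any SIMD library. The sketch below assumes that a comparison mask stores all bits set in true lanes and zero in false lanes, which is how most SIMD instruction sets represent integer comparison results. The `Lanes` type and the helpers `greater_mask`, `increment_where` and `decrement_where` are hypothetical names used for illustration only; they are not part of the xsimd API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;
// Hypothetical scalar stand-in for a 4-lane unsigned integer batch.
using Lanes = std::array<std::uint32_t, kLanes>;

// Stand-in for a SIMD "a > b" comparison (assumption: true lanes are
// all ones, false lanes are zero).
inline Lanes greater_mask(const Lanes& a, const Lanes& b) {
    Lanes m{};
    for (std::size_t i = 0; i < kLanes; ++i)
        m[i] = (a[i] > b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Subtracting 0xFFFFFFFF wraps around modulo 2^32 and adds 1,
// so only the lanes selected by the mask are incremented.
inline Lanes increment_where(const Lanes& counters, const Lanes& mask) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i)
        out[i] = counters[i] - mask[i];
    return out;
}

// Adding 0xFFFFFFFF wraps around modulo 2^32 and subtracts 1.
inline Lanes decrement_where(const Lanes& counters, const Lanes& mask) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i)
        out[i] = counters[i] + mask[i];
    return out;
}
```

Unsigned lanes are used here so that the wraparound is well defined in C++; in a real SIMD implementation each loop would be a single lane-wise subtraction or addition.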
I could not find any mention of doing within-vector reductions, or, equally, of transposing blocks of vectors to make such reductions vectorizable. If I am summing a long list of...
Do you have any guidelines on how one would implement saturated addition and subtraction?
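One common branch-free approach for unsigned lanes, shown here as a scalar sketch rather than as the library's method, is to do the wrapping operation first and then patch the result with a comparison mask. The `U32x4` type and the functions `saturated_add` and `saturated_sub` are hypothetical names chosen for this illustration; the sketch assumes 32-bit unsigned lanes and does not cover signed saturation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar stand-in for a batch of four unsigned 32-bit lanes.
using U32x4 = std::array<std::uint32_t, 4>;

// Unsigned saturated addition: the wrapped sum is smaller than an operand
// exactly when the addition overflowed; OR-ing with an all-ones mask
// clamps those lanes to the maximum value.
inline U32x4 saturated_add(const U32x4& a, const U32x4& b) {
    U32x4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint32_t sum = a[i] + b[i];  // wraps modulo 2^32
        const std::uint32_t overflow_mask = (sum < a[i]) ? 0xFFFFFFFFu : 0u;
        out[i] = sum | overflow_mask;
    }
    return out;
}

// Unsigned saturated subtraction: AND-ing the wrapped difference with a
// mask that is zero when a < b clamps those lanes to zero.
inline U32x4 saturated_sub(const U32x4& a, const U32x4& b) {
    U32x4 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint32_t diff = a[i] - b[i];  // wraps modulo 2^32
        const std::uint32_t keep_mask = (a[i] >= b[i]) ? 0xFFFFFFFFu : 0u;
        out[i] = diff & keep_mask;
    }
    return out;
}
```

In vector code, each loop body maps to one arithmetic instruction, one comparison and one bitwise operation; several instruction sets also provide native saturating instructions for 8-bit and 16-bit lanes, which are preferable when available.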
As reported in #289, the NEON instruction set is not detected when building on armv7hl. A workaround is to `#define XSIMD_FORCE_ARM_INSTR_SET 70000000` before including any xsimd header.
I'm trying to use `XTENSOR_USE_XSIMD` in my project, which otherwise compiles and runs fine on a Raspberry Pi 3B+ with up-to-date Raspbian Stretch. Using - xtensor master -...
For the types specified in `xsimd_avx_double.hpp`, `xsimd_sse_float.hpp` and `xsimd_sse_int32.hpp`: the SSE method `store_aligned_int32(uint8_t* dst)` (and similar) stores using the function `_mm_storel_epi64`. As this is a batch of 4 values, and...
`pow` of `complex` has some accuracy issues with AVX512.
~I think the title says it all :)~ When running "make/ninja xbenchmark" on my Haswell-based machine, a pair of "neon" rows is present in the table of timings, even though...
I tried to integrate xsimd with an Xcode project, but I get the following 2 errors in `xsimd_scalar.hpp`: `namespace detail { template <class C> inline C sign_complex_scalar_impl(const C& v) { using value_type =...`
In order to call store_aligned / unaligned in a generic fashion we should make the second parameter also aware of the batch size not being the max batch size (e.g....