Bill Barton issues


sign(float64<N>) generates incorrect code on NEON64 using gcc 9.3.0 with -ffast-math

`auto foo24(float64x2 a) { return sign(a); }` generates:
```
0000000000000000 :
   0: 6f00e402  movi v2.2d, #0x0
   4: d10383ff  sub  sp, sp, #0xe0
   8: 910383ff  add  sp, sp, #0xe0
   c: 4e221c00 ...
```
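A scalar reference makes it easier to check the NEON64 output lane by lane. The sketch below assumes that sign() is meant to return each lane's IEEE-754 sign bit with all other bits cleared. That reading of the semantics is an assumption, and the helper names `to_bits`, `from_bits`, `sign_reference` and `lane_matches_reference` are hypothetical. Comparing bit patterns instead of using `==` keeps -0.0 and +0.0 distinct. Because the reference works on integer bit patterns, the -ffast-math assumptions (such as -fno-signed-zeros) do not apply to it.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a double's IEEE-754 bit pattern as an integer, and back.
std::uint64_t to_bits(double x) {
    std::uint64_t b;
    std::memcpy(&b, &x, sizeof b);
    return b;
}

double from_bits(std::uint64_t b) {
    double x;
    std::memcpy(&x, &b, sizeof x);
    return x;
}

// Assumed semantics of sign(): keep only the sign bit of the lane.
double sign_reference(double x) {
    return from_bits(to_bits(x) & UINT64_C(0x8000000000000000));
}

// Compare bit patterns so that -0.0 and +0.0 are not treated as equal.
bool lane_matches_reference(double input, double simd_lane) {
    return to_bits(simd_lane) == to_bits(sign_reference(input));
}
```

In a correct build, `lane_matches_reference(a_i, r_i)` should hold for every lane `i` of `r = sign(a)`.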

NEON64 implementation of floor(float64x2) incorrect

The implementation uses vrndnq_f64(), which rounds to nearest (ties to even). It should use vrndmq_f64(), which rounds toward -inf.
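The difference between the two intrinsics can be reproduced with scalar standard-library calls. This sketch models a float64x2 as a hypothetical two-element `std::array`. Under the default FE_TONEAREST mode, `std::nearbyint` rounds half to even, as vrndnq_f64 does. `std::floor` rounds toward -inf, as vrndmq_f64 does.

```cpp
#include <array>
#include <cmath>

// Hypothetical stand-in for simdpp's float64x2: two double lanes.
using lanes2 = std::array<double, 2>;

// Per-lane behaviour of vrndnq_f64: round to nearest, ties to even.
// std::nearbyint does this under the default FE_TONEAREST rounding mode.
lanes2 round_nearest(lanes2 v) {
    return {std::nearbyint(v[0]), std::nearbyint(v[1])};
}

// Per-lane behaviour floor() needs (vrndmq_f64): round toward -infinity.
lanes2 round_down(lanes2 v) {
    return {std::floor(v[0]), std::floor(v[1])};
}
```

For the input {2.7, -2.5}, round-to-nearest yields {3, -2}, but floor must yield {2, -3}. Both lanes expose the bug.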

possible comma misuse warning in shuffle2x2.h

Xcode SSE build, dev branch at dcc04d11: "Possible misuse of comma operator here" at line 277: `float32 fa, fb; fa = a, fb = b;` and at line 282: `float32 fa, fb; fa`...
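The warning goes away once the two assignments are written as separate statements. This is a minimal sketch: plain `float` stands in for simdpp's `float32`, and `assign_both` is a hypothetical function, not code taken from shuffle2x2.h.

```cpp
#include <utility>

// Plain float stands in for simdpp's float32 (an assumption for brevity);
// assign_both is hypothetical and only illustrates the pattern.
std::pair<float, float> assign_both(float a, float b) {
    float fa, fb;
    // Was: fa = a, fb = b;  (clang -Wcomma: "possible misuse of comma operator here")
    fa = a;
    fb = b;
    return {fa, fb};
}
```

Declaring with initializers (`float fa = a, fb = b;`) would also avoid the warning. In a declaration, the comma is a separator rather than the comma operator.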

NEON_FLT_SP and ALTIVEC float32x4 implementation of sqrt(0) returns NaN

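The formula for these appears to use the 1/sqrt(a) estimate operation, which presumably returns Inf for a = 0. If the result is then formed as a * (1/sqrt(a)), the product 0 * Inf is NaN. The implementation probably needs to check explicitly for a == 0 and mask the result accordingly.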

bug
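A scalar model shows where the NaN comes from and how the suggested mask fixes it. It assumes the implementation computes sqrt(a) as a * (1/sqrt(a)). Here `1.0f / std::sqrt(a)` stands in for the hardware reciprocal-square-root estimate, and both function names are hypothetical.

```cpp
#include <cmath>

// Scalar model of an estimate-based sqrt: sqrt(a) = a * (1/sqrt(a)).
// 1.0f / std::sqrt(a) stands in for the reciprocal-sqrt estimate instruction.
float sqrt_unmasked(float a) {
    float r = 1.0f / std::sqrt(a);  // +Inf for a == +0, -Inf for a == -0
    return a * r;                   // 0 * Inf is NaN
}

// Fix suggested in the issue: where a == 0, select a itself instead of
// the product (this also keeps the sign of -0, since IEEE sqrt(-0) == -0).
float sqrt_masked(float a) {
    float r = a * (1.0f / std::sqrt(a));
    return (a == 0.0f) ? a : r;
}
```

In the vector code, the `a == 0.0f` test would become a lane-wise comparison that produces a mask. The ternary would become a blend between the computed value and `a`.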

shuffle4x2() generates incorrect code on SSE4_1

`simdpp::float32x4 foo(simdpp::float32x4 a, simdpp::float32x4 b) { return simdpp::shuffle4x2(a,b); }`
```
foo(simdpp::arch_sse4p1::float32, simdpp::arch_sse4p1::float32):
00000000000001c0  pushq   %rbp
00000000000001c1  movq    %rsp, %rbp
00000000000001c4  shufps  $0x90, %xmm0, %xmm1
00000000000001c8  movaps  %xmm1, %xmm0
00000000000001cb  popq...
```

typo: __x64_64__ symbol in setup_arch.h should be __x86_64__

```
#if __i386__ || __i386 || _M_IX86 || __amd64__ || __x64_64__ || _M_AMD64 || _M_X64
#define SIMDPP_X86 1
#elif _M_ARM || __arm__ || __aarch64__
```
Fortunately `__amd64__` seems to also...
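A corrected form of the detection condition might look like the sketch below. It wraps each macro in `defined(...)`. For these predefined macros, this gives the same result as the original bare-identifier test but avoids -Wundef warnings. `MY_ARCH_X86` and `built_for_x86` are hypothetical names, not libsimdpp's.

```cpp
// Corrected condition: __x86_64__ replaces the misspelled __x64_64__.
// MY_ARCH_X86 is a hypothetical macro name used only in this sketch.
#if defined(__i386__) || defined(__i386) || defined(_M_IX86) || \
    defined(__amd64__) || defined(__x86_64__) || defined(_M_AMD64) || defined(_M_X64)
#define MY_ARCH_X86 1
#else
#define MY_ARCH_X86 0
#endif

// Lets ordinary code query the result of the preprocessor test.
constexpr bool built_for_x86() { return MY_ARCH_X86 == 1; }
```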

float32 rcp_rh() template should have independent expression types

`template SIMDPP_INL float32 rcp_rh(const float32& x, const float32& a)` prevents the use of expressions as arguments to this function, as it's highly unlikely that both arguments will have the same E...

bug
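The deduction problem can be reproduced with toy types. In this sketch, `ExprA` and `ExprB` are hypothetical stand-ins for two different expression-template types. The body assumes rcp_rh performs one Newton-Raphson step x * (2 - a * x) toward 1/a. libsimdpp's actual signature and expression types differ.

```cpp
// Hypothetical stand-ins for two different expression-template types that
// both evaluate to a float; libsimdpp's real expression types differ.
struct ExprA { float v; float eval() const { return v; } };
struct ExprB { float v; float eval() const { return v; } };

// One shared parameter E: calling this with an ExprA and an ExprB fails
// template argument deduction, because E would have to be two types at once.
template <class E>
float rcp_rh_same(const E& x, const E& a) {
    float xv = x.eval(), av = a.eval();
    return xv * (2.0f - av * xv);  // assumed: one Newton-Raphson step toward 1/a
}

// Independent parameters: each argument may be any expression type.
template <class E1, class E2>
float rcp_rh_indep(const E1& x, const E2& a) {
    float xv = x.eval(), av = a.eval();
    return xv * (2.0f - av * xv);  // assumed: one Newton-Raphson step toward 1/a
}
```

With an estimate of 0.3 for 1/4, one step gives 0.3 * (2 - 1.2) = 0.24, which is closer to 0.25. Only the two-parameter version accepts the mixed call.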

unary operator-() is missing

Expressions such as:
```
simdpp::float32x4 foo(simdpp::float32x4 a) { return -a; }
```
result in a compilation error.

enhancement
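Below is a sketch of what operator-() could do, written for a hypothetical 4-lane `vec4f` that stands in for simdpp::float32x4. It flips each lane's sign bit with an XOR against 0x80000000, which is a common way to implement vector negation. This is an illustration, not libsimdpp's implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 4-lane vector standing in for simdpp::float32x4.
struct vec4f {
    std::array<float, 4> v;
};

// Negate by flipping each lane's IEEE-754 sign bit (XOR with 0x80000000).
// Unlike computing 0 - a, this maps +0.0 to -0.0, matching scalar -x.
inline vec4f operator-(const vec4f& a) {
    vec4f r{};
    for (std::size_t i = 0; i < a.v.size(); ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &a.v[i], sizeof bits);
        bits ^= UINT32_C(0x80000000);
        std::memcpy(&r.v[i], &bits, sizeof bits);
    }
    return r;
}
```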

round() doesn't exist

The round() function documented [here](http://p12tic.github.io/libsimdpp/v2.2-dev/libsimdpp/w/fp/round.html) doesn't appear to exist.

bug

mask narrowing and widening functions

A set of functions like `mask_int32 to_mask_int32(mask_int16 a);`, akin to the int/uint widening and narrowing operations. This particular function would convert 16-bit lanes of 0 or 0xffff or...

enhancement
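Every lane of a mask is either all zeros or all ones. So widening is a sign-extending conversion, and narrowing can simply keep the low half. The scalar sketch below works on four lanes and uses hypothetical `mask16x4`/`mask32x4` array types and function names. On SSE4.1, the widening step would correspond to a sign-extending conversion such as `_mm_cvtepi16_epi32` (pmovsxwd). Narrowing would correspond to a saturating pack such as `_mm_packs_epi32`, which also leaves 0 and -1 unchanged.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane mask types; each lane is all zeros or all ones.
using mask16x4 = std::array<std::uint16_t, 4>;
using mask32x4 = std::array<std::uint32_t, 4>;

// Widening: sign-extend each 16-bit lane, so 0x0000 -> 0x00000000 and
// 0xffff -> 0xffffffff.
mask32x4 widen_mask(const mask16x4& m) {
    mask32x4 r{};
    for (std::size_t i = 0; i < m.size(); ++i) {
        auto lane = static_cast<std::int16_t>(m[i]);  // 0xffff -> -1 (two's complement)
        r[i] = static_cast<std::uint32_t>(static_cast<std::int32_t>(lane));
    }
    return r;
}

// Narrowing: keep the low 16 bits, which for a valid mask lane
// (0 or 0xffffffff) is exactly 0 or 0xffff.
mask16x4 narrow_mask(const mask32x4& m) {
    mask16x4 r{};
    for (std::size_t i = 0; i < m.size(); ++i) {
        r[i] = static_cast<std::uint16_t>(m[i]);
    }
    return r;
}
```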