Saturated add and subtract seem to be missing.
Do you have any guidelines on how one would go about implementing saturated addition and subtraction?
+1 It could be very useful for an image processing library. Same question as Peter: do you have any guidelines? Regards TR
Probably making a free function like xsimd::sadd(a, b) would be the way to go.
Then you could implement it for the different architectures in the kernels as xsimd::kernel::sadd(..).
As the last step, xsimd::sadd would simply call the function implemented in the kernel.
For example, here is the kernel implementation for AVX int32 for add:
https://github.com/xtensor-stack/xsimd/blob/38eed51285edad0dd75b552c99cdd8178565e24d/include/xsimd/types/xsimd_avx_int32.hpp#L200-L207
Now, you could just copy-paste that part and call it sadd, for a start.
There is some template magic to select the appropriate kernel. Your function might then look like:
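// X is the concrete batch type; simd_base<X> is its CRTP base.
template <class X>
auto sadd(const xsimd::simd_base<X>& lhs, const xsimd::simd_base<X>& rhs) {
    using value_type = typename simd_batch_traits<X>::value_type;
    // Select the kernel matching the element type and batch size.
    using kernel = detail::batch_kernel<value_type, simd_batch_traits<X>::size>;
    // lhs() and rhs() downcast to the concrete batch type.
    return kernel::sadd(lhs(), rhs());
}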
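To have something to check a kernel against, here is a rough scalar sketch of what saturated add and subtract are expected to compute for integer types. It is only a reference, not xsimd code: the names scalar_sadd and scalar_ssub are made up for illustration, and a real kernel would use intrinsics where the architecture provides them.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical scalar reference (not part of xsimd): clamp a + b to the
// representable range of T instead of wrapping around.
template <class T>
T scalar_sadd(T a, T b)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    const T max = std::numeric_limits<T>::max();
    const T min = std::numeric_limits<T>::min();
    // Each check is arranged so that it cannot overflow itself.
    if (b > 0 && a > max - b) return max;  // would overflow upwards
    if (b < 0 && a < min - b) return min;  // would overflow downwards (signed only)
    return static_cast<T>(a + b);
}

// Hypothetical scalar reference (not part of xsimd): clamp a - b to the
// representable range of T.
template <class T>
T scalar_ssub(T a, T b)
{
    static_assert(std::is_integral<T>::value, "integral types only");
    const T max = std::numeric_limits<T>::max();
    const T min = std::numeric_limits<T>::min();
    if (b < 0 && a > max + b) return max;  // would overflow upwards (signed only)
    if (b > 0 && a < min + b) return min;  // would overflow downwards
    return static_cast<T>(a - b);
}
```

A test for a new kernel could then compare each lane of the batch result against scalar_sadd applied to the corresponding input lanes.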
It would be great if you gave it a shot! We're happy to assist further :)
I'll give it a try.