Raghuveer Devulapalli

Results 19 issues of Raghuveer Devulapalli

Adding compile and runtime support for upcoming Intel Xeon Sapphire Rapids.

component: SIMD
36 - Build

This patch leverages the `vcvtps2ph` and `vcvtpd2ps` instructions and float32 SVML functions to accelerate float16 umath functions. Max ULP error < 1 for all the math functions.

01 - Enhancement
component: SIMD
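The float16 patch above follows a widen, compute, narrow pattern: convert half-precision inputs to a wider type, evaluate the math function there, and round the result back to float16. The sketch below illustrates only that pattern in scalar Python, using the standard library's `struct` half-precision format. It is not the NumPy implementation: the helper names (`to_half_bits`, `from_half_bits`, `half_unary`) are hypothetical, and Python computes in float64 rather than float32, so the numerics stand in for the real code path rather than matching it.

```python
import math
import struct

def to_half_bits(x: float) -> int:
    """Round x to IEEE 754 binary16 and return its 16-bit pattern."""
    try:
        return struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses finite values beyond the float16 range; map them to +/-inf.
        return 0xFC00 if x < 0 else 0x7C00

def from_half_bits(bits: int) -> float:
    """Widen a binary16 bit pattern to a Python float (this step is exact)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def half_unary(func, bits: int) -> int:
    """Apply func to a float16 value: widen, compute in wide precision, narrow."""
    wide = from_half_bits(bits)
    return to_half_bits(func(wide))

if __name__ == "__main__":
    four = to_half_bits(4.0)
    print(hex(half_unary(math.sqrt, four)))
```

Because the intermediate result is computed in a wider format and rounded once at the end, the final float16 value is typically very close to the correctly rounded one, which is the general idea behind the small ULP error quoted in the patch description.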

This patch adds AVX512-based 64-bit sorting on AVX512-SKX and 16-bit sorting on AVX512-ICL. All the AVX512 sorting code has been reformatted into separate header files and put in a...

01 - Enhancement

This patch experiments with Highway to see how we can leverage its intrinsics using static dispatch. I would think these are the minimum requirements: - [ ] passes...

01 - Enhancement

Split different configurations of meson

The error occurs when building NumPy with avx512f as the baseline CPU feature.

32-bit argsort uses ymm registers: we can switch to zmm registers (use 2x i64gather instructions) and add new bitonic networks.
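For readers unfamiliar with the bitonic networks mentioned above, here is a minimal scalar sketch of a bitonic argsort. It assumes a power-of-two input length and uses a hypothetical function name, `bitonic_argsort`. It is not the x86-simd-sort code, which applies compare-exchange steps across vector lanes; this version only shows the fixed, data-independent compare-exchange schedule that makes such networks a good fit for SIMD registers.

```python
def bitonic_argsort(keys):
    """Return indices that sort keys in ascending order, via a bitonic network.

    The sequence of compared positions depends only on len(keys), never on
    the data, which is what allows a vectorized implementation.
    """
    n = len(keys)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    idx = list(range(n))
    k = 2
    while k <= n:              # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:          # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    a, b = idx[i], idx[partner]
                    ascending = (i & k) == 0
                    if ascending:
                        out_of_order = keys[a] > keys[b]
                    else:
                        out_of_order = keys[a] < keys[b]
                    if out_of_order:
                        # Swap the indices; the keys themselves never move.
                        idx[i], idx[partner] = b, a
            j //= 2
        k *= 2
    return idx

if __name__ == "__main__":
    data = [5, 1, 4, 2, 8, 7, 3, 6]
    order = bitonic_argsort(data)
    print([data[i] for i in order])
```

Only the index array is permuted, so every comparison has to fetch keys through the indices. In a vectorized argsort that indirect fetch is a natural place for gather instructions such as the ones the note refers to.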

I suspect this function https://github.com/intel/x86-simd-sort/blob/7d7591cf5927e83e4a1e7c4b6f2c4dc91a97889f/src/avx512-16bit-qsort.hpp#L65 can be improved with fewer operations. See: https://github.com/numpy/numpy/blob/0bd56e7ec12f8ceeb8d082340e71e60b873d5c57/numpy/core/src/npysort/npysort_common.h#L153 for reference.