sterrettm2 issues

Results: 8 issues of sterrettm2

Adds support for kv-select, kv-partial sort, and descending order for all key-value functions.

This patch adds support for descending kv-sort and ascending/descending kv-select and kv-partial_sort. For reference, some benchmarks comparing to PyTorch's scalar implementation are provided. With normally distributed float32: Partial Sort...

Cleanup for single vector sort/bitonic merge (and minor cleanup for argsort/argselect)

This patch rewrites all of the single vector sorting and bitonic merging to use swizzle ops and generic masks to reduce code duplication. It also centralizes all of this logic...

Adds support for accelerated sorting with x86-simd-sort

Adds x86-simd-sort as a submodule to accelerate sorting for 32-bit and 64-bit datatypes when AVX2 or AVX512 are available. For contiguous data, this can be over a 10x speedup for...

module: cpu

triaged

open source

module: inductor

Fixes bug with nested OpenMP, fixed task threshold, extended tests range

Fixes the bug with nested OpenMP by adding #pragma omp taskwait. Changes the task_threshold used when OpenMP is enabled but parallelization isn't chosen from 0 to the max value for arrsize_t;...

ENH: Convert tanh from C universal intrinsics to C++ using Highway

This is another patch demonstrating how the current NumPy SIMD code could be converted to Highway, similar to #25781. All tests pass on my local AVX512 and AVX2 machine. On...

01 - Enhancement

component: SIMD

How to get incremental builds working under bazel/make?

Enable fp16 nonnative support for dynamic dispatch, make more ergonomic for static dispatch

Try to get better type errors for the static sorting functions