[enh] Add avx2 support in finiteness_checker
Description
Duplicate the AVX512 functions for AVX2 by halving the relevant constants and switching the instructions from 512-bit to 256-bit width. Because the functions are hardcoded, they cannot easily be templated without a performance loss. This implementation should improve sklearnex performance on standard benchmarks.
Changes proposed in this pull request:
- Add AVX2 finite sum check
- Add AVX2 per-element finiteness check
- Add AVX2 SOA support
- Move final comparison of finalMask out of for loop to reduce branching in AVX512 inf/nan check.
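For readers unfamiliar with the approach, the two checks listed above can be sketched roughly as follows. This is a minimal illustration and not the oneDAL implementation: the function names `sumIsFinite`, `allFinitePerElement`, and `allFinite` are hypothetical, only `float` input is handled, and the vector path is compiled only when the build enables AVX2 (otherwise the scalar loop does all the work). The per-element loop ORs the comparison results into one accumulated mask and tests it once after the loop, which mirrors the idea of moving the `finalMask` comparison out of the loop.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper (not the oneDAL API): fast path that sums all values.
// A finite sum proves every element is finite. A non-finite sum may also be
// caused by overflow, so it only means a per-element check is required.
inline bool sumIsFinite(const float* data, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return std::isfinite(sum);
}

// Hypothetical helper: a float is inf or NaN exactly when all eight
// exponent bits are set, so each lane is compared against that pattern.
inline bool allFinitePerElement(const float* data, std::size_t n) {
    const std::uint32_t expBits = 0x7F800000u;
    std::size_t i = 0;
    bool found = false;
#if defined(__AVX2__)
    const __m256i expMask = _mm256_set1_epi32(static_cast<int>(expBits));
    __m256i finalMask = _mm256_setzero_si256();
    // 8 floats per 256-bit register (an AVX512 version would process 16).
    for (; i + 8 <= n; i += 8) {
        const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        const __m256i exponent = _mm256_and_si256(v, expMask);
        // Accumulate matches; no branch inside the loop.
        finalMask = _mm256_or_si256(finalMask, _mm256_cmpeq_epi32(exponent, expMask));
    }
    // Single comparison after the loop.
    found = !_mm256_testz_si256(finalMask, finalMask);
#endif
    // Scalar tail (and full fallback when AVX2 is not enabled).
    for (; i < n; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, data + i, sizeof(bits));
        found |= (bits & expBits) == expBits;
    }
    return !found;
}

// Try the cheap sum first and fall back to the per-element scan.
inline bool allFinite(const float* data, std::size_t n) {
    return sumIsFinite(data, n) || allFinitePerElement(data, n);
}
```

Accumulating the mask means the loop cannot exit early on the first inf or NaN. That trade-off seems reasonable when non-finite input is rare, since the common case then runs without a data-dependent branch per iteration.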
Tasks
- [x] Implement AVX2
- [x] Get it to compile
- [x] Green CI
- [x] Run sklearnex Benchmarks
/intelci: run
The test failure is related to the RBF kernel, which doesn't use this code.
/intelci: run
Private CI shows only an unrelated LibLinear convergence issue, which this code should not affect and which is likely sporadic.
/intelci: run
A private CI run with the intel/scikit-learn-intelex#1759 build should use AVX2 by default, exposing this new code immediately: http://intel-ci.intel.com/eedfc7b0-6419-f133-b20a-a4bf010d0e2e
I wrote a special sklearnex version that prints a warning whenever an inf or NaN is observed, which should show up in pytest for sklearn: https://github.com/intel/scikit-learn-intelex/compare/main...icfaust:scikit-learn-intelex:test/warning_finite?expand=1
I started a special private CI run with this branch to see whether the warning shows up at all in the sklearn conformance tests, i.e. how often the sklearn tests exercise _assert_all_finite and whether this code is actually activated: http://intel-ci.intel.com/eeea9220-3038-f1df-9c06-a4bf010d0e2e (running against onedal-src/oneDAL/main)
After activating a warning for any observed inf or NaN and running all sklearn conformance tests, the only place an inf or NaN occurs in the sklearn tests is here: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/tests/test_validation.py#L981 Therefore intel/scikit-learn-intelex#1759 is sadly necessary.
This will likely pass CI, but performance benchmarks are necessary due to the underlying changes in the CPU function dispatching.
/intelci: run
Things required before re-review: a private CI run to check AVX512, and oneDAL performance benchmarks of the changes to function dispatching.
Run with an AVX512 build: http://intel-ci.intel.com/ef007c41-cb1f-f115-9514-a4bf010d0e2e The failures are due to unrelated GPU issues.
Private CI failures are due to unrelated GPU/dpc timeouts.
Private CI run with the latest sklearnex master (includes the _assert_all_finite tests from intel/scikit-learn-intelex#1759): http://intel-ci.intel.com/ef012d7e-a408-f166-adc8-a4bf010d0e2e
/intelci: run
Rerun due to CI timeouts: http://intel-ci.intel.com/ef01f546-5586-f1d1-863c-a4bf010d0e2e