[enh] Add avx2 support in finiteness_checker

Open icfaust opened this issue 2 years ago • 10 comments

Description

Duplicate the AVX512 functions for AVX2 by halving the relevant constants and changing the instructions from 512-bit to 256-bit width. Because the functions are hardcoded, they cannot easily be templated without a performance loss. This implementation should improve sklearnex performance on standard benchmarks.

Changes proposed in this pull request:

  • Add AVX2 finite sum check
  • Add AVX2 per-element finiteness check
  • Add AVX2 SOA support
  • Move the final comparison of finalMask out of the for loop to reduce branching in the AVX512 inf/nan check.
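
For readers unfamiliar with the two checks listed above, here is a minimal scalar sketch in portable C++. It is not the oneDAL code: the actual functions use AVX2/AVX512 intrinsics on 256-bit and 512-bit vectors, and the names `sumIsFinite`, `hasNonFinite`, and `allFinite` are hypothetical. It assumes the sum check serves as a fast path and the per-element check as the fallback, and it illustrates why accumulating the mask and comparing once after the loop removes a branch from the loop body.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Fast path (hypothetical name). Any NaN or +/-Inf input propagates into the
// sum, so a finite sum proves that every element is finite. A non-finite sum
// is inconclusive, because large finite values can overflow.
bool sumIsFinite(const float* data, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return std::isfinite(sum);
}

// Fallback (hypothetical name). A float is Inf or NaN exactly when all eight
// exponent bits are set. The per-element result is OR-ed into an accumulator
// and inspected once after the loop, so the loop body has no early-exit branch.
bool hasNonFinite(const float* data, std::size_t n)
{
    constexpr std::uint32_t expMask = 0x7f800000u;
    std::uint32_t finalMask = 0; // stands in for the vector mask in the SIMD code
    for (std::size_t i = 0; i < n; ++i)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &data[i], sizeof(bits)); // well-defined bit reinterpretation
        finalMask |= static_cast<std::uint32_t>((bits & expMask) == expMask);
    }
    return finalMask != 0; // single comparison, outside the loop
}

// Combined check (hypothetical name): cheap sum first, exact scan only if needed.
bool allFinite(const float* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    if (sumIsFinite(data, n)) return true;
    return !hasNonFinite(data, n);
}
```

Calling `allFinite(v.data(), v.size())` on a `std::vector<float>` returns `true` for ordinary data after a single pass. The second pass runs only when the sum overflows or the data really contains an inf or nan.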

Tasks

  • [x] Implement AVX2
  • [x] Get it to compile
  • [x] Green CI
  • [x] Run sklearnex Benchmarks

icfaust avatar Feb 20 '24 06:02 icfaust

/intelci: run

icfaust avatar Feb 20 '24 07:02 icfaust

/intelci: run

icfaust avatar Feb 20 '24 09:02 icfaust

/intelci: run

icfaust avatar Feb 20 '24 12:02 icfaust

The test failure is related to the RBF kernel, which doesn't use this code.

icfaust avatar Feb 21 '24 08:02 icfaust

/intelci: run

icfaust avatar Feb 28 '24 11:02 icfaust

/intelci: run

icfaust avatar Mar 01 '24 07:03 icfaust

/intelci: run

icfaust avatar Mar 01 '24 08:03 icfaust

Private CI only shows an unrelated LibLinear convergence failure, which this code should not affect and which is likely sporadic.

icfaust avatar Mar 01 '24 10:03 icfaust

/intelci: run

icfaust avatar Mar 11 '24 16:03 icfaust

Private CI run with the intel/scikit-learn-intelex#1759 build, which should use AVX2 by default and therefore exercise this new code immediately: http://intel-ci.intel.com/eedfc7b0-6419-f133-b20a-a4bf010d0e2e

icfaust avatar Mar 11 '24 16:03 icfaust

I wrote a special sklearnex version that prints a warning whenever an inf or nan is observed, which should show up in pytest for sklearn: https://github.com/intel/scikit-learn-intelex/compare/main...icfaust:scikit-learn-intelex:test/warning_finite?expand=1

I have launched a private CI run with this branch to see how often the sklearn conformance tests exercise _assert_all_finite, and specifically whether the check is ever triggered: http://intel-ci.intel.com/eeea9220-3038-f1df-9c06-a4bf010d0e2e (running against onedal-src/oneDAL/main)

icfaust avatar Mar 22 '24 11:03 icfaust

After activating a warning for whenever an inf or nan is spotted and running all of the sklearn conformance tests, the only place an inf or nan occurs in the sklearn testing is here: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/tests/test_validation.py#L981. Therefore intel/scikit-learn-intelex#1759 is sadly necessary.

icfaust avatar Mar 25 '24 11:03 icfaust

This will likely pass CI, but performance benchmarks are necessary due to the underlying changes in the CPU function dispatching.

icfaust avatar Apr 19 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 19 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 22 '24 07:04 icfaust

Things required before re-review: a private CI run to check AVX512, and oneDAL performance benchmarks of the changes to function dispatching.

icfaust avatar Apr 22 '24 07:04 icfaust

Run with an AVX512 build: http://intel-ci.intel.com/ef007c41-cb1f-f115-9514-a4bf010d0e2e (failures are due to unrelated GPU issues).

icfaust avatar Apr 22 '24 08:04 icfaust

The private CI failures are due to unrelated GPU/dpc timeouts.

icfaust avatar Apr 23 '24 04:04 icfaust

Private CI run with the latest sklearnex master (includes the _assert_all_finite tests from intel/scikit-learn-intelex#1759): http://intel-ci.intel.com/ef012d7e-a408-f166-adc8-a4bf010d0e2e

icfaust avatar Apr 23 '24 04:04 icfaust

/intelci: run

icfaust avatar Apr 23 '24 14:04 icfaust

Rerun due to CI timeouts: http://intel-ci.intel.com/ef01f546-5586-f1d1-863c-a4bf010d0e2e

icfaust avatar Apr 24 '24 04:04 icfaust