xe: jit: gemm: downstream gemmstone
Omnibus pulldown from upstream gemmstone repository. Highlights include:
- Long-overdue refactoring of register layouts into their own class, RegisterLayout
- Numerous copy planner optimizations/fixes:
  - New dedicated upconversions to bf16 that save register space and sometimes save a few cycles/register
  - A cycle or so shaved off of various conversions
  - More de-interleaving prior to complex conversion sequences
  - Fix some corner cases in hf8/e2m1/e3m0 downconversions where rounding was happening in the wrong direction
- Expanded support for C repacking ("dynamic quantization" scenarios). Now supports FMA and dot kernels.
- nf4 upconversion
- Bug fixes/cleanup in A/B dequantization
- Resolved some register spills in the catalog
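
For reviewers checking the downconversion rounding fixes, a scalar reference along these lines may help when comparing against kernel output. It is only a sketch, not code from gemmstone: the helper names are hypothetical, and it assumes round-to-nearest-even with saturation on the standard e2m1 value grid (0, 0.5, 1, 1.5, 2, 3, 4, 6). The rounding mode the kernels actually target should be confirmed against the copy planner.

```cpp
#include <cmath>
#include <cstdint>

// Magnitudes representable in e2m1, indexed by the low 3 bits of the
// 4-bit encoding (2 exponent bits, 1 mantissa bit). Bit 3 is the sign.
static const float kE2M1Magnitudes[8]
        = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Hypothetical reference helper: float -> e2m1 code.
// Assumes round-to-nearest, ties to the code with an even mantissa bit,
// and saturation to +/-6 for out-of-range inputs. NaN inputs fail every
// comparison below and therefore also saturate.
inline uint8_t float_to_e2m1_rne(float x) {
    const uint8_t sign = std::signbit(x) ? 0x8 : 0x0;
    const float mag = std::fabs(x);
    uint8_t code = 7; // default: saturate to the largest magnitude
    for (uint8_t i = 0; i < 7; i++) {
        const float mid = 0.5f * (kE2M1Magnitudes[i] + kE2M1Magnitudes[i + 1]);
        // Strictly below the midpoint: round down to code i.
        // Exactly on the midpoint: keep code i only if it is even.
        if (mag < mid || (mag == mid && (i & 1) == 0)) {
            code = i;
            break;
        }
    }
    return static_cast<uint8_t>(sign | code);
}

// Inverse mapping, useful for checking round trips.
inline float e2m1_to_float(uint8_t code) {
    const float mag = kE2M1Magnitudes[code & 0x7];
    return (code & 0x8) ? -mag : mag;
}
```

Under these assumptions the tie cases are the interesting ones: 0.75 rounds up to 1.0, while 1.25 and 2.5 round down to 1.0 and 2.0 respectively, because the even code wins in each case.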
make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn
make test perf-gpu set primitive=matmul ip
make test linters
Note: the DG2 dynamic quantization regressions reported in perf CI are the same ones seen in #3357 and are not caused by this PR.
DG2 testing didn't start in CI, but manual testing looks OK.