oneDNN

xe: jit: gemm: downstream gemmstone

Open petercad opened this issue 8 months ago • 6 comments

Omnibus pulldown from upstream gemmstone repository. Highlights include:

  • Long-overdue refactoring of register layouts into their own class, RegisterLayout
  • Numerous copy planner optimizations/fixes:
    • New dedicated upconversions to bf16 that save register space and sometimes a few cycles per register
    • A cycle or so shaved off of various conversions
    • More de-interleaving prior to complex conversion sequences
    • Fixes for some corner cases in hf8/e2m1/e3m0 downconversions where values were rounded in the wrong direction
  • Expanded support for C repacking ("dynamic quantization" scenarios). Now supports FMA and dot kernels.
  • nf4 upconversion
  • Bug fixes/cleanup in A/B dequantization
  • Resolved some register spills in the catalog
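
To illustrate what "rounding in the wrong direction" can mean for a 4-bit float downconversion, here is a minimal reference sketch for e2m1. It is not gemmstone's implementation (that is generated GPU code), and the PR does not say which rounding mode the kernels use; round-to-nearest with ties to the even code and saturation at ±6 are assumptions made for this example. The names `E2M1_MAGNITUDES` and `round_to_e2m1` are hypothetical.

```python
# Non-negative magnitudes representable in e2m1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by their 3-bit code. An even code has a zero mantissa bit.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def round_to_e2m1(x: float) -> float:
    """Round x to the nearest e2m1 value.

    Assumptions (not taken from the PR): ties go to the even code,
    out-of-range inputs saturate to +/-6, and NaN is not handled.
    """
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E2M1_MAGNITUDES[-1])  # saturate

    best = 0
    for code in range(1, len(E2M1_MAGNITUDES)):
        d_best = abs(mag - E2M1_MAGNITUDES[best])
        d_code = abs(mag - E2M1_MAGNITUDES[code])
        # Prefer the strictly closer value; on an exact tie prefer the even code.
        if d_code < d_best or (d_code == d_best and code % 2 == 0):
            best = code
    return sign * E2M1_MAGNITUDES[best]
```

A scalar reference like this can be compared against kernel output on the tie points (0.25, 0.75, 1.25, 1.75, 2.5, 3.5, 5.0), which is where a conversion sequence that truncates or rounds half away from zero would disagree with ties-to-even.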

petercad, Jun 03 '25 23:06

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jun 16 '25 15:06

make test perf-gpu set primitive=matmul ip

petercad, Jun 16 '25 15:06

make test linters

petercad, Jun 16 '25 15:06

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jun 16 '25 21:06

make test perf-gpu set primitive=matmul ip

petercad, Jun 16 '25 21:06

Note: the DG2 dynamic quantization regressions reported in perf CI are the same as in #3357, and are not true issues in this PR.

petercad, Jun 16 '25 22:06

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jun 30 '25 23:06

make test perf-gpu set primitive=matmul ip

petercad, Jun 30 '25 23:06

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jul 01 '25 02:07

make test perf-gpu set primitive=matmul ip

petercad, Jul 01 '25 02:07

make test perf-gpu set primitive=matmul ip

petercad, Jul 01 '25 16:07

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jul 01 '25 20:07

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jul 01 '25 23:07

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jul 01 '25 23:07

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jul 02 '25 23:07

make test disable test_device_cpu enable test_device_gpu disable benchdnn_all enable benchdnn_matmul enable benchdnn_ip enable benchdnn_rnn

petercad, Jul 03 '25 18:07

DG2 testing didn't start in CI, but manual testing looks OK.

petercad, Jul 03 '25 21:07