
GPU: GEMM: update accumulation + quantization

Open Simonsays095 opened this issue 5 months ago • 2 comments

In some cases (see the benchdnn run below), GEMM dispatches to the reference implementation because kernel selection looks for integer-accumulation kernels when none exist. This change makes selection look for the expected accumulation type and, in some cases, also consider fp32 variants. Because more strategies are now scored against each other, some strategies have been refit or altered to address regressions where a slower strategy would otherwise have been selected.
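To make the selection change concrete, here is a minimal sketch of the idea, not the actual oneDNN code. Every name in it (`AccType`, `Strategy`, `select_strategy`, `pick_name`, `demo_catalog`) is hypothetical, the scores are made up, and "lower score is better" is an assumed convention. The f16/f32 accumulation types are picked purely for illustration.

```cpp
// Hypothetical sketch: none of these types or functions exist in oneDNN.
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

enum class AccType { s32, f16, f32 };

// One entry in an imaginary kernel catalog.
struct Strategy {
    std::string name;
    AccType acc;  // accumulation type the kernel uses
    double score; // modeled cost; lower is better (assumed convention)
};

// Return the best-scoring strategy whose accumulation type appears in
// `allowed`, or std::nullopt when nothing in the catalog matches.
std::optional<Strategy> select_strategy(const std::vector<Strategy> &catalog,
        const std::vector<AccType> &allowed) {
    assert(!allowed.empty()); // the caller must request at least one type
    std::optional<Strategy> best;
    for (const auto &s : catalog) {
        bool ok = std::find(allowed.begin(), allowed.end(), s.acc)
                != allowed.end();
        if (!ok) continue;
        if (!best || s.score < best->score) best = s;
    }
    return best;
}

// Name of the selected strategy, or "reference" when no kernel matches.
std::string pick_name(const std::vector<Strategy> &catalog,
        const std::vector<AccType> &allowed) {
    auto best = select_strategy(catalog, allowed);
    return best ? best->name : std::string("reference");
}

// Imaginary catalog that contains no integer-accumulation kernels.
std::vector<Strategy> demo_catalog() {
    return {
            {"f16_acc_kernel", AccType::f16, 1.4},
            {"f32_acc_kernel", AccType::f32, 1.1},
    };
}
```

With this catalog, `pick_name(demo_catalog(), {AccType::s32})` yields `"reference"`, which mirrors the failure mode described above. Widening the search to `{AccType::f16, AccType::f32}` selects `"f32_acc_kernel"`. It also shows why scores may need refitting once more candidates compete: the wider search compares kernels that were never ranked against each other before.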

Also refactors quantization attributes to deduplicate variables and simplify the workflow a bit. This should help with an eventual swap_ab refactor.

$ ./oneDNN/build/tests/benchdnn/benchdnn --matmul -v5 --mode=I --engine=gpu --allow-enum-tags-only=false --dt=u8:u4:f16 --stag=ab --wtag=ba --dtag=ab 14992x14336:14336x4096
create: --mode=I --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:u4:f16 --stag=ab --wtag=ba --dtag=ab 14992x14336:14336x4096
oneDNN implementation: ocl:ref:any
run (just report, no exec): --mode=I --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:u4:f16 --stag=ab --wtag=ba --dtag=ab 14992x14336:14336x4096
0:INITIALIZED (20 ms) __REPRO: --mode=I --matmul --engine=gpu --allow-enum-tags-only=false --dt=u8:u4:f16 --stag=ab --wtag=ba --dtag=ab 14992x14336:14336x4096
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.02s; create_pd: 0.00s (2%); create_prim: 0.00s (18%);

Simonsays095 • Aug 14 '25 21:08

make test
disable test_device_cpu
disable build_cpu_runtime_omp
disable build_cpu_runtime_sycl
disable build_cpu_runtime_tbb
disable benchdnn_all
set test_scope=NIGHTLY
enable benchdnn_matmul
enable benchdnn_ip

Simonsays095 • Sep 26 '25 21:09

make test perf-gpu
set primitive=gpu:gemm

Simonsays095 • Sep 26 '25 21:09