oneDNN
cpu: aarch64: brgemm: Add support for int8 in brgemm kernel
Description
This PR extends the AArch64 BRGEMM (Batch-Reduce General Matrix Multiplication) kernel to support additional INT8 data types, broadening its applicability to low-precision computation, particularly in deep learning workloads.
Supported Data Type Tags
The following source:weights:destination (src:wei:dst) combinations are now supported:
- s8:s8:f32
- u8:u8:f32
- u8:s8:f32
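To make the semantics of these tags concrete, below is a minimal scalar reference sketch of what a batch-reduce GEMM with the u8:s8:f32 combination computes. This is purely illustrative and is not oneDNN's JIT kernel or API; the function name ref_brgemm_u8s8f32 and its layout assumptions (row-major MxK/KxN tiles, plain f32 store with no scaling or post-ops) are hypothetical. The 8-bit operands are widened and accumulated in int32, and the accumulator is converted to f32 when writing the destination.

```cpp
// Illustrative scalar reference for the u8:s8:f32 batch-reduce GEMM tag.
// Not oneDNN code: names, layouts, and the lack of scales/post-ops are
// simplifying assumptions made for this sketch only.
#include <cstddef>
#include <cstdint>
#include <vector>

void ref_brgemm_u8s8f32(
        const std::vector<const uint8_t *> &A_batch, // batch of MxK src tiles
        const std::vector<const int8_t *> &B_batch,  // batch of KxN wei tiles
        float *C,                                     // MxN f32 destination
        std::size_t M, std::size_t N, std::size_t K) {
    // Batch-reduce: all A_i * B_i products accumulate into one int32 buffer.
    std::vector<int32_t> acc(M * N, 0);
    for (std::size_t b = 0; b < A_batch.size(); ++b) {
        const uint8_t *A = A_batch[b];
        const int8_t *B = B_batch[b];
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t k = 0; k < K; ++k) {
                const int32_t a = static_cast<int32_t>(A[m * K + k]);
                for (std::size_t n = 0; n < N; ++n)
                    acc[m * N + n] += a * static_cast<int32_t>(B[k * N + n]);
            }
    }
    // Down-convert the int32 accumulator to the f32 destination.
    for (std::size_t i = 0; i < M * N; ++i)
        C[i] = static_cast<float>(acc[i]);
}
```

The s8:s8:f32 and u8:u8:f32 cases follow the same pattern with the corresponding source and weights types; only the widening casts change.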
Checklist
General
- [x] Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?
- `make test` output:
98% tests passed, 4 tests failed out of 224
Total Test time (real) = 1058.22 sec
The following tests FAILED:
172 - test_graph_unit_dnnl_large_partition_cpu (Failed)
195 - test_benchdnn_modeC_binary_ci_cpu (Failed)
196 - test_benchdnn_modeC_binary_different_dt_ci_cpu (Failed)
204 - test_benchdnn_modeC_graph_ci_cpu (Failed)
The output is the same before and after the code changes.
- test_brgemm_all benchdnn output; command used:
./benchdnn --brgemm --batch=inputs/brgemm/test_brgemm_all
Before
tests:660480 passed:18496 skipped:641984 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 24.50s; create_pd: 0.00s (0%); create_prim: 0.00s (0%); fill: 7.30s (30%); execute: 0.00s (0%); compute_ref: 4.30s (18%); compare: 5.34s (22%);
After
tests:660480 passed:20480 skipped:640000 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 23.11s; create_pd: 0.00s (0%); create_prim: 0.00s (0%); fill: 6.83s (30%); execute: 0.00s (0%); compute_ref: 3.82s (17%); compare: 4.40s (19%);
- [x] Have you formatted the code using clang-format? Yes