oneDNN
cpu: aarch64: brgemm: Add support for int8 in brgemm kernel
Description
This PR extends the AArch64 BRGEMM (Batch-Reduce General Matrix Multiplication) kernel to support additional INT8 data types, broadening its applicability to low-precision computation, particularly in deep learning workloads.
Supported Data Type Tags
The following source:weights:destination (src:wei:dst) combinations are now supported:
- s8:s8:f32
- u8:u8:f32
- u8:s8:f32
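To make the semantics of these tags concrete, below is a minimal scalar reference sketch of what a batch-reduce GEMM with the u8:s8:f32 combination computes. This is purely illustrative and is not oneDNN's JIT kernel or API; the function name ref_brgemm_u8s8f32 and its layout assumptions (row-major MxK/KxN tiles, plain f32 store with no scaling or post-ops) are hypothetical. The 8-bit operands are widened and accumulated in int32, and the accumulator is converted to f32 when writing the destination.

```cpp
// Illustrative scalar reference for the u8:s8:f32 batch-reduce GEMM tag.
// Not oneDNN code: names, layouts, and the lack of scales/post-ops are
// simplifying assumptions made for this sketch only.
#include <cstddef>
#include <cstdint>
#include <vector>

void ref_brgemm_u8s8f32(
        const std::vector<const uint8_t *> &A_batch, // batch of MxK src tiles
        const std::vector<const int8_t *> &B_batch,  // batch of KxN wei tiles
        float *C,                                     // MxN f32 destination
        std::size_t M, std::size_t N, std::size_t K) {
    // Batch-reduce: all A_i * B_i products accumulate into one int32 buffer.
    std::vector<int32_t> acc(M * N, 0);
    for (std::size_t b = 0; b < A_batch.size(); ++b) {
        const uint8_t *A = A_batch[b];
        const int8_t *B = B_batch[b];
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t k = 0; k < K; ++k) {
                const int32_t a = static_cast<int32_t>(A[m * K + k]);
                for (std::size_t n = 0; n < N; ++n)
                    acc[m * N + n] += a * static_cast<int32_t>(B[k * N + n]);
            }
    }
    // Down-convert the int32 accumulator to the f32 destination.
    for (std::size_t i = 0; i < M * N; ++i)
        C[i] = static_cast<float>(acc[i]);
}
```

The s8:s8:f32 and u8:u8:f32 cases follow the same pattern with the corresponding source and weights types; only the widening casts change.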
Checklist
General
- [x] Do all unit and benchdnn tests (`make test` and `make test_benchdnn_*`) pass locally for each commit?
- `make test` output:
98% tests passed, 4 tests failed out of 224
Total Test time (real) = 1058.22 sec
The following tests FAILED:
172 - test_graph_unit_dnnl_large_partition_cpu (Failed)
195 - test_benchdnn_modeC_binary_ci_cpu (Failed)
196 - test_benchdnn_modeC_binary_different_dt_ci_cpu (Failed)
204 - test_benchdnn_modeC_graph_ci_cpu (Failed)
The output is the same before and after the code changes.
- test_brgemm_all benchdnn output; command used:
./benchdnn --brgemm --batch=inputs/brgemm/test_brgemm_all
Before
tests:660480 passed:18496 skipped:641984 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 24.50s; create_pd: 0.00s (0%); create_prim: 0.00s (0%); fill: 7.30s (30%); execute: 0.00s (0%); compute_ref: 4.30s (18%); compare: 5.34s (22%);
After
tests:660480 passed:20480 skipped:640000 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 23.11s; create_pd: 0.00s (0%); create_prim: 0.00s (0%); fill: 6.83s (30%); execute: 0.00s (0%); compute_ref: 3.82s (17%); compare: 4.40s (19%);
- [x] Have you formatted the code using clang-format? Yes