graph: backend: dnnl: backend refactor and sdpa v1 kernel support for quantized SDPA
Background
This is follow-up work to #2930 and #2931. This PR mainly focuses on supporting quantized SDPA with the internal dnnl_sdpa, which helps reduce graph compilation time and also simplifies the backend optimization pass.
Works
- [x] DNNL backend refactor:
  - attach fusion info to op attr directly
  - rename fusion_info_mgr to reflect its current usage
- [x] Support compressed SDPA with internal dnnl_sdpa.
- [x] Support legacy GQA pattern with internal dnnl_sdpa.
- [x] Merge sdp_primitive_kernel_t and sdp_primitive_v1_kernel_t
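
As a rough illustration of the first refactor item, the sketch below shows what storing fusion info directly on an op can look like, so that the metadata travels with the op instead of living in a separate table. All type and member names here (`fusion_info`, `op`, `set_fusion_info`, `make_fused_matmul`) and the post-op labels are hypothetical and heavily simplified; they are not the backend's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the fused post-op metadata of one op.
struct fusion_info {
    std::vector<std::string> post_ops; // illustrative labels only
};

// Hypothetical op that owns its attributes, including its fusion info.
class op {
public:
    explicit op(std::string kind) : kind_(std::move(kind)) {}

    // Attach fusion info directly as an attribute of this op.
    void set_fusion_info(fusion_info info) {
        fusion_ = std::make_shared<fusion_info>(std::move(info));
    }

    bool has_fusion_info() const { return fusion_ != nullptr; }

    const fusion_info &get_fusion_info() const {
        assert(fusion_ && "fusion info was never attached");
        return *fusion_;
    }

    const std::string &kind() const { return kind_; }

private:
    std::string kind_;
    std::shared_ptr<fusion_info> fusion_; // shared when the op is copied
};

// Builds an op carrying two fused post-ops; no side table or key is involved.
inline op make_fused_matmul() {
    op matmul("matmul");
    matmul.set_fusion_info(fusion_info{{"binary_add", "eltwise_relu"}});
    return matmul;
}
```

With this shape, a pass that clones or rewrites an op only has to copy the op's attributes; there is no separate lookup that can go stale.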
TODO
- Support CPU decompose kernel with internal dnnl_sdpa.
Testing results:
Of the 218 MHA test cases, 67 ukernel-optimized cases now run successfully in the sdp_primitive_v1_kernel_t kernel.
- compressed SDPA dot graph: before fusion, after fusion
- legacy GQA dot graph: before fusion, after fusion
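
For readers less familiar with the two patterns, here is a minimal reference sketch of the arithmetic they involve, under the usual definitions: compressed SDPA keeps K/V in a low-precision integer format that is dequantized with a scale and zero point, and GQA lets a group of query heads share one K/V head. The function names are hypothetical, the dequantization is simplified to a single per-tensor scale and zero point, and none of this is the kernel implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Dequantize int8 values with one scale and zero point for the whole tensor:
//   f = (q - zero_point) * scale
// Assumption: per-tensor quantization; grouped or per-channel schemes would
// select scale/zero_point per group instead.
inline std::vector<float> dequantize(
        const std::vector<int8_t> &q, float scale, int32_t zero_point) {
    std::vector<float> out;
    out.reserve(q.size());
    for (int8_t v : q)
        out.push_back((static_cast<int32_t>(v) - zero_point) * scale);
    return out;
}

// GQA head mapping: consecutive query heads form groups, and each group
// reads the same K/V head.
inline std::size_t kv_head_for_query_head(std::size_t q_head,
        std::size_t num_q_heads, std::size_t num_kv_heads) {
    assert(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    assert(q_head < num_q_heads);
    const std::size_t group_size = num_q_heads / num_kv_heads;
    return q_head / group_size;
}
```

For example, with 8 query heads and 2 K/V heads, query heads 0 to 3 read K/V head 0 and query heads 4 to 7 read K/V head 1.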
Can you please split the fusion info refactor into a separate PR, for better review experience?
ok, sure, will do that.
make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph