oneDNN graph: backend: dnnl: backend refactor and sdpa v1 kernel support for quantized SDPA

Background

This is follow-up work to #2930 and #2931. The PR mainly adds support for quantized SDPA through the internal dnnl_sdpa. Doing so reduces graph compilation time and simplifies the backend optimization passes.

Changes

[x] DNNL backend refactor:
- attach fusion info to op attr directly
- rename fusion_info_mgr to reflect its current usage
[x] Support compressed SDPA with internal dnnl_sdpa.
[x] Support legacy GQA pattern with internal dnnl_sdpa.
[x] Merge sdp_primitive_kernel_t and sdp_primitive_v1_kernel_t
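
For reviewers unfamiliar with the fusion info change, here is a rough sketch of the difference between a side-table design (ops hold an integer key into a manager) and attaching the fusion info to the op attribute directly. It is written in Python only for brevity. Every name in it (`FusionInfo`, `Op`, `FusionInfoManager`, `fuse_via_manager`, `fuse_direct`, and the attribute keys) is hypothetical and is not taken from the oneDNN sources; the actual C++ types and attribute names in the backend may differ.

```python
from dataclasses import dataclass, field


@dataclass
class FusionInfo:
    """Hypothetical record of the post-ops fused into one op."""
    post_ops: list = field(default_factory=list)


@dataclass
class Op:
    """Hypothetical graph op with a free-form attribute map."""
    kind: str
    attrs: dict = field(default_factory=dict)


class FusionInfoManager:
    """Side-table design: the op only stores an integer key."""

    def __init__(self):
        self._infos = []

    def init_info(self):
        # Allocate a new FusionInfo and return its index as the key.
        self._infos.append(FusionInfo())
        return len(self._infos) - 1

    def get(self, key):
        return self._infos[key]


def fuse_via_manager(mgr, op, post_op):
    # Every pass that touches fusion info needs access to the manager.
    if "fusion_info_key" not in op.attrs:
        op.attrs["fusion_info_key"] = mgr.init_info()
    mgr.get(op.attrs["fusion_info_key"]).post_ops.append(post_op)


def fuse_direct(op, post_op):
    # Direct design: the info lives on the op, no manager lookup needed.
    op.attrs.setdefault("fusion_info", FusionInfo()).post_ops.append(post_op)
```

In the direct variant the fusion info travels with the op when it is copied or moved between passes, which is presumably what makes the manager's original role shrink and motivates the rename.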

TODO

Support CPU decompose kernel with internal dnnl_sdpa.

Testing results:

Of the 218 MHA test cases, 67 ukernel-optimized cases now run successfully with the sdp_primitive_v1_kernel_t kernel.

Compressed SDPA dot graph: before fusion, after fusion
Legacy GQA dot graph: before fusion, after fusion

Jun 16 '25 03:06 xiang1guo

Can you please split the fusion info refactor into a separate PR, for better review experience?

ok, sure, will do that.

Jun 16 '25 05:06 xiang1guo