graph: backend: dnnl: backend refactor and sdpa v1 kernel support for quantized SDPA

Open xiang1guo opened this issue 8 months ago • 1 comments

Background

This is follow-up work based on #2930 and #2931. This PR mainly adds support for quantized SDPA using the internal dnnl_sdpa. It helps reduce graph compilation time and also simplifies the backend optimization pass.

Changes

  • [x] DNNL backend refactor:
    • attach fusion info to op attr directly
    • rename fusion_info_mgr to reflect its current usage
  • [x] Support compressed SDPA with internal dnnl_sdpa.
  • [x] Support legacy GQA pattern with internal dnnl_sdpa.
  • [x] Merge sdp_primitive_kernel_t and sdp_primitive_v1_kernel_t
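
As a rough illustration of the "attach fusion info to op attr directly" item, here is a minimal sketch of the idea: the fused post-op record lives on the op itself instead of being looked up through a separate manager. Everything in it (`fusion_info_t`, `op_t`, the `"fusion_info"` attribute key, `fuse_post_op`) is a hypothetical stand-in invented for this example and is not the actual oneDNN graph backend code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a fusion record; not oneDNN's real type.
struct fusion_info_t {
    std::vector<std::string> post_ops; // kinds of ops fused into the owner
};

// Hypothetical op with a string-keyed attribute map.
struct op_t {
    std::string kind;
    std::unordered_map<std::string, std::shared_ptr<fusion_info_t>> attrs;

    // Fetch the fusion info stored on this op, creating it on first use.
    fusion_info_t &fusion_info() {
        auto &slot = attrs["fusion_info"];
        if (!slot) slot = std::make_shared<fusion_info_t>();
        return *slot;
    }

    // True once fusion info has been attached to this op.
    bool has_fusion_info() const { return attrs.count("fusion_info") != 0; }
};

// Fold a consumer op into its producer by recording it as a post-op.
// No external manager or integer key is involved in the lookup.
inline void fuse_post_op(op_t &producer, const op_t &consumer) {
    assert(!consumer.kind.empty());
    producer.fusion_info().post_ops.push_back(consumer.kind);
}
```

With this layout a pass only needs the op to read or update its fusion state, which is one plausible reading of why the change simplifies the optimization pass.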

TODO

  • Support CPU decompose kernel with internal dnnl_sdpa.

Testing results:

Of the 218 MHA test cases, 67 ukernel-optimized cases now run successfully in the sdp_primitive_v1_kernel_t kernel.

  • compressed SDPA dot graph: before fusion [image], after fusion [image]

  • legacy GQA dot graph: before fusion [image], after fusion [image]

xiang1guo avatar Jun 16 '25 03:06 xiang1guo

Can you please split the fusion info refactor into a separate PR, for better review experience?

ok, sure, will do that.

xiang1guo avatar Jun 16 '25 05:06 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Jul 02 '25 01:07 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Jul 11 '25 06:07 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Aug 13 '25 02:08 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Aug 13 '25 08:08 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Aug 13 '25 13:08 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Aug 14 '25 05:08 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Aug 15 '25 06:08 xiang1guo

make test set test_scope=NIGHTLY disable benchdnn_all enable benchdnn_graph

xiang1guo avatar Aug 18 '25 01:08 xiang1guo