BERT model split into many layers after int8 quantization
I first posted this issue at https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/159.
I quantized a PyTorch BERT model using TensorRT-Model-Optimizer.
Before quantization, when I export this model to TensorRT there is only one layer,
but after quantization there are many layers. Why? Can this be fixed?
(only part of these layers is shown)
I export this model to tensorrt and there is only one layer
What is the command you used?
First I export the PyTorch model to ONNX using:
torch.onnx.export
Then I build the TRT engine from the ONNX file using the Python API (Polygraphy):
network_from_onnx_path
engine_from_network
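The two-step pipeline above can be sketched as follows. This is a minimal illustration, not the reporter's actual script: the model, dummy input, paths, and opset are placeholders, and it assumes Polygraphy's TensorRT backend for `network_from_onnx_path` / `engine_from_network`.

```python
# Sketch of the export pipeline described above (all names illustrative).

def export_onnx(model, dummy_input, onnx_path="model.onnx"):
    """Step 1: export the PyTorch model to an ONNX file."""
    import torch
    torch.onnx.export(model, dummy_input, onnx_path, opset_version=17)

def build_engine(onnx_path="model.onnx"):
    """Step 2: build a TensorRT engine from the ONNX file via Polygraphy."""
    from polygraphy.backend.trt import (
        CreateConfig, engine_from_network, network_from_onnx_path,
    )
    network = network_from_onnx_path(onnx_path)
    # fp16=True mirrors config.set_flag(trt.BuilderFlag.FP16) below
    config = CreateConfig(fp16=True)
    return engine_from_network(network, config=config)
```

Nothing here executes until the functions are called with a real model and a TensorRT installation.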
then I build the TRT engine from the ONNX file using the Python API
What flags did you set with config.set_flag()?
@lix19937
self.config.set_flag(trt.BuilderFlag.FP16)
self.config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
Because it is FP16 (not int8 quantization), the Myelin compiler fuses the ops, including the shape ops, into one node.
So why can it not compile the int8-quantized model as one node?
There are two situations:
- If you use a float32 ONNX model with the fp16 flag, or the fp16+int8 flags, it is still a single Myelin node.
- If you use moq.quantize for int8 PTQ, the ONNX structure has actually changed, so it is not merged into one layer.
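The reason the int8 case behaves differently: int8 PTQ tools rewrite the ONNX graph itself, typically inserting QuantizeLinear/DequantizeLinear (Q/DQ) node pairs, so TensorRT receives a structurally different graph than the float32 export. A stdlib-only sketch of the idea (the node lists are synthetic stand-ins for `graph.node` op types; the op names are real ONNX operators):

```python
def count_qdq(op_types):
    """Count Q/DQ ops -- a quick signal that PTQ changed the graph structure."""
    return sum(op in ("QuantizeLinear", "DequantizeLinear") for op in op_types)

# Before PTQ: plain float ops only.
fp32_graph = ["MatMul", "Add", "Softmax", "MatMul"]
# After PTQ: Q/DQ pairs wrap the quantized tensors.
ptq_graph = ["QuantizeLinear", "DequantizeLinear", "MatMul", "Add",
             "QuantizeLinear", "DequantizeLinear", "MatMul", "Softmax"]

print(count_qdq(fp32_graph))  # 0
print(count_qdq(ptq_graph))   # 4
```

With a real model, the same count over `onnx.load(path).graph.node` op types shows whether quantization altered the exported graph.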
OK, I used moq.quantize for int8 PTQ. Is there any way to merge the quantized ONNX into one layer?
@DamonsJ Can you help to upload the ONNX files before and after the quantization so that we can further investigate the multi-layer splitting and the performance diff in the original issue?
Test Environment:
GPU: A100
TensorRT version: 10.10
CUDA: 12.8
OS: Ubuntu 22.04
Analysis:
The profilingVerbosity config controls how much detail the TRT engine layer information displays.
Here is the result with profilingVerbosity = layer_names_only:
[04/18/2025-03:52:06] [V] [TRT] Engine Layer Information:
Layer(Myelin): {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Tactic: 0x0000000000000000, concat_ids (Int32[1,40]), concat_mask (Bool[1,40]), sent_ids (Int32[1,40]), pos_ids (Int32[1,40]) -> Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]} (Half[1])
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Tactic: 0x0000000000000000, Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]} (Half[1]) -> score (Float[1])
Here is the result with profilingVerbosity = detailed:
[04/18/2025-03:51:19] [V] [TRT] Layers:
Name: __myl_GathGathGathMulAddMulAddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_0, LayerType: kgen, Inputs: [ { Name: pos_ids, Dimensions: [1,40], Format/Datatype: Int32 }, { Name: sent_ids, Dimensions: [1,40], Format/Datatype: Int32 }, { Name: concat_ids, Dimensions: [1,40], Format/Datatype: Int32 }], Outputs: [ { Name: __mye56087_5, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_GathGathGathMulAddMulAddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xd6fb31e7506899c9795a4712e7e9a194, StreamId: 0, Metadata: [ONNX Layer: /encoder/embedding/Gather][ONNX Layer: /encoder/Mul][ONNX Layer: /encoder/Add][ONNX Layer: /encoder/sent_embedding/Gather][ONNX Layer: /encoder/Mul_1][ONNX Layer: /encoder/pos_encoding/Add][ONNX Layer: /encoder/pos_encoding/pe/Gather][ONNX Layer: /encoder/pos_encoding/layernorm/LayerNormalization]
Name: /encoder/layers_0/mha/attn/mha_prob/key/MatMul+/encoder/layers_0/mha/attn/mha_prob/query/MatMul+/encoder/layers_0/mha/attn/value/MatMul_myl0_1, LayerType: gemm, Inputs: [ { Name: __mye56087_5, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56087, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.0/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/value/Add]
Name: __myl_ReplReshReplAndNotCastMul_myl0_2, LayerType: kgen, Inputs: [ { Name: concat_mask, Dimensions: [1,1,1,40], Format/Datatype: Bool }], Outputs: [ { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }], TacticName: __myl_ReplReshReplAndNotCastMul_0x28d2b74213c12bac17b6e487f910f07d, StreamId: 0, Metadata: [ONNX Layer: /encoder/And][ONNX Layer: /encoder/Cast_1][ONNX Layer: /encoder/Not][ONNX Layer: /encoder/Mul_2][ONNX Layer: /encoder/Unsqueeze_5][ONNX Layer: /encoder/Cast]
Name: _gemm_mha_v2_myl0_3, LayerType: kgen, Inputs: [ { Name: __mye56087, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56087, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56087, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54201, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/attn/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_0/mha/output/dense/MatMul_myl0_4, LayerType: gemm, Inputs: [ { Name: __mye54201, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_9, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.0/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_5, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_9, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56087_5, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_10, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/output/Add][ONNX Layer: /encoder/layers.0/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_6, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_10, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_0/pos_ff/output/Add_output_0'.1_11, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.0/pos_ff/Mul][ONNX Layer: /encoder/layers.0/pos_ff/Mul_1][ONNX Layer: /encoder/layers.0/pos_ff/Add][ONNX Layer: /encoder/layers.0/pos_ff/Div][ONNX Layer: /encoder/layers.0/pos_ff/Erf][ONNX Layer: /encoder/layers.0/pos_ff/dense/Add]
Name: /encoder/layers_0/pos_ff/output/MatMul_myl0_7, LayerType: gemm, Inputs: [ { Name: /encoder/layers_0/pos_ff/output/Add_output_0'.1_11, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_12, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.0/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_8, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_12, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_10, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56070_13, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/pos_ff/Add_1][ONNX Layer: /encoder/layers.0/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_1/mha/attn/mha_prob/key/MatMul+/encoder/layers_1/mha/attn/mha_prob/query/MatMul+/encoder/layers_1/mha/attn/value/MatMul_myl0_9, LayerType: gemm, Inputs: [ { Name: __mye56070_13, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56070, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.1/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_10, LayerType: kgen, Inputs: [ { Name: __mye56070, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56070, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56070, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54309, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/attn/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/Add]
Name: /encoder/layers_1/mha/output/dense/MatMul_myl0_11, LayerType: gemm, Inputs: [ { Name: __mye54309, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_16, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.1/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_12, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_16, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56070_13, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_17, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/output/Add][ONNX Layer: /encoder/layers.1/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_13, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_17, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_1/pos_ff/output/Add_output_0'.1_18, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.1/pos_ff/Mul][ONNX Layer: /encoder/layers.1/pos_ff/Mul_1][ONNX Layer: /encoder/layers.1/pos_ff/Add][ONNX Layer: /encoder/layers.1/pos_ff/Div][ONNX Layer: /encoder/layers.1/pos_ff/Erf][ONNX Layer: /encoder/layers.1/pos_ff/dense/Add]
Name: /encoder/layers_1/pos_ff/output/MatMul_myl0_14, LayerType: gemm, Inputs: [ { Name: /encoder/layers_1/pos_ff/output/Add_output_0'.1_18, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_19, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.1/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_15, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_19, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_17, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56053_20, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/pos_ff/Add_1][ONNX Layer: /encoder/layers.1/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_2/mha/attn/mha_prob/key/MatMul+/encoder/layers_2/mha/attn/mha_prob/query/MatMul+/encoder/layers_2/mha/attn/value/MatMul_myl0_16, LayerType: gemm, Inputs: [ { Name: __mye56053_20, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56053, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.2/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_17, LayerType: kgen, Inputs: [ { Name: __mye56053, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56053, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56053, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54417, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/attn/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/Softmax]
Name: /encoder/layers_2/mha/output/dense/MatMul_myl0_18, LayerType: gemm, Inputs: [ { Name: __mye54417, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_23, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.2/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_19, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_23, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56053_20, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_24, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/output/Add][ONNX Layer: /encoder/layers.2/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_20, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_24, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_2/pos_ff/output/Add_output_0'.1_25, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.2/pos_ff/Mul][ONNX Layer: /encoder/layers.2/pos_ff/Mul_1][ONNX Layer: /encoder/layers.2/pos_ff/Add][ONNX Layer: /encoder/layers.2/pos_ff/Div][ONNX Layer: /encoder/layers.2/pos_ff/Erf][ONNX Layer: /encoder/layers.2/pos_ff/dense/Add]
Name: /encoder/layers_2/pos_ff/output/MatMul_myl0_21, LayerType: gemm, Inputs: [ { Name: /encoder/layers_2/pos_ff/output/Add_output_0'.1_25, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_26, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.2/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_22, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_26, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_24, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56036_27, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/pos_ff/Add_1][ONNX Layer: /encoder/layers.2/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_3/mha/attn/mha_prob/key/MatMul+/encoder/layers_3/mha/attn/mha_prob/query/MatMul+/encoder/layers_3/mha/attn/value/MatMul_myl0_23, LayerType: gemm, Inputs: [ { Name: __mye56036_27, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56036, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.3/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_24, LayerType: kgen, Inputs: [ { Name: __mye56036, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56036, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56036, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54525, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/attn/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_3/mha/output/dense/MatMul_myl0_25, LayerType: gemm, Inputs: [ { Name: __mye54525, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_30, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.3/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_26, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_30, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56036_27, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_31, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/output/Add][ONNX Layer: /encoder/layers.3/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_27, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_31, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_3/pos_ff/output/Add_output_0'.1_32, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.3/pos_ff/Mul][ONNX Layer: /encoder/layers.3/pos_ff/Mul_1][ONNX Layer: /encoder/layers.3/pos_ff/Add][ONNX Layer: /encoder/layers.3/pos_ff/Div][ONNX Layer: /encoder/layers.3/pos_ff/Erf][ONNX Layer: /encoder/layers.3/pos_ff/dense/Add]
Name: /encoder/layers_3/pos_ff/output/MatMul_myl0_28, LayerType: gemm, Inputs: [ { Name: /encoder/layers_3/pos_ff/output/Add_output_0'.1_32, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_33, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.3/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_29, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_33, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_31, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56019_34, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/pos_ff/Add_1][ONNX Layer: /encoder/layers.3/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_4/mha/attn/mha_prob/key/MatMul+/encoder/layers_4/mha/attn/mha_prob/query/MatMul+/encoder/layers_4/mha/attn/value/MatMul_myl0_30, LayerType: gemm, Inputs: [ { Name: __mye56019_34, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56019, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.4/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_31, LayerType: kgen, Inputs: [ { Name: __mye56019, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56019, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56019, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54633, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/attn/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_4/mha/output/dense/MatMul_myl0_32, LayerType: gemm, Inputs: [ { Name: __mye54633, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_37, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.4/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_33, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_37, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56019_34, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_38, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/output/Add][ONNX Layer: /encoder/layers.4/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_34, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_38, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_4/pos_ff/output/Add_output_0'.1_39, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.4/pos_ff/Mul][ONNX Layer: /encoder/layers.4/pos_ff/Mul_1][ONNX Layer: /encoder/layers.4/pos_ff/Add][ONNX Layer: /encoder/layers.4/pos_ff/Div][ONNX Layer: /encoder/layers.4/pos_ff/Erf][ONNX Layer: /encoder/layers.4/pos_ff/dense/Add]
Name: /encoder/layers_4/pos_ff/output/MatMul_myl0_35, LayerType: gemm, Inputs: [ { Name: /encoder/layers_4/pos_ff/output/Add_output_0'.1_39, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_40, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.4/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_36, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_40, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_38, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56002_41, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/pos_ff/Add_1][ONNX Layer: /encoder/layers.4/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_5/mha/attn/mha_prob/key/MatMul+/encoder/layers_5/mha/attn/mha_prob/query/MatMul+/encoder/layers_5/mha/attn/value/MatMul_myl0_37, LayerType: gemm, Inputs: [ { Name: __mye56002_41, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56002, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.5/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_38, LayerType: kgen, Inputs: [ { Name: __mye56002, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56002, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56002, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54741, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/attn/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/Div]
Name: /encoder/layers_5/mha/output/dense/MatMul_myl0_39, LayerType: gemm, Inputs: [ { Name: __mye54741, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_44, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.5/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_40, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_44, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56002_41, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_45, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/output/Add][ONNX Layer: /encoder/layers.5/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_41, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_45, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_5/pos_ff/output/Add_output_0'.1_46, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.5/pos_ff/Mul][ONNX Layer: /encoder/layers.5/pos_ff/Mul_1][ONNX Layer: /encoder/layers.5/pos_ff/Add][ONNX Layer: /encoder/layers.5/pos_ff/Div][ONNX Layer: /encoder/layers.5/pos_ff/Erf][ONNX Layer: /encoder/layers.5/pos_ff/dense/Add]
Name: /encoder/layers_5/pos_ff/output/MatMul_myl0_42, LayerType: gemm, Inputs: [ { Name: /encoder/layers_5/pos_ff/output/Add_output_0'.1_46, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_47, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.5/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_43, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_47, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_45, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55985_48, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/pos_ff/Add_1][ONNX Layer: /encoder/layers.5/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_6/mha/attn/mha_prob/key/MatMul+/encoder/layers_6/mha/attn/mha_prob/query/MatMul+/encoder/layers_6/mha/attn/value/MatMul_myl0_44, LayerType: gemm, Inputs: [ { Name: __mye55985_48, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55985, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.6/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_45, LayerType: kgen, Inputs: [ { Name: __mye55985, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55985, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55985, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54849, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/attn/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_6/mha/output/dense/MatMul_myl0_46, LayerType: gemm, Inputs: [ { Name: __mye54849, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_51, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.6/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_47, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_51, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55985_48, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_52, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/output/Add][ONNX Layer: /encoder/layers.6/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_48, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_52, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_6/pos_ff/output/Add_output_0'.1_53, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.6/pos_ff/Mul][ONNX Layer: /encoder/layers.6/pos_ff/Mul_1][ONNX Layer: /encoder/layers.6/pos_ff/Add][ONNX Layer: /encoder/layers.6/pos_ff/Div][ONNX Layer: /encoder/layers.6/pos_ff/Erf][ONNX Layer: /encoder/layers.6/pos_ff/dense/Add]
Name: /encoder/layers_6/pos_ff/output/MatMul_myl0_49, LayerType: gemm, Inputs: [ { Name: /encoder/layers_6/pos_ff/output/Add_output_0'.1_53, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_54, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.6/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_50, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_54, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_52, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55968_55, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/Add_1][ONNX Layer: /encoder/layers.6/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_7/mha/attn/mha_prob/key/MatMul+/encoder/layers_7/mha/attn/mha_prob/query/MatMul+/encoder/layers_7/mha/attn/value/MatMul_myl0_51, LayerType: gemm, Inputs: [ { Name: __mye55968_55, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55968, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.7/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_52, LayerType: kgen, Inputs: [ { Name: __mye55968, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55968, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55968, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54957, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/attn/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/Div]
Name: /encoder/layers_7/mha/output/dense/MatMul_myl0_53, LayerType: gemm, Inputs: [ { Name: __mye54957, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_58, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.7/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_54, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_58, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55968_55, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_59, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/output/Add][ONNX Layer: /encoder/layers.7/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_55, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_59, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_7/pos_ff/output/Add_output_0'.1_60, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.7/pos_ff/Mul][ONNX Layer: /encoder/layers.7/pos_ff/Mul_1][ONNX Layer: /encoder/layers.7/pos_ff/Add][ONNX Layer: /encoder/layers.7/pos_ff/Div][ONNX Layer: /encoder/layers.7/pos_ff/Erf][ONNX Layer: /encoder/layers.7/pos_ff/dense/Add]
Name: /encoder/layers_7/pos_ff/output/MatMul_myl0_56, LayerType: gemm, Inputs: [ { Name: /encoder/layers_7/pos_ff/output/Add_output_0'.1_60, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_61, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.7/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_57, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_61, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_59, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55951_62, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/pos_ff/Add_1][ONNX Layer: /encoder/layers.7/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_8/mha/attn/mha_prob/key/MatMul+/encoder/layers_8/mha/attn/mha_prob/query/MatMul+/encoder/layers_8/mha/attn/value/MatMul_myl0_58, LayerType: gemm, Inputs: [ { Name: __mye55951_62, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55951, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.8/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_59, LayerType: kgen, Inputs: [ { Name: __mye55951, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55951, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55951, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55065, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/attn/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_8/mha/output/dense/MatMul_myl0_60, LayerType: gemm, Inputs: [ { Name: __mye55065, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_65, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.8/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_61, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_65, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55951_62, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_66, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/output/Add][ONNX Layer: /encoder/layers.8/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_62, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_66, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_8/pos_ff/output/Add_output_0'.1_67, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.8/pos_ff/Mul][ONNX Layer: /encoder/layers.8/pos_ff/Mul_1][ONNX Layer: /encoder/layers.8/pos_ff/Add][ONNX Layer: /encoder/layers.8/pos_ff/Div][ONNX Layer: /encoder/layers.8/pos_ff/Erf][ONNX Layer: /encoder/layers.8/pos_ff/dense/Add]
Name: /encoder/layers_8/pos_ff/output/MatMul_myl0_63, LayerType: gemm, Inputs: [ { Name: /encoder/layers_8/pos_ff/output/Add_output_0'.1_67, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_68, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.8/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_64, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_68, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_66, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55934_69, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/pos_ff/Add_1][ONNX Layer: /encoder/layers.8/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_9/mha/attn/mha_prob/key/MatMul+/encoder/layers_9/mha/attn/mha_prob/query/MatMul+/encoder/layers_9/mha/attn/value/MatMul_myl0_65, LayerType: gemm, Inputs: [ { Name: __mye55934_69, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55934, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.9/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_66, LayerType: kgen, Inputs: [ { Name: __mye55934, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55934, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55934, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55173, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/attn/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_9/mha/output/dense/MatMul_myl0_67, LayerType: gemm, Inputs: [ { Name: __mye55173, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_72, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.9/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_68, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_72, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55934_69, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_73, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/output/Add][ONNX Layer: /encoder/layers.9/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_69, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_73, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_9/pos_ff/output/Add_output_0'.1_74, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.9/pos_ff/Mul][ONNX Layer: /encoder/layers.9/pos_ff/Mul_1][ONNX Layer: /encoder/layers.9/pos_ff/Add][ONNX Layer: /encoder/layers.9/pos_ff/Div][ONNX Layer: /encoder/layers.9/pos_ff/Erf][ONNX Layer: /encoder/layers.9/pos_ff/dense/Add]
Name: /encoder/layers_9/pos_ff/output/MatMul_myl0_70, LayerType: gemm, Inputs: [ { Name: /encoder/layers_9/pos_ff/output/Add_output_0'.1_74, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_75, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.9/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_71, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_75, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_73, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55917_76, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/pos_ff/Add_1][ONNX Layer: /encoder/layers.9/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_10/mha/attn/mha_prob/key/MatMul+/encoder/layers_10/mha/attn/mha_prob/query/MatMul+/encoder/layers_10/mha/attn/value/MatMul_myl0_72, LayerType: gemm, Inputs: [ { Name: __mye55917_76, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55917, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.10/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_73, LayerType: kgen, Inputs: [ { Name: __mye55917, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55917, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55917, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55281, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/attn/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_10/mha/output/dense/MatMul_myl0_74, LayerType: gemm, Inputs: [ { Name: __mye55281, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_79, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.10/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_75, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_79, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55917_76, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_80, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/output/Add][ONNX Layer: /encoder/layers.10/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_76, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_80, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_10/pos_ff/output/Add_output_0'.1_81, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.10/pos_ff/Mul][ONNX Layer: /encoder/layers.10/pos_ff/Mul_1][ONNX Layer: /encoder/layers.10/pos_ff/Add][ONNX Layer: /encoder/layers.10/pos_ff/Div][ONNX Layer: /encoder/layers.10/pos_ff/Erf][ONNX Layer: /encoder/layers.10/pos_ff/dense/Add]
Name: /encoder/layers_10/pos_ff/output/MatMul_myl0_77, LayerType: gemm, Inputs: [ { Name: /encoder/layers_10/pos_ff/output/Add_output_0'.1_81, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_82, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.10/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_78, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_82, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_80, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55900_83, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/pos_ff/Add_1][ONNX Layer: /encoder/layers.10/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_11/mha/attn/mha_prob/key/MatMul+/encoder/layers_11/mha/attn/mha_prob/query/MatMul+/encoder/layers_11/mha/attn/value/MatMul_myl0_79, LayerType: gemm, Inputs: [ { Name: __mye55900_83, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55900, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.11/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_80, LayerType: kgen, Inputs: [ { Name: __mye55900, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55900, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55900, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55389, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/attn/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/Add]
Name: /encoder/layers_11/mha/output/dense/MatMul_myl0_81, LayerType: gemm, Inputs: [ { Name: __mye55389, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_86, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.11/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_82, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_86, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55900_83, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_87, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/output/Add][ONNX Layer: /encoder/layers.11/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_83, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_87, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_11/pos_ff/output/Add_output_0'.1_88, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.11/pos_ff/Mul][ONNX Layer: /encoder/layers.11/pos_ff/Mul_1][ONNX Layer: /encoder/layers.11/pos_ff/Add][ONNX Layer: /encoder/layers.11/pos_ff/Div][ONNX Layer: /encoder/layers.11/pos_ff/Erf][ONNX Layer: /encoder/layers.11/pos_ff/dense/Add]
Name: /encoder/layers_11/pos_ff/output/MatMul_myl0_84, LayerType: gemm, Inputs: [ { Name: /encoder/layers_11/pos_ff/output/Add_output_0'.1_88, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_89, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.11/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_85, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_89, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_87, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_11/pos_ff/layernorm/LayerNormalization_normalizationBiased.1, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0x68c5f8d0f932fa877bc257a0b360f096, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/pos_ff/Add_1][ONNX Layer: /encoder/layers.11/pos_ff/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_86, LayerType: fusion, Inputs: [ { Name: /encoder/layers_11/pos_ff/layernorm/LayerNormalization_normalizationBiased.1, Dimensions: [1,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_91, Dimensions: [1,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /match_model/mlp_layers.0/linear/Gemm][ONNX Layer: /match_model/mlp_layers.0/gelu/Mul_1][ONNX Layer: /match_model/mlp_layers.0/gelu/Mul][ONNX Layer: /match_model/mlp_layers.0/gelu/Add][ONNX Layer: /match_model/mlp_layers.0/gelu/Div][ONNX Layer: /match_model/mlp_layers.0/gelu/Erf]
Name: __myl_MulSumAddSigm_myl0_87, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_91, Dimensions: [1,256], Format/Datatype: Half }], Outputs: [ { Name: Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Dimensions: [1,1], Format/Datatype: Half }], TacticName: __myl_MulSumAddSigm_0x31f05c5f23a03ce0bee6ae4bd642a14b, StreamId: 0, Metadata: [ONNX Layer: /match_model/score_layer/Gemm][ONNX Layer: /match_model/Sigmoid]
Name: Reformatting CopyNode for Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, LayerType: Reformat, Inputs: [ { Name: Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Location: Device, Dimensions: [1], Format/Datatype: Half }], Outputs: [ { Name: score, Location: Device, Dimensions: [1], Format/Datatype: Float }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x00000000000003e8, StreamId: 0, Metadata:
You can see that the TensorRT engine shows only two layers when profilingVerbosity is layer_names_only, and many more layers when profilingVerbosity is detailed. I suspect this is what you observed as "only one layer" before quantization: it may simply be a profilingVerbosity difference. Note that the profilingVerbosity setting only changes how the engine layer information is displayed; it does not change the layers that are actually in the TensorRT engine.
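For reference, a minimal sketch of how the two verbosity levels can be compared with the TensorRT Python API. This assumes a TensorRT 10.x install and an ONNX file at an illustrative path `model.onnx` (not from this thread); it is a builder-config fragment, not a drop-in script, since building requires a GPU:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # hypothetical path
    parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
# DETAILED records per-layer metadata in the engine, so many more "layers"
# are printed; LAYER_NAMES_ONLY collapses the Myelin region into a single
# ForeignNode entry. Neither setting changes which kernels actually run.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

engine_bytes = builder.build_serialized_network(network, config)
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(engine_bytes)

# The engine inspector prints the layer info seen in the log above.
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.ONELINE))
```

The same comparison can be done with `trtexec --profilingVerbosity=detailed` versus `--profilingVerbosity=layer_names_only` without writing any code.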
I also noticed another issue in your screenshots. Some layers are fused in the left screenshot but not in the right one. I assume the left screenshot is from the original BERT model and the right one is from the quantized model. The Q/DQ nodes inserted by Model Optimizer may affect fusion in TensorRT. Could you please raise the fusion problem in the Model Optimizer issues? Detailed reproduction steps would also be appreciated. Thanks.
@kris1025 thanks very much!
I will check the profilingVerbosity setting.
The root of this problem is that the model became slower after quantization, as you saw in the screenshot.
I used TensorRT-Model-Optimizer to quantize this model, and the reproduction steps are listed here: https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/159
@kris1025 By the way, that means this model has multiple layers, not just one, right?
I wonder why? This is a BERT model, and a BERT model can be fused into a few large kernels, as FasterTransformer does.