
BERT model split into many layers after int8 quantization

Open DamonsJ opened this issue 10 months ago • 13 comments

I first posted this issue in https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/159

I quantized a PyTorch BERT model using TensorRT-Model-Optimizer.

Before quantization, exporting the model to TensorRT yields an engine with only one layer:

Image

but after quantization there are many layers. Why does this happen, and can it be fixed?

Image

(only part of these layers is shown)

DamonsJ avatar Mar 24 '25 02:03 DamonsJ

I export this model to tensorrt and there is only one layer

What command did you use?

lix19937 avatar Mar 24 '25 05:03 lix19937

First I export the torch model to ONNX using:

torch.onnx.export

then I build a TRT engine from the ONNX model using the Python TensorRT API:

network_from_onnx_path

engine_from_network

DamonsJ avatar Mar 24 '25 06:03 DamonsJ

then I export onnx to trt engine using python trt

Which flags did you set with config.set_flag()?

lix19937 avatar Mar 24 '25 06:03 lix19937

@lix19937

self.config.set_flag(trt.BuilderFlag.FP16)
self.config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
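For comparison, a roughly equivalent trtexec build with the same two flags would look like the sketch below (the model and engine paths are placeholders, not from this thread):

```shell
# Hypothetical trtexec equivalent of the Python build above; paths are placeholders.
trtexec --onnx=bert.onnx \
        --fp16 \
        --precisionConstraints=prefer \
        --saveEngine=bert.plan
```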

DamonsJ avatar Mar 24 '25 07:03 DamonsJ

Because you built with FP16 (not int8 quantization), the Myelin compiler fuses the ops into one node.

lix19937 avatar Mar 24 '25 11:03 lix19937

So why can it not compile the int8-quantized model into one node?

DamonsJ avatar Mar 24 '25 14:03 DamonsJ

There are two situations:

  • If you use a float32 ONNX model with the fp16 flag (or fp16+int8 flags), it is still a single Myelin node.
  • If you use moq.quantize for int8 PTQ, the ONNX structure has actually changed (explicit Q/DQ nodes are inserted), so it is not merged into one layer.
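For context, "moq.quantize" here refers to ModelOpt's ONNX quantization, which rewrites the graph by inserting QuantizeLinear/DequantizeLinear nodes. A typical invocation looks roughly like the sketch below (flag names recalled from the TensorRT-Model-Optimizer docs; verify against your installed version):

```shell
# ModelOpt ONNX int8 PTQ (flag names are assumptions; check your ModelOpt version).
# The inserted Q/DQ nodes are why the engine no longer collapses into one ForeignNode.
python -m modelopt.onnx.quantization \
    --onnx_path=bert.onnx \
    --quantize_mode=int8 \
    --output_path=bert.quant.onnx
```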

lix19937 avatar Mar 25 '25 01:03 lix19937

OK, I used moq.quantize for int8 PTQ. Is there any way to merge the quantized ONNX back into one layer?

DamonsJ avatar Mar 25 '25 08:03 DamonsJ

@DamonsJ Could you upload the ONNX files from before and after quantization so that we can further investigate the multi-layer splitting and the performance difference reported in the original issue?

longlee0622 avatar Apr 10 '25 03:04 longlee0622

@longlee0622

bert-onnx.zip

Here is the ONNX model. Please help check it, thanks!

DamonsJ avatar Apr 14 '25 04:04 DamonsJ

Test environment: GPU: A100, TensorRT: 10.10, CUDA: 12.8, OS: Ubuntu 22.04

Analysis:

The profilingVerbosity setting controls how much detail TensorRT reports in the engine layer information.
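With trtexec, the verbosity level can be selected when dumping per-layer information, for example (a sketch; the ONNX path is a placeholder):

```shell
# Dump per-layer engine information at the two verbosity levels compared below.
trtexec --onnx=bert.quant.onnx --profilingVerbosity=layer_names_only --dumpLayerInfo
trtexec --onnx=bert.quant.onnx --profilingVerbosity=detailed --dumpLayerInfo
```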

Here is the result of profilingVerbosity = layer_names_only

[04/18/2025-03:52:06] [V] [TRT] Engine Layer Information:
Layer(Myelin): {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Tactic: 0x0000000000000000, concat_ids (Int32[1,40]), concat_mask (Bool[1,40]), sent_ids (Int32[1,40]), pos_ids (Int32[1,40]) -> Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]} (Half[1])
Layer(Reformat): Reformatting CopyNode for Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Tactic: 0x0000000000000000, Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]} (Half[1]) -> score (Float[1])

Here is the result of profilingVerbosity = detailed

[04/18/2025-03:51:19] [V] [TRT] Layers:
Name: __myl_GathGathGathMulAddMulAddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_0, LayerType: kgen, Inputs: [ { Name: pos_ids, Dimensions: [1,40], Format/Datatype: Int32 }, { Name: sent_ids, Dimensions: [1,40], Format/Datatype: Int32 }, { Name: concat_ids, Dimensions: [1,40], Format/Datatype: Int32 }], Outputs: [ { Name: __mye56087_5, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_GathGathGathMulAddMulAddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xd6fb31e7506899c9795a4712e7e9a194, StreamId: 0, Metadata: [ONNX Layer: /encoder/embedding/Gather][ONNX Layer: /encoder/Mul][ONNX Layer: /encoder/Add][ONNX Layer: /encoder/sent_embedding/Gather][ONNX Layer: /encoder/Mul_1][ONNX Layer: /encoder/pos_encoding/Add][ONNX Layer: /encoder/pos_encoding/pe/Gather][ONNX Layer: /encoder/pos_encoding/layernorm/LayerNormalization]
Name: /encoder/layers_0/mha/attn/mha_prob/key/MatMul+/encoder/layers_0/mha/attn/mha_prob/query/MatMul+/encoder/layers_0/mha/attn/value/MatMul_myl0_1, LayerType: gemm, Inputs: [ { Name: __mye56087_5, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56087, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.0/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/value/Add]
Name: __myl_ReplReshReplAndNotCastMul_myl0_2, LayerType: kgen, Inputs: [ { Name: concat_mask, Dimensions: [1,1,1,40], Format/Datatype: Bool }], Outputs: [ { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }], TacticName: __myl_ReplReshReplAndNotCastMul_0x28d2b74213c12bac17b6e487f910f07d, StreamId: 0, Metadata: [ONNX Layer: /encoder/And][ONNX Layer: /encoder/Cast_1][ONNX Layer: /encoder/Not][ONNX Layer: /encoder/Mul_2][ONNX Layer: /encoder/Unsqueeze_5][ONNX Layer: /encoder/Cast]
Name: _gemm_mha_v2_myl0_3, LayerType: kgen, Inputs: [ { Name: __mye56087, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56087, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56087, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54201, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/attn/MatMul][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.0/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_0/mha/output/dense/MatMul_myl0_4, LayerType: gemm, Inputs: [ { Name: __mye54201, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_9, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.0/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_5, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_9, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56087_5, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_10, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/mha/output/Add][ONNX Layer: /encoder/layers.0/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_6, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_10, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_0/pos_ff/output/Add_output_0'.1_11, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.0/pos_ff/Mul][ONNX Layer: /encoder/layers.0/pos_ff/Mul_1][ONNX Layer: /encoder/layers.0/pos_ff/Add][ONNX Layer: /encoder/layers.0/pos_ff/Div][ONNX Layer: /encoder/layers.0/pos_ff/Erf][ONNX Layer: /encoder/layers.0/pos_ff/dense/Add]
Name: /encoder/layers_0/pos_ff/output/MatMul_myl0_7, LayerType: gemm, Inputs: [ { Name: /encoder/layers_0/pos_ff/output/Add_output_0'.1_11, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_12, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.0/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_8, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_12, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_10, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56070_13, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.0/pos_ff/Add_1][ONNX Layer: /encoder/layers.0/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_1/mha/attn/mha_prob/key/MatMul+/encoder/layers_1/mha/attn/mha_prob/query/MatMul+/encoder/layers_1/mha/attn/value/MatMul_myl0_9, LayerType: gemm, Inputs: [ { Name: __mye56070_13, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56070, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.1/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_10, LayerType: kgen, Inputs: [ { Name: __mye56070, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56070, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56070, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54309, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/attn/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.1/mha/attn/mha_prob/Add]
Name: /encoder/layers_1/mha/output/dense/MatMul_myl0_11, LayerType: gemm, Inputs: [ { Name: __mye54309, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_16, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.1/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_12, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_16, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56070_13, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_17, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/mha/output/Add][ONNX Layer: /encoder/layers.1/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_13, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_17, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_1/pos_ff/output/Add_output_0'.1_18, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.1/pos_ff/Mul][ONNX Layer: /encoder/layers.1/pos_ff/Mul_1][ONNX Layer: /encoder/layers.1/pos_ff/Add][ONNX Layer: /encoder/layers.1/pos_ff/Div][ONNX Layer: /encoder/layers.1/pos_ff/Erf][ONNX Layer: /encoder/layers.1/pos_ff/dense/Add]
Name: /encoder/layers_1/pos_ff/output/MatMul_myl0_14, LayerType: gemm, Inputs: [ { Name: /encoder/layers_1/pos_ff/output/Add_output_0'.1_18, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_19, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.1/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_15, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_19, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_17, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56053_20, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.1/pos_ff/Add_1][ONNX Layer: /encoder/layers.1/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_2/mha/attn/mha_prob/key/MatMul+/encoder/layers_2/mha/attn/mha_prob/query/MatMul+/encoder/layers_2/mha/attn/value/MatMul_myl0_16, LayerType: gemm, Inputs: [ { Name: __mye56053_20, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56053, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.2/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_17, LayerType: kgen, Inputs: [ { Name: __mye56053, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56053, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56053, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54417, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/attn/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.2/mha/attn/mha_prob/Softmax]
Name: /encoder/layers_2/mha/output/dense/MatMul_myl0_18, LayerType: gemm, Inputs: [ { Name: __mye54417, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_23, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.2/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_19, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_23, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56053_20, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_24, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/mha/output/Add][ONNX Layer: /encoder/layers.2/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_20, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_24, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_2/pos_ff/output/Add_output_0'.1_25, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.2/pos_ff/Mul][ONNX Layer: /encoder/layers.2/pos_ff/Mul_1][ONNX Layer: /encoder/layers.2/pos_ff/Add][ONNX Layer: /encoder/layers.2/pos_ff/Div][ONNX Layer: /encoder/layers.2/pos_ff/Erf][ONNX Layer: /encoder/layers.2/pos_ff/dense/Add]
Name: /encoder/layers_2/pos_ff/output/MatMul_myl0_21, LayerType: gemm, Inputs: [ { Name: /encoder/layers_2/pos_ff/output/Add_output_0'.1_25, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_26, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.2/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_22, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_26, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_24, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56036_27, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.2/pos_ff/Add_1][ONNX Layer: /encoder/layers.2/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_3/mha/attn/mha_prob/key/MatMul+/encoder/layers_3/mha/attn/mha_prob/query/MatMul+/encoder/layers_3/mha/attn/value/MatMul_myl0_23, LayerType: gemm, Inputs: [ { Name: __mye56036_27, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56036, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.3/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_24, LayerType: kgen, Inputs: [ { Name: __mye56036, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56036, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56036, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54525, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/attn/MatMul][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.3/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_3/mha/output/dense/MatMul_myl0_25, LayerType: gemm, Inputs: [ { Name: __mye54525, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_30, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.3/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_26, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_30, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56036_27, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_31, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/mha/output/Add][ONNX Layer: /encoder/layers.3/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_27, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_31, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_3/pos_ff/output/Add_output_0'.1_32, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.3/pos_ff/Mul][ONNX Layer: /encoder/layers.3/pos_ff/Mul_1][ONNX Layer: /encoder/layers.3/pos_ff/Add][ONNX Layer: /encoder/layers.3/pos_ff/Div][ONNX Layer: /encoder/layers.3/pos_ff/Erf][ONNX Layer: /encoder/layers.3/pos_ff/dense/Add]
Name: /encoder/layers_3/pos_ff/output/MatMul_myl0_28, LayerType: gemm, Inputs: [ { Name: /encoder/layers_3/pos_ff/output/Add_output_0'.1_32, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_33, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.3/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_29, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_33, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_31, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56019_34, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.3/pos_ff/Add_1][ONNX Layer: /encoder/layers.3/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_4/mha/attn/mha_prob/key/MatMul+/encoder/layers_4/mha/attn/mha_prob/query/MatMul+/encoder/layers_4/mha/attn/value/MatMul_myl0_30, LayerType: gemm, Inputs: [ { Name: __mye56019_34, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56019, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.4/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_31, LayerType: kgen, Inputs: [ { Name: __mye56019, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56019, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56019, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54633, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/attn/MatMul][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.4/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_4/mha/output/dense/MatMul_myl0_32, LayerType: gemm, Inputs: [ { Name: __mye54633, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_37, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.4/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_33, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_37, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56019_34, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_38, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/mha/output/Add][ONNX Layer: /encoder/layers.4/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_34, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_38, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_4/pos_ff/output/Add_output_0'.1_39, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.4/pos_ff/Mul][ONNX Layer: /encoder/layers.4/pos_ff/Mul_1][ONNX Layer: /encoder/layers.4/pos_ff/Add][ONNX Layer: /encoder/layers.4/pos_ff/Div][ONNX Layer: /encoder/layers.4/pos_ff/Erf][ONNX Layer: /encoder/layers.4/pos_ff/dense/Add]
Name: /encoder/layers_4/pos_ff/output/MatMul_myl0_35, LayerType: gemm, Inputs: [ { Name: /encoder/layers_4/pos_ff/output/Add_output_0'.1_39, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_40, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.4/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_36, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_40, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_38, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56002_41, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.4/pos_ff/Add_1][ONNX Layer: /encoder/layers.4/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_5/mha/attn/mha_prob/key/MatMul+/encoder/layers_5/mha/attn/mha_prob/query/MatMul+/encoder/layers_5/mha/attn/value/MatMul_myl0_37, LayerType: gemm, Inputs: [ { Name: __mye56002_41, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye56002, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.5/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_38, LayerType: kgen, Inputs: [ { Name: __mye56002, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye56002, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye56002, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54741, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/attn/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.5/mha/attn/mha_prob/Div]
Name: /encoder/layers_5/mha/output/dense/MatMul_myl0_39, LayerType: gemm, Inputs: [ { Name: __mye54741, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_44, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.5/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_40, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_44, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye56002_41, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_45, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/mha/output/Add][ONNX Layer: /encoder/layers.5/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_41, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_45, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_5/pos_ff/output/Add_output_0'.1_46, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.5/pos_ff/Mul][ONNX Layer: /encoder/layers.5/pos_ff/Mul_1][ONNX Layer: /encoder/layers.5/pos_ff/Add][ONNX Layer: /encoder/layers.5/pos_ff/Div][ONNX Layer: /encoder/layers.5/pos_ff/Erf][ONNX Layer: /encoder/layers.5/pos_ff/dense/Add]
Name: /encoder/layers_5/pos_ff/output/MatMul_myl0_42, LayerType: gemm, Inputs: [ { Name: /encoder/layers_5/pos_ff/output/Add_output_0'.1_46, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_47, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.5/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_43, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_47, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_45, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55985_48, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.5/pos_ff/Add_1][ONNX Layer: /encoder/layers.5/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_6/mha/attn/mha_prob/key/MatMul+/encoder/layers_6/mha/attn/mha_prob/query/MatMul+/encoder/layers_6/mha/attn/value/MatMul_myl0_44, LayerType: gemm, Inputs: [ { Name: __mye55985_48, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55985, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.6/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_45, LayerType: kgen, Inputs: [ { Name: __mye55985, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55985, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55985, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54849, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/attn/MatMul][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.6/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_6/mha/output/dense/MatMul_myl0_46, LayerType: gemm, Inputs: [ { Name: __mye54849, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_51, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.6/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_47, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_51, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55985_48, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_52, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/mha/output/Add][ONNX Layer: /encoder/layers.6/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_48, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_52, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_6/pos_ff/output/Add_output_0'.1_53, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.6/pos_ff/Mul][ONNX Layer: /encoder/layers.6/pos_ff/Mul_1][ONNX Layer: /encoder/layers.6/pos_ff/Add][ONNX Layer: /encoder/layers.6/pos_ff/Div][ONNX Layer: /encoder/layers.6/pos_ff/Erf][ONNX Layer: /encoder/layers.6/pos_ff/dense/Add]
Name: /encoder/layers_6/pos_ff/output/MatMul_myl0_49, LayerType: gemm, Inputs: [ { Name: /encoder/layers_6/pos_ff/output/Add_output_0'.1_53, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_54, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.6/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_50, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_54, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_52, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55968_55, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/Add_1][ONNX Layer: /encoder/layers.6/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_7/mha/attn/mha_prob/key/MatMul+/encoder/layers_7/mha/attn/mha_prob/query/MatMul+/encoder/layers_7/mha/attn/value/MatMul_myl0_51, LayerType: gemm, Inputs: [ { Name: __mye55968_55, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55968, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.7/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_52, LayerType: kgen, Inputs: [ { Name: __mye55968, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55968, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55968, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye54957, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/attn/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.7/mha/attn/mha_prob/Div]
Name: /encoder/layers_7/mha/output/dense/MatMul_myl0_53, LayerType: gemm, Inputs: [ { Name: __mye54957, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_58, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.7/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_54, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_58, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55968_55, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_59, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/mha/output/Add][ONNX Layer: /encoder/layers.7/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_55, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_59, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_7/pos_ff/output/Add_output_0'.1_60, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.7/pos_ff/Mul][ONNX Layer: /encoder/layers.7/pos_ff/Mul_1][ONNX Layer: /encoder/layers.7/pos_ff/Add][ONNX Layer: /encoder/layers.7/pos_ff/Div][ONNX Layer: /encoder/layers.7/pos_ff/Erf][ONNX Layer: /encoder/layers.7/pos_ff/dense/Add]
Name: /encoder/layers_7/pos_ff/output/MatMul_myl0_56, LayerType: gemm, Inputs: [ { Name: /encoder/layers_7/pos_ff/output/Add_output_0'.1_60, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_61, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.7/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_57, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_61, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_59, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55951_62, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.7/pos_ff/Add_1][ONNX Layer: /encoder/layers.7/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_8/mha/attn/mha_prob/key/MatMul+/encoder/layers_8/mha/attn/mha_prob/query/MatMul+/encoder/layers_8/mha/attn/value/MatMul_myl0_58, LayerType: gemm, Inputs: [ { Name: __mye55951_62, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55951, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.8/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_59, LayerType: kgen, Inputs: [ { Name: __mye55951, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55951, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55951, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55065, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/attn/MatMul][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.8/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_8/mha/output/dense/MatMul_myl0_60, LayerType: gemm, Inputs: [ { Name: __mye55065, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_65, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.8/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_61, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_65, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55951_62, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_66, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/mha/output/Add][ONNX Layer: /encoder/layers.8/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_62, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_66, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_8/pos_ff/output/Add_output_0'.1_67, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.8/pos_ff/Mul][ONNX Layer: /encoder/layers.8/pos_ff/Mul_1][ONNX Layer: /encoder/layers.8/pos_ff/Add][ONNX Layer: /encoder/layers.8/pos_ff/Div][ONNX Layer: /encoder/layers.8/pos_ff/Erf][ONNX Layer: /encoder/layers.8/pos_ff/dense/Add]
Name: /encoder/layers_8/pos_ff/output/MatMul_myl0_63, LayerType: gemm, Inputs: [ { Name: /encoder/layers_8/pos_ff/output/Add_output_0'.1_67, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_68, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.8/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_64, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_68, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_66, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55934_69, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.8/pos_ff/Add_1][ONNX Layer: /encoder/layers.8/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_9/mha/attn/mha_prob/key/MatMul+/encoder/layers_9/mha/attn/mha_prob/query/MatMul+/encoder/layers_9/mha/attn/value/MatMul_myl0_65, LayerType: gemm, Inputs: [ { Name: __mye55934_69, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55934, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.9/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_66, LayerType: kgen, Inputs: [ { Name: __mye55934, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55934, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55934, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55173, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/attn/MatMul][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.9/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_9/mha/output/dense/MatMul_myl0_67, LayerType: gemm, Inputs: [ { Name: __mye55173, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_72, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.9/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_68, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_72, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55934_69, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_73, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/mha/output/Add][ONNX Layer: /encoder/layers.9/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_69, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_73, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_9/pos_ff/output/Add_output_0'.1_74, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.9/pos_ff/Mul][ONNX Layer: /encoder/layers.9/pos_ff/Mul_1][ONNX Layer: /encoder/layers.9/pos_ff/Add][ONNX Layer: /encoder/layers.9/pos_ff/Div][ONNX Layer: /encoder/layers.9/pos_ff/Erf][ONNX Layer: /encoder/layers.9/pos_ff/dense/Add]
Name: /encoder/layers_9/pos_ff/output/MatMul_myl0_70, LayerType: gemm, Inputs: [ { Name: /encoder/layers_9/pos_ff/output/Add_output_0'.1_74, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_75, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.9/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_71, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_75, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_73, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55917_76, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.9/pos_ff/Add_1][ONNX Layer: /encoder/layers.9/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_10/mha/attn/mha_prob/key/MatMul+/encoder/layers_10/mha/attn/mha_prob/query/MatMul+/encoder/layers_10/mha/attn/value/MatMul_myl0_72, LayerType: gemm, Inputs: [ { Name: __mye55917_76, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55917, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.10/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_73, LayerType: kgen, Inputs: [ { Name: __mye55917, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55917, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55917, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55281, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/attn/MatMul][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/Add][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.10/mha/attn/mha_prob/MatMul]
Name: /encoder/layers_10/mha/output/dense/MatMul_myl0_74, LayerType: gemm, Inputs: [ { Name: __mye55281, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_79, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.10/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_75, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_79, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55917_76, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_80, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/mha/output/Add][ONNX Layer: /encoder/layers.10/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_76, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_80, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_10/pos_ff/output/Add_output_0'.1_81, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.10/pos_ff/Mul][ONNX Layer: /encoder/layers.10/pos_ff/Mul_1][ONNX Layer: /encoder/layers.10/pos_ff/Add][ONNX Layer: /encoder/layers.10/pos_ff/Div][ONNX Layer: /encoder/layers.10/pos_ff/Erf][ONNX Layer: /encoder/layers.10/pos_ff/dense/Add]
Name: /encoder/layers_10/pos_ff/output/MatMul_myl0_77, LayerType: gemm, Inputs: [ { Name: /encoder/layers_10/pos_ff/output/Add_output_0'.1_81, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_82, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.10/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_78, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_82, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_80, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55900_83, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.10/pos_ff/Add_1][ONNX Layer: /encoder/layers.10/pos_ff/layernorm/LayerNormalization]
Name: /encoder/layers_11/mha/attn/mha_prob/key/MatMul+/encoder/layers_11/mha/attn/mha_prob/query/MatMul+/encoder/layers_11/mha/attn/value/MatMul_myl0_79, LayerType: gemm, Inputs: [ { Name: __mye55900_83, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __mye55900, Dimensions: [3,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/key/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/key/Add][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/query/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/query/Add][ONNX Layer: /encoder/layers.11/mha/attn/value/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/value/Add]
Name: _gemm_mha_v2_myl0_80, LayerType: kgen, Inputs: [ { Name: __mye55900, Dimensions: [4,40,64], Format/Datatype: Half }, { Name: __mye55900, Dimensions: [4,64,40], Format/Datatype: Half }, { Name: __mye55389_7, Dimensions: [1,1,40,40], Format/Datatype: Half }, { Name: __mye55900, Dimensions: [4,40,64], Format/Datatype: Half }], Outputs: [ { Name: __mye55389, Dimensions: [4,40,64], Format/Datatype: Half }], TacticName: _gemm_mha_v2_0x957a2fe283f77dcf35c142c1390a3e6a, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/attn/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/Softmax][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/Div][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/MatMul][ONNX Layer: /encoder/layers.11/mha/attn/mha_prob/Add]
Name: /encoder/layers_11/mha/output/dense/MatMul_myl0_81, LayerType: gemm, Inputs: [ { Name: __mye55389, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_86, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f32_f32_tn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/output/dense/MatMul][ONNX Layer: /encoder/layers.11/mha/output/dense/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_82, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_86, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __mye55900_83, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_87, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0xf2543c36ff9f104bbb1655b0183afb47, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/mha/output/Add][ONNX Layer: /encoder/layers.11/mha/output/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_83, LayerType: fusion, Inputs: [ { Name: __myln_k_arg__bb1_87, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_11/pos_ff/output/Add_output_0'.1_88, Dimensions: [1,40,1024], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/pos_ff/dense/MatMul][ONNX Layer: /encoder/layers.11/pos_ff/Mul][ONNX Layer: /encoder/layers.11/pos_ff/Mul_1][ONNX Layer: /encoder/layers.11/pos_ff/Add][ONNX Layer: /encoder/layers.11/pos_ff/Div][ONNX Layer: /encoder/layers.11/pos_ff/Erf][ONNX Layer: /encoder/layers.11/pos_ff/dense/Add]
Name: /encoder/layers_11/pos_ff/output/MatMul_myl0_84, LayerType: gemm, Inputs: [ { Name: /encoder/layers_11/pos_ff/output/Add_output_0'.1_88, Dimensions: [1,40,1024], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_89, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/pos_ff/output/MatMul][ONNX Layer: /encoder/layers.11/pos_ff/output/Add]
Name: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_myl0_85, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_89, Dimensions: [1,40,256], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_87, Dimensions: [1,40,256], Format/Datatype: Half }], Outputs: [ { Name: /encoder/layers_11/pos_ff/layernorm/LayerNormalization_normalizationBiased.1, Dimensions: [1,40,256], Format/Datatype: Half }], TacticName: __myl_AddCastMeanSubMulMeanAddSqrtDivMulCastMulAdd_0x68c5f8d0f932fa877bc257a0b360f096, StreamId: 0, Metadata: [ONNX Layer: /encoder/layers.11/pos_ff/Add_1][ONNX Layer: /encoder/layers.11/pos_ff/layernorm/LayerNormalization]
Name: __myl_Fc_myl0_86, LayerType: fusion, Inputs: [ { Name: /encoder/layers_11/pos_ff/layernorm/LayerNormalization_normalizationBiased.1, Dimensions: [1,256], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_91, Dimensions: [1,256], Format/Datatype: Half }], TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize32x32x64_stage6_warpsize2x2x1_tensor16x8x16_by_fusion_tactic, StreamId: 0, Metadata: [ONNX Layer: /match_model/mlp_layers.0/linear/Gemm][ONNX Layer: /match_model/mlp_layers.0/gelu/Mul_1][ONNX Layer: /match_model/mlp_layers.0/gelu/Mul][ONNX Layer: /match_model/mlp_layers.0/gelu/Add][ONNX Layer: /match_model/mlp_layers.0/gelu/Div][ONNX Layer: /match_model/mlp_layers.0/gelu/Erf]
Name: __myl_MulSumAddSigm_myl0_87, LayerType: kgen, Inputs: [ { Name: __myln_k_arg__bb1_91, Dimensions: [1,256], Format/Datatype: Half }], Outputs: [ { Name: Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Dimensions: [1,1], Format/Datatype: Half }], TacticName: __myl_MulSumAddSigm_0x31f05c5f23a03ce0bee6ae4bd642a14b, StreamId: 0, Metadata: [ONNX Layer: /match_model/score_layer/Gemm][ONNX Layer: /match_model/Sigmoid]
Name: Reformatting CopyNode for Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, LayerType: Reformat, Inputs: [ { Name: Reformatted Output Tensor 0 to {ForeignNode[/encoder/Unsqueeze_3 + /encoder/Unsqueeze_4.../match_model/Squeeze]}, Location: Device, Dimensions: [1], Format/Datatype: Half }], Outputs: [ { Name: score, Location: Device, Dimensions: [1], Format/Datatype: Float }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x00000000000003e8, StreamId: 0, Metadata: 

You can see that the TensorRT engine shows two layers when profilingVerbosity is layer_names_only, and many more layers when profilingVerbosity is detailed. I think that is what you meant by "there is only one layer" in the TensorRT engine before quantization, so I suspect this is a profilingVerbosity issue. The profilingVerbosity setting only affects how the engine is displayed; it does not change the layers inside the TensorRT engine.
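As an illustration, a small stand-alone parser can tally how many engine layers a detailed dump contains and how many ONNX layers were fused into each one. The regexes below assume the `Name: ..., Metadata: [ONNX Layer: ...]` format of the dump pasted above; the `sample` string is a shortened toy excerpt.

```python
import re

def summarize_engine_layers(dump: str):
    """For each engine-layer line, return (engine layer name,
    number of ONNX layers fused into it)."""
    summary = []
    for line in dump.strip().splitlines():
        name = re.search(r"Name: ([^,]+),", line)
        fused = re.findall(r"\[ONNX Layer: ([^\]]+)\]", line)
        if name:
            summary.append((name.group(1), len(fused)))
    return summary

sample = (
    "Name: __myl_Fc_myl0_48, LayerType: fusion, "
    "Metadata: [ONNX Layer: /encoder/layers.6/pos_ff/dense/MatMul]"
    "[ONNX Layer: /encoder/layers.6/pos_ff/Erf]\n"
    "Name: _gemm_mha_v2_myl0_45, LayerType: kgen, "
    "Metadata: [ONNX Layer: /encoder/layers.6/mha/attn/MatMul]"
)
print(summarize_engine_layers(sample))
# [('__myl_Fc_myl0_48', 2), ('_gemm_mha_v2_myl0_45', 1)]
```

Comparing these tallies between the FP16 and the quantized engine makes it easy to see where fusion broke apart.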

I also noticed another issue in your screenshots. Some layers are fused in the left screenshot but not in the right one. I assume the left screenshot is from the original BERT model and the right one is from the quantized BERT model. The Q-DQ nodes inserted by Model Optimizer may prevent fusion in TensorRT. Could you please raise the fusion problem in the Model Optimizer issues? It would also help if you could provide detailed reproduction steps. Thanks.
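One quick way to see how many quantization boundaries the tool inserted is to count adjacent QuantizeLinear → DequantizeLinear pairs in the exported graph's node sequence. The `ops` list below is a made-up toy sequence for illustration, not taken from the real model:

```python
def count_qdq_pairs(op_types):
    """Count adjacent QuantizeLinear -> DequantizeLinear node pairs."""
    return sum(1 for a, b in zip(op_types, op_types[1:])
               if a == "QuantizeLinear" and b == "DequantizeLinear")

# Hypothetical op-type sequence from a quantized ONNX graph
ops = ["QuantizeLinear", "DequantizeLinear", "MatMul", "Add",
       "QuantizeLinear", "DequantizeLinear", "Softmax"]
print(count_qdq_pairs(ops))  # 2
```

Each such pair is a potential fusion barrier, which is why the quantized engine can split into many more layers than the FP16 one.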

kris1025 avatar Apr 21 '25 07:04 kris1025

@kris1025 thanks very much!

I will check the profilingVerbosity setting.
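For reference, a minimal configuration sketch of how the verbosity is typically set through the TensorRT Python API (a non-runnable fragment without a GPU and an installed TensorRT; the engine-inspector lines are commented out because they assume a built `engine` object):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
# LAYER_NAMES_ONLY collapses fused regions into few names;
# DETAILED exposes every generated kernel.
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

# After building an engine, the inspector reports per-layer details:
# inspector = engine.create_engine_inspector()
# print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))
```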

The root of this problem is that the model became slower after I quantized it, as you can see in the screenshots.

I used TensorRT-Model-Optimizer to quantize this model; the reproduction steps are listed here: https://github.com/NVIDIA/TensorRT-Model-Optimizer/issues/159

DamonsJ avatar Apr 21 '25 08:04 DamonsJ

@kris1025 btw, that means this model has multiple layers, not one layer, right?

I wonder why? This is a BERT model, and a BERT model can be fused into one large kernel, as FasterTransformer does.

DamonsJ avatar Apr 24 '25 01:04 DamonsJ