Fused_MHA is not supported when seq_len == 1024, Dh == 72, causal_mask == false
When I use FMHA_v2, I found that it does not support my case. Is there any way to use fmha_v2 without changing the model? Thanks a lot.
From https://github.com/NVIDIA/TensorRT/tree/release/10.3/plugin/bertQKVToContextPlugin#parameters
Per the changelog there, the v2 plugin only added head size 32 support for sequence lengths 128, 256, and 512 (see https://github.com/NVIDIA/TensorRT/tree/release/10.3/plugin/bertQKVToContextPlugin/fused_multihead_attention_v2/src); cubins for sequence length 1024 are not provided. Maybe you can use CUTLASS to implement your case.
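For reference, here is a minimal NumPy sketch of the unfused attention that a custom kernel (for example, one built on CUTLASS's fused multi-head attention example) would have to reproduce for this shape. It is not TensorRT or CUTLASS code; `num_heads` and all names below are illustrative assumptions, not values from the plugin.

```python
# Reference (unfused) multi-head attention for the unsupported shape:
# seq_len = 1024, head_dim (Dh) = 72, no causal mask.
# Purely a numerical reference for what a custom kernel must compute.
import numpy as np

def mha_reference(q, k, v):
    """q, k, v: [batch, num_heads, seq_len, head_dim]"""
    head_dim = q.shape[-1]
    # Scaled dot-product scores: [batch, num_heads, seq_len, seq_len]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(head_dim)
    # No causal mask is applied (causal_mask == false in this issue).
    scores -= scores.max(axis=-1, keepdims=True)  # softmax stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v  # [batch, num_heads, seq_len, head_dim]

batch, num_heads, seq_len, head_dim = 1, 12, 1024, 72  # num_heads is illustrative
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((batch, num_heads, seq_len, head_dim),
                               dtype=np.float32) for _ in range(3))
out = mha_reference(q, k, v)
print(out.shape)  # (1, 12, 1024, 72)
```

The math itself places no restriction on head_dim or seq_len; the limitation comes from the precompiled cubins shipped with the plugin, which is why an unfused or CUTLASS-based path can cover Dh = 72 at seq_len = 1024.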