Flex Wang
Is this support on the roadmap?
Same problem here, because eventually it will invoke the code here: https://github.com/NVIDIA/FasterTransformer/tree/afdf9a9eb86f15363c0249117d166d6b45dbb371/3rdparty/trt_fused_multihead_attention
@niyunsheng But if you trace the code, https://github.com/NVIDIA/FasterTransformer/tree/afdf9a9eb86f15363c0249117d166d6b45dbb371/3rdparty/trt_fused_multihead_attention will be called, and the file name there is **_flashattention_**.
I am sending a few hundred requests in one batch.