Flex Wang

Results: 5 comments by Flex Wang

Is support for this on the roadmap?

Because eventually it will invoke the code here: https://github.com/NVIDIA/FasterTransformer/tree/afdf9a9eb86f15363c0249117d166d6b45dbb371/3rdparty/trt_fused_multihead_attention

@niyunsheng But if you trace the code, https://github.com/NVIDIA/FasterTransformer/tree/afdf9a9eb86f15363c0249117d166d6b45dbb371/3rdparty/trt_fused_multihead_attention will be called, and the file name there is **_flashattention_**.
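If you want to verify this yourself, here is a minimal sketch (assuming a local checkout of FasterTransformer at `./FasterTransformer`, a path chosen only for illustration) that lists the sources under that 3rdparty directory whose names mention flash attention:

```python
from pathlib import Path

# Hypothetical local checkout location; adjust to wherever you cloned the repo.
repo = Path("./FasterTransformer")
kernel_dir = repo / "3rdparty" / "trt_fused_multihead_attention"

# Print the files whose names mention "flash", to confirm which fused-attention
# implementation this directory actually ships.
for path in sorted(kernel_dir.rglob("*")):
    if path.is_file() and "flash" in path.name.lower():
        print(path.relative_to(repo))
```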

I am sending a few hundred requests within one batch.
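For context, this is roughly what "a few hundred requests within one batch" looks like on the client side. The endpoint URL and JSON payload format below are hypothetical placeholders, not FasterTransformer's actual serving API:

```python
import json
import urllib.request

# Hypothetical inference endpoint; the real serving API may differ.
ENDPOINT = "http://localhost:8000/generate"

# Build a single batch containing a few hundred requests.
batch = [{"prompt": f"example prompt {i}", "max_new_tokens": 32} for i in range(300)]

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps({"inputs": batch}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Send all 300 requests as one batched call and count the returned outputs.
with urllib.request.urlopen(req) as resp:
    results = json.loads(resp.read())
print(f"received {len(results.get('outputs', []))} outputs")
```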