Is there any feature related to GPT-like models that can be applied to BERT-like models?

Open zhangxin81 opened this issue 1 year ago • 3 comments

Is there any feature related to GPT-like models that can be applied to BERT-like models?

zhangxin81 avatar Apr 29 '24 02:04 zhangxin81

They share some common optimization ideas, such as fusing the multi-head attention kernel and quantizing the model to INT8 or FP8.
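
To illustrate why quantization carries over between the two model families, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is not TensorRT-LLM's implementation; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical. The scheme only looks at the values of a weight tensor, so it does not depend on whether the tensor belongs to an encoder (BERT-like) or a decoder (GPT-like).

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization (illustrative sketch only).

    Returns the quantized integers and the scale needed to recover them.
    """
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


# Hypothetical weights standing in for one row of an attention projection.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the usual accuracy trade-off of this scheme.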

byshiue avatar Apr 30 '24 03:04 byshiue

Hi @byshiue, a related question: does BertAttentionPlugin also use FlashAttention2, as GptAttention does?

Ashwin-Ramesh2607 avatar May 16 '24 17:05 Ashwin-Ramesh2607

Yes.

byshiue avatar May 17 '24 07:05 byshiue