When will you support compiling with USE_FLASH_ATTENTION?
🚀 The feature, motivation and pitch
Hi, I want a faster transformer implementation in PyTorch, and I found one in the PyTorch source tree under pytorch/aten/src/ATen/native/transformers/cuda/, but it requires building with USE_FLASH_ATTENTION enabled. Digging further, I found some inline PTX assembly in utils.h, and the AMD (ROCm) build of PyTorch doesn't support it yet. Do you have any plan to support this feature?
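
For context, here is a rough sketch of how I check whether a given build actually ships the flash attention kernel. This assumes a newer PyTorch that already exposes torch.nn.functional.scaled_dot_product_attention and the torch.backends.cuda.sdp_kernel context manager; on older releases these APIs may not exist, and the exact error message may differ.

```python
# Minimal check: force the flash kernel for scaled_dot_product_attention and
# see whether it runs. If the binary was built without USE_FLASH_ATTENTION,
# the call raises a RuntimeError saying no kernel is available.
import torch
import torch.nn.functional as F

# Small half-precision tensors on GPU, since the flash kernel targets fp16/bf16.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Disable the math and memory-efficient fallbacks so only flash can be picked.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    try:
        out = F.scaled_dot_product_attention(q, k, v)
        print("flash attention kernel is available, output shape:", out.shape)
    except RuntimeError as e:
        print("flash attention kernel is NOT available:", e)
```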