When will you support compiling with USE_FLASH_ATTENTION?
🚀 The feature, motivation and pitch
Hi, I want a faster transformer implementation in PyTorch, and I found one in the PyTorch source tree under pytorch/aten/src/ATen/native/transformers/cuda/, but it requires building with USE_FLASH_ATTENTION enabled. Digging further, I found some inline PTX assembly in utils.h, and the AMD (ROCm) build of PyTorch doesn't support it yet. Do you have any plan to support this feature?
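
For context, here is a rough sketch of how I check whether a given build actually ships the flash attention kernel. This assumes a newer PyTorch that already exposes torch.nn.functional.scaled_dot_product_attention and the torch.backends.cuda.sdp_kernel context manager; on older releases these APIs may not exist, and the exact error message may differ.

```python
# Minimal check: force the flash kernel for scaled_dot_product_attention and
# see whether it runs. If the binary was built without USE_FLASH_ATTENTION,
# the call raises a RuntimeError saying no kernel is available.
import torch
import torch.nn.functional as F

# Small half-precision tensors on GPU, since the flash kernel targets fp16/bf16.
q = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16)

# Disable the math and memory-efficient fallbacks so only flash can be picked.
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    try:
        out = F.scaled_dot_product_attention(q, k, v)
        print("flash attention kernel is available, output shape:", out.shape)
    except RuntimeError as e:
        print("flash attention kernel is NOT available:", e)
```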