PaddleCustomDevice
[MLU] Optimize range kernel and flash_attn kernel
- Refactor the range kernel into a reusable function that uses cnnlArange_v2
- Use the new range function in the flash_attn kernel
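
A minimal host-side sketch of the refactoring pattern described above, not the code in this PR: the range logic lives in one function that both a range kernel wrapper and another kernel can call. All names here (`ArangeRaw`, `RangeSize`, `RangeFunction`, `MakeSeqOffsets`) are hypothetical, and `ArangeRaw` is a plain CPU loop standing in for the device call to `cnnlArange_v2`, whose actual signature is not reproduced. The idea that flash_attn uses the range to build sequence offsets is also an assumption for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU stand-in for a device arange call such as cnnlArange_v2.
// Fills `size` elements with start, start + step, start + 2 * step, ...
template <typename T>
std::vector<T> ArangeRaw(T start, T step, int64_t size) {
  std::vector<T> out(static_cast<std::size_t>(size));
  for (int64_t i = 0; i < size; ++i) {
    out[static_cast<std::size_t>(i)] = start + static_cast<T>(i) * step;
  }
  return out;
}

// Number of elements in the half-open interval [start, end) for a given step.
// Returns 0 when the step points away from `end`.
template <typename T>
int64_t RangeSize(T start, T end, T step) {
  assert(step != 0 && "step must be non-zero");
  const double n = std::ceil(
      (static_cast<double>(end) - static_cast<double>(start)) /
      static_cast<double>(step));
  return n > 0 ? static_cast<int64_t>(n) : 0;
}

// Reusable range function (hypothetical name). A range kernel would be a thin
// wrapper around this, and other kernels can call it directly.
template <typename T>
std::vector<T> RangeFunction(T start, T end, T step) {
  return ArangeRaw<T>(start, step, RangeSize(start, end, step));
}

// Assumed second caller: cumulative sequence offsets 0, seqlen, 2 * seqlen, ...
// with batch + 1 entries, as an attention kernel might need.
std::vector<int32_t> MakeSeqOffsets(int32_t batch, int32_t seqlen) {
  return RangeFunction<int32_t>(0, (batch + 1) * seqlen, seqlen);
}
```

Keeping the size computation and the fill in one place means a second caller does not have to launch the full range kernel or duplicate its argument handling.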
Thanks for your contribution!