composable_kernel
flash_attention forward train
Added storing of the LSE (log-sum-exp) to the flash attention forward path. Added a device-side Philox random number generator. Based on Philox, added blockwise dropout, which is applied in the flash attention forward path. The flash attention forward training path is now complete.
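To make the three pieces of this change concrete, here is a minimal CPU-side C++ sketch. It is not composable_kernel's GPU code. It assumes the common Philox4x32-10 variant (the round constants are the published Random123 ones). The counter layout, the drop threshold, and the names `philox4x32_10`, `element_random`, `dropout_tile` and `row_lse_tiled` are hypothetical choices for illustration. The dropout random number depends only on (seed, offset, global element index), so every tile, and any later pass, regenerates the same mask. The LSE is accumulated tile by tile with the online-softmax recurrence and yields one value per query row.

```cpp
// Hypothetical sketch, not composable_kernel code: Philox4x32-10, a counter-based
// tile dropout, and a per-row log-sum-exp (LSE) accumulated over key tiles.
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

using u32x4 = std::array<uint32_t, 4>;

// Philox4x32-10 (Random123 constants): 10 rounds of multiply/xor mixing.
inline u32x4 philox4x32_10(u32x4 ctr, std::array<uint32_t, 2> key) {
    constexpr uint32_t M0 = 0xD2511F53u, M1 = 0xCD9E8D57u;  // round multipliers
    constexpr uint32_t W0 = 0x9E3779B9u, W1 = 0xBB67AE85u;  // Weyl key increments
    for (int round = 0; round < 10; ++round) {
        if (round > 0) { key[0] += W0; key[1] += W1; }  // bump key between rounds
        const uint64_t p0 = uint64_t(M0) * ctr[0];
        const uint64_t p1 = uint64_t(M1) * ctr[2];
        ctr = {uint32_t(p1 >> 32) ^ ctr[1] ^ key[0], uint32_t(p1),
               uint32_t(p0 >> 32) ^ ctr[3] ^ key[1], uint32_t(p0)};
    }
    return ctr;
}

// 32-bit random number for element `idx` of one (batch, head) attention matrix.
// Hypothetical layout: four consecutive elements share one Philox call.
inline uint32_t element_random(uint64_t seed, uint64_t offset, uint64_t idx) {
    const uint64_t group = idx / 4;
    const u32x4 r = philox4x32_10(
        {uint32_t(group), uint32_t(group >> 32), uint32_t(offset), uint32_t(offset >> 32)},
        {uint32_t(seed), uint32_t(seed >> 32)});
    return r[idx % 4];
}

// In-place dropout on a rows x cols tile of probabilities starting at (row0, col0)
// of a matrix with seq_k columns. Assumes 0 <= p < 1.
inline void dropout_tile(std::vector<float>& tile, int rows, int cols, int row0, int col0,
                         int seq_k, uint64_t seed, uint64_t offset, float p) {
    const double thr = std::ldexp(double(p), 32);  // drop when r < p * 2^32
    const float scale = 1.0f / (1.0f - p);         // rescale kept values
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) {
            const uint64_t idx = uint64_t(row0 + i) * uint64_t(seq_k) + uint64_t(col0 + j);
            float& v = tile[std::size_t(i) * cols + j];
            v = (double(element_random(seed, offset, idx)) < thr) ? 0.0f : v * scale;
        }
}

// LSE of one score row, accumulated key tile by key tile:
// m = running max, l = running sum of exp(s - m); LSE = m + log(l).
inline float row_lse_tiled(const std::vector<float>& s, int tile_k) {
    float m = -std::numeric_limits<float>::infinity(), l = 0.0f;
    for (std::size_t start = 0; start < s.size(); start += std::size_t(tile_k)) {
        const std::size_t end = std::min(s.size(), start + std::size_t(tile_k));
        float m_new = m;
        for (std::size_t j = start; j < end; ++j) m_new = std::max(m_new, s[j]);
        l *= std::exp(m - m_new);  // rescale the old sum to the new max
        for (std::size_t j = start; j < end; ++j) l += std::exp(s[j] - m_new);
        m = m_new;
    }
    return m + std::log(l);  // one value per query row, kept for the backward pass
}
```

The appeal of a counter-based generator in this setting is that it keeps no per-thread state. A backward kernel can recompute the mask from the same seed and offset instead of reading a stored mask. Likewise, the stored LSE lets the backward pass rebuild softmax probabilities as `exp(s - lse)` without re-running the max/sum reduction.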