composable_kernel
[CK_TILE] Add appendkv kernel to support mha with kvcache
Add a new fmha_fwd_appendkv() API, which runs ahead of the fmha_fwd()/fmha_fwd_splitkv() API.
The fmha_fwd_appendkv() + fmha_fwd()/fmha_fwd_splitkv() combination implements the functionality of mha_fwd_kvcache() in FA 2.5 (without the paged-kvcache part).
Paged-kvcache support will be added to this PR.
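The two-step flow described above can be illustrated with a small reference sketch in pure Python. It is not the CK_TILE implementation and does not reflect the real kernel arguments: `append_kv`, `attention`, and `attention_with_kvcache` are hypothetical names chosen for illustration, the cache is assumed to be a preallocated list of rows for a single batch entry and head, and masking and any other options are left out.

```python
import math


def append_kv(k_cache, v_cache, seqlen_k, k_new, v_new):
    """Hypothetical stand-in for the append step.

    Copies the new key/value rows into the preallocated cache, starting at
    row `seqlen_k`, and returns the updated number of valid rows.
    """
    for i, (k_row, v_row) in enumerate(zip(k_new, v_new)):
        k_cache[seqlen_k + i] = list(k_row)
        v_cache[seqlen_k + i] = list(v_row)
    return seqlen_k + len(k_new)


def attention(q, k_cache, v_cache, seqlen_k):
    """Hypothetical stand-in for the forward step.

    Computes softmax(q k^T / sqrt(d)) v over the first `seqlen_k` cache rows,
    without any masking.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_row in q:
        scores = [
            scale * sum(a * b for a, b in zip(q_row, k_cache[j]))
            for j in range(seqlen_k)
        ]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v_cache[j][c] for j, w in enumerate(weights)) / total
            for c in range(len(v_cache[0]))
        ])
    return out


def attention_with_kvcache(q, k_cache, v_cache, seqlen_k, k_new, v_new):
    """Append first, then attend over the extended cache."""
    seqlen_k = append_kv(k_cache, v_cache, seqlen_k, k_new, v_new)
    return attention(q, k_cache, v_cache, seqlen_k), seqlen_k


# Usage: a cache with capacity 4 and one valid row; one new token is appended.
k_cache = [[1.0, 0.0]] + [[0.0, 0.0] for _ in range(3)]
v_cache = [[2.0, 0.0]] + [[0.0, 0.0] for _ in range(3)]
out, new_len = attention_with_kvcache(
    q=[[0.0, 0.0]], k_cache=k_cache, v_cache=v_cache, seqlen_k=1,
    k_new=[[0.0, 1.0]], v_new=[[0.0, 4.0]],
)
```

Keeping the append as a separate step means the attention step only ever sees a cache that already contains the new tokens, which is why the same append call can sit in front of either forward variant.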
All tests pass (MI200 + MI300 @ ROCm6.1) in flash attention: https://github.com/ROCm/flash-attention/pull/74