Po Yen Chen
Compilation fails because we are using the wrong K tile size for hdim=32 in the async pipeline. I will re-open this PR after fixing the tile size issue.
We need to wait for @danyao12 to merge his fmha bwd & dropout changes, then refactor all the updated example code together.
I will continue developing the **fmha fwd + KV cache reference function** based on the current design of `HostTensor`.
This PR is no longer needed.
I'm fixing the compilation errors.
@LJ-underdog are you still working on this?
@minzhezhou Thanks for your time. We only support MI200 & MI300 at this time, so we put **gfx90a**/**gfx94x** in the `allowed_archs` list. Other targets should be blocked anyway.
> Hi @poyenc, thanks for the reminder. Do you mean it is technically impossible to make it work for navi or it is not on the official roadmap yet? >...
Is the report PR actually #2371?
There are still some failing test cases when running _smoke_test.sh_. I'm investigating the scheduling result.