Po Yen Chen

Results 13 comments of Po Yen Chen

Compilation fails because we are using the wrong K tile size for hdim=32 in the async pipeline. I'll re-open this PR after fixing the tile size issue.

We need to wait for @danyao12 to merge his fmha bwd & dropout changes, then refactor all the updated example code together.

I will continue developing the **fmha fwd + KV cache reference function** based on the current design of `HostTensor`.

This PR is no longer needed.

@LJ-underdog are you still working on this?

@minzhezhou Thanks for your time. We only support MI200 & MI300 at this time, so we put **gfx90a**/**gfx94x** in the `allowed_archs` list. Other targets should be blocked anyway.

> Hi @poyenc, thanks for the reminder. Do you mean it is technically impossible to make it work for navi or it is not on the official roadmap yet? >...

Some test cases still fail when running _smoke_test.sh_. I'm investigating the scheduling result.