Haocong WANG
This PR sets `-mllvm -enable-post-misched=0` as a default compile option for CK. The option improves gemm_universal performance and ensures correctness.
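As a rough illustration of how such a default could be wired into a CMake build, the fragment below passes the flag pair to every target in the directory scope. It is a minimal sketch, not the change made in this PR: the placement in a top-level `CMakeLists.txt` is an assumption, and it presumes a Clang-based compiler (for example hipcc) that accepts `-mllvm` options, plus CMake 3.12 or newer for the `SHELL:` prefix.

```cmake
# Hypothetical sketch, not the PR's actual diff.
# The "SHELL:" prefix keeps "-mllvm <arg>" together as one group so that
# CMake's option de-duplication does not drop a repeated "-mllvm".
add_compile_options("SHELL:-mllvm -enable-post-misched=0")
```

Setting the option at directory scope applies it to all targets defined afterwards, which matches the idea of a project-wide default; a per-target `target_compile_options` call would be the narrower alternative.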
Update a8w8 kernel library. Update flush cache timing API.
Implement new data movement and mma layout inside universal gemm.
## Proposed changes

Add TileSize kM0=64 to the fmha fwd kernel for use with medium-size xformer shapes.
## Proposed changes

Enable hdim=96/160/192 instances in fmha fwd and turn on tests for them.