Haocong WANG

Results 7 issues of Haocong WANG

This PR sets "-mllvm -enable-post-misched=0" as a default compile option for CK. The option improves gemm_universal performance and ensures correctness.

quality

## Proposed changes Please describe the motivation behind the pull request, whether it enables a new feature or fixes a bug. If there are associated pull requests or issues, please...

Update a8w8 kernel library. Update flush cache timing API.

Implement new data movement and mma layout inside universal gemm.

enhancement

## Proposed changes Add TileSize kM0=64 to the fmha fwd kernel to handle medium-size xformer shapes. ## Checklist Please put an `x` into the boxes that apply. You can also...

## Proposed changes Enable hdim=96/160/192 instances in fmha fwd and turn on tests for them. ## Checklist Please put an `x` into the boxes that apply. You can also fill...