Aya Z. Ibrahim

Results 7 issues of Aya Z. Ibrahim

Summary: This diff reverts D57738223. D57738223: [fp8 kv cache] wmma_gqa_attn_splitk by Aya-ZIbra causes the following test failure. Tests affected: [cogwheel:cogwheel_gpu_ait_lowering_latency_regression_test#main](https://www.internalfb.com/intern/test/281475067301657/) Multisect link: https://www.internalfb.com/multisect/5328620 Here are the tasks...

fb-exported
cla signed

Summary: Performance-optimized dequantization function. Differential Revision: D57527556

fb-exported
cla signed

Summary: Quick fix for now. Long term, fuse the conversion into the kernel epilogue. Differential Revision: D89254085

fb-exported
cla signed
meta-exported

Summary: This diff adds an automatic split-K size heuristic for the Blackwell FMHA decode kernel to improve GPU utilization. Added `get_splitk_heuristic()`, which automatically computes the optimal split-K size. The heuristic...

fb-exported
cla signed
meta-exported
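
The summary above is truncated before it describes the heuristic, so the following is only a minimal sketch of how a split-K size could be chosen for decode, not the logic of `get_splitk_heuristic()`. It assumes the goal is to launch roughly one work item per SM. The function name `sketch_splitk_size`, its parameters, and the `kv_tile` granularity are all hypothetical.

```python
import math


def sketch_splitk_size(batch, num_heads, seq_len_k, num_sms, kv_tile=128):
    """Hypothetical heuristic: return a split size in KV tokens.

    Assumption: without split-K, decode launches batch * num_heads work
    items; splitting the KV sequence multiplies that count.
    """
    if min(batch, num_heads, seq_len_k, num_sms, kv_tile) <= 0:
        raise ValueError("all arguments must be positive")

    base_work_items = batch * num_heads
    # Splits needed to reach about one work item per SM.
    wanted_splits = max(1, math.ceil(num_sms / base_work_items))

    # Never split finer than one KV tile.
    tiles_total = math.ceil(seq_len_k / kv_tile)
    num_splits = min(wanted_splits, tiles_total)

    # Round the split size up to a whole number of tiles.
    tiles_per_split = math.ceil(tiles_total / num_splits)
    return tiles_per_split * kv_tile
```

Under these assumptions, a small batch with a long KV sequence gets many short splits, while a batch that already fills the device is left unsplit.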

Summary: This diff adds support for local masks in the decode attention implementation. The changes include adding `window_left` and `window_right` parameters to the decode function, modifying the GenRunner class...

fb-exported
cla signed
meta-exported
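
As a rough illustration of what a local mask with `window_left` and `window_right` means for a single decode query, here is a minimal sketch. It assumes a negative window value means "unbounded on that side", a convention used by some attention libraries but not confirmed by the summary above. The helper `local_mask_row` and its `q_pos` argument are hypothetical.

```python
def local_mask_row(q_pos, seq_len_k, window_left, window_right):
    """Return a list of booleans: True where key k is visible to the query.

    Assumption: key k is visible when
    q_pos - window_left <= k <= q_pos + window_right,
    and a negative window disables the bound on that side.
    """
    lo = 0 if window_left < 0 else max(0, q_pos - window_left)
    hi = seq_len_k - 1 if window_right < 0 else min(seq_len_k - 1, q_pos + window_right)
    return [lo <= k <= hi for k in range(seq_len_k)]


# Decode query at the last position, looking back two keys and not ahead.
row = local_mask_row(q_pos=7, seq_len_k=8, window_left=2, window_right=0)
```

With `window_right=0` and the query at the last position, this reduces to a causal sliding window over the most recent `window_left + 1` keys.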

Summary: X-link: https://github.com/facebookresearch/FBGEMM/pull/2017 Add a stand-alone Blackwell decode op. Supported mask: BlockDiagonalCausalWithOffsetPaddedKeysMask. Differential Revision: D84630701

fb-exported
cla signed
meta-exported

Summary: 1. Reduce pipeline stages to avoid exceeding the smem limit. 2. Add a static_assert so that smem capacity violations are raised at compile time rather than at runtime. 3. Select the TMEM...

fb-exported
cla signed
meta-exported