Support bf16 in blackwell cutlass decode attention kernel

Open • Aya-ZIbra opened this issue 5 months ago • 3 comments

Summary:

  1. Reduce the number of pipeline stages to avoid exceeding the shared memory (smem) limit.
  2. Add a static_assert so that an smem capacity violation is raised at compile time rather than at runtime.
  3. Select the TMEM intrinsics based on sizeof(Element).
  4. Update the unit test to include bf16.
  5. Label decode kernel test names with their corresponding test parameters.
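
Items 1 to 3 can be illustrated with a minimal host-side C++ sketch. It is not the kernel code from this PR: the capacity constant, the tile shape, the stage counts, and every type name below (`Bf16`, `Fp8`, `KvPipelineStorage`, `TmemAccessSelector`, `Access8Bit`, `Access16Bit`) are hypothetical placeholders chosen for illustration. It only shows the general pattern of checking a shared-memory budget at compile time and dispatching on `sizeof(Element)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Assumed shared-memory budget in bytes (hypothetical; the real limit
// depends on the target GPU and on how the kernel is configured).
constexpr std::size_t kSmemCapacityBytes = 227 * 1024;

// Stand-ins for the element types; the real kernel would use the
// library's own 8-bit and 16-bit floating-point types.
struct Fp8 { std::uint8_t bits; };
struct Bf16 { std::uint16_t bits; };

// Hypothetical pipeline storage: one K tile and one V tile per stage.
template <typename Element, int TileRows, int TileCols, int Stages>
struct KvPipelineStorage {
  static constexpr std::size_t kBytesPerStage =
      2 * sizeof(Element) * TileRows * TileCols;
  static constexpr std::size_t kTotalBytes = kBytesPerStage * Stages;

  // Fails the build, instead of the kernel launch, when the chosen
  // stage count does not fit into the assumed budget.
  static_assert(kTotalBytes <= kSmemCapacityBytes,
                "pipeline stages exceed shared memory capacity");
};

// Placeholder tags standing in for the 8-bit and 16-bit access paths.
struct Access8Bit {};
struct Access16Bit {};

// Pick the access path from the element width.
template <typename Element>
struct TmemAccessSelector {
  static_assert(sizeof(Element) == 1 || sizeof(Element) == 2,
                "only 8-bit and 16-bit elements are handled in this sketch");
  using type = typename std::conditional<sizeof(Element) == 1,
                                         Access8Bit, Access16Bit>::type;
};

// A 16-bit element doubles the bytes per stage, so fewer stages fit.
using Fp8Storage = KvPipelineStorage<Fp8, 128, 128, 7>;    // 224 KiB
using Bf16Storage = KvPipelineStorage<Bf16, 128, 128, 3>;  // 192 KiB
```

Under these assumed numbers, requesting `KvPipelineStorage<Bf16, 128, 128, 7>::kTotalBytes` would stop compilation at the static_assert, which is the behavior item 2 asks for.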

Differential Revision: D82991495

Aya-ZIbra avatar Sep 23 '25 02:09 Aya-ZIbra

Deploy Preview for pytorch-fbgemm-docs ready!

Name Link
Latest commit 0887844f15928ef8facb7fe688ba0918106446b7
Latest deploy log https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68d46815002a910008efbd3d
Deploy Preview https://deploy-preview-4916--pytorch-fbgemm-docs.netlify.app

netlify[bot] avatar Sep 23 '25 02:09 netlify[bot]

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating diff in D82991495.

facebook-github-bot avatar Sep 23 '25 02:09 facebook-github-bot

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating diff in D82991495.

facebook-github-bot avatar Sep 24 '25 21:09 facebook-github-bot