FBGEMM
Support bf16 in Blackwell CUTLASS decode attention kernel
Summary:
- Reduce the number of pipeline stages so the kernel stays within the shared memory (smem) limit
- Add a static_assert so that an smem capacity violation is caught at compile time rather than at runtime
- Select the TMEM intrinsics based on sizeof(Element)
- Update the unit test to include bf16
- Label decode kernel test names with their corresponding test parameters
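
The first three items can be illustrated with a minimal host-side C++ sketch. It is not the code from this diff: `PipelineConfig`, `TmemLoadVariant`, `select_tmem_variant`, the `Bf16`/`Fp8` stand-in types, the 128x128 tile shape, and the 227 KiB budget are all hypothetical names and values chosen for illustration. The idea is that a wider element type doubles the per-stage footprint, so the stage count has to drop, and a `static_assert` turns an over-budget configuration into a compile error.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the element types (not the real CUTLASS types).
struct Fp8  { std::uint8_t  bits; };  // 1-byte element
struct Bf16 { std::uint16_t bits; };  // 2-byte element

// Illustrative per-block shared memory budget (assumed value, in bytes).
constexpr std::size_t kSmemCapacityBytes = 227 * 1024;

// Hypothetical pipeline description: each stage buffers one K tile and one V tile.
template <typename Element, int TileRows, int TileCols, int Stages>
struct PipelineConfig {
  static constexpr std::size_t kStageBytes =
      2 * sizeof(Element) * TileRows * TileCols;  // K tile + V tile
  static constexpr std::size_t kTotalBytes = kStageBytes * Stages;

  // Fails the build instead of failing the kernel launch.
  static_assert(kTotalBytes <= kSmemCapacityBytes,
                "pipeline exceeds shared memory capacity; reduce Stages");
};

// Hypothetical tag for which family of TMEM load/store intrinsics to use.
enum class TmemLoadVariant { k8Bit, k16Bit };

// Pick the variant from the element width, resolved at compile time.
template <typename Element>
constexpr TmemLoadVariant select_tmem_variant() {
  if constexpr (sizeof(Element) == 1) {
    return TmemLoadVariant::k8Bit;
  } else {
    static_assert(sizeof(Element) == 2, "unsupported element width");
    return TmemLoadVariant::k16Bit;
  }
}

// Six stages fit for the 1-byte type; the 2-byte type needs half as many.
using Fp8Config  = PipelineConfig<Fp8, 128, 128, 6>;
using Bf16Config = PipelineConfig<Bf16, 128, 128, 3>;
static_assert(Fp8Config::kTotalBytes == Bf16Config::kTotalBytes,
              "both configurations use the same smem footprint");
```

With these assumed numbers, referencing a member of `PipelineConfig<Bf16, 128, 128, 6>` would request 384 KiB and stop the build at the `static_assert`, which is the behavior the second item describes.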
Differential Revision: D82991495
Deploy Preview for pytorch-fbgemm-docs ready!
| Name | Link |
|---|---|
| Latest commit | 0887844f15928ef8facb7fe688ba0918106446b7 |
| Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68d46815002a910008efbd3d |
| Deploy Preview | https://deploy-preview-4916--pytorch-fbgemm-docs.netlify.app |
@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating diff in D82991495.