Support local mask
Summary: This diff adds support for local masks to the decode attention implementation. It adds `window_left` and `window_right` parameters to the decode function, a `Mask` template parameter to the `GenRunner` class, and a `Mask` parameter to the `collective_builder`. It also adds `window_size_left` and `window_size_right` parameters to the `load_cpasync_warpspecialized` class.
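For reviewers, here is a pure-Python reference for what a local mask means for a single decode query. It is a sketch, not the CUTLASS kernel. `window_left` and `window_right` come from the summary. The visibility rule is an assumption modeled on FlashAttention's `window_size` convention, not taken from this diff: key position `k` is visible from query position `q` when `q - window_left <= k <= q + window_right`, and a negative window means "no limit" on that side. The function names and the `q_pos` default (query aligned with the newest key) are hypothetical.

```python
import math
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


def local_mask_allows(q_pos: int, k_pos: int, window_left: int, window_right: int) -> bool:
    """True if key position k_pos is visible from query position q_pos.

    ASSUMPTION: FlashAttention-style window convention; a negative window
    means "no limit" on that side. Not confirmed against this diff.
    """
    if window_left >= 0 and k_pos < q_pos - window_left:
        return False
    if window_right >= 0 and k_pos > q_pos + window_right:
        return False
    return True


def decode_attention_local(
    q: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    window_left: int,
    window_right: int,
    q_pos: Optional[int] = None,
) -> List[float]:
    """Hypothetical reference: one decode query attending over a KV cache."""
    if q_pos is None:
        # ASSUMPTION: the decode query sits at the position of the newest key.
        q_pos = len(keys) - 1
    scale = 1.0 / math.sqrt(len(q))
    # Masked-out keys get a score of -inf so they receive zero weight.
    scores = [
        scale * sum(a * b for a, b in zip(q, k))
        if local_mask_allows(q_pos, k_pos, window_left, window_right)
        else NEG_INF
        for k_pos, k in enumerate(keys)
    ]
    m = max(scores)  # finite as long as at least one key is visible
    weights = [math.exp(s - m) for s in scores]  # masked keys: exp(-inf) == 0.0
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim_v)]
```

With the default `q_pos`, there are no keys to the right of the decode query, so under this convention only `window_left` actually restricts the result. `window_right` matters only if the query is placed earlier in the cache.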
Currently, softmax is applied in a 3-loop setting. Next: optimize these iterations and benchmark performance.
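The summary does not spell out what the three loops are. Assuming they follow the textbook numerically stable softmax (one pass for the maximum, one for the normalizer, one for the weighted sum over V), the sketch below writes that out next to an online-softmax variant. The online variant folds the passes into a single loop over the keys by rescaling the running sums whenever the maximum changes. Online softmax is a well-known technique (it is used in FlashAttention). It is shown here only as a reference point for the "optimize these iterations" follow-up, not as what this diff or its follow-up implements. Both function names are hypothetical, and masked positions are encoded as `-inf` scores.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def attend_three_pass(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """ASSUMED reading of a 3-loop softmax-weighted sum over V."""
    m = max(scores)                                   # loop 1: maximum
    denom = sum(math.exp(s - m) for s in scores)      # loop 2: normalizer
    dim = len(values[0])
    out = [0.0] * dim
    for s, v in zip(scores, values):                  # loop 3: weighted sum over V
        p = math.exp(s - m) / denom                   # masked score -> p == 0.0
        for d in range(dim):
            out[d] += p * v[d]
    return out


def attend_online(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Online softmax: one loop over keys, rescaling running sums on a new max."""
    dim = len(values[0])
    m = NEG_INF
    denom = 0.0
    acc = [0.0] * dim
    for s, v in zip(scores, values):
        if s == NEG_INF:  # masked key contributes nothing; also avoids exp(nan)
            continue
        new_m = max(m, s)
        correction = math.exp(m - new_m)  # 0.0 on the first visible key
        p = math.exp(s - new_m)
        denom = denom * correction + p
        for d in range(dim):
            acc[d] = acc[d] * correction + p * v[d]
        m = new_m
    return [a / denom for a in acc]
```

Both functions assume at least one score is finite. They should agree up to floating-point rounding, and a masked key's value row has no effect on either result.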
Differential Revision: D84778050
@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84778050.
Deploy Preview for pytorch-fbgemm-docs ready!
| Name | Link |
|---|---|
| Latest commit | 06d64245676181ca7588b385db2098ae98bf9a26 |
| Latest deploy log | https://app.netlify.com/projects/pytorch-fbgemm-docs/deploys/68f123b8248d2900089c65e7 |
| Deploy Preview | https://deploy-preview-5015--pytorch-fbgemm-docs.netlify.app |