tilelang
A domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels
This pull request introduces significant refactoring and modernization of the FFI (Foreign Function Interface) and object system usage in the codebase, particularly in the layout and IR (Intermediate Representation) modules....
### Required prerequisites - [x] I have read the documentation. - [x] I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues) that this hasn't already been reported. (comment there if it has.)...
In the flash attention example, keep the maximum of the previous `scores_max` and `max(acc_s)` in `scores_max` for numerical stability. From the [FlashAttention paper](https://arxiv.org/pdf/2205.14135), Algorithm 1: $$m_i^{\text{new}} = \max(m_i, \tilde{m}_{ij})$$ ##...
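The running-max update above can be illustrated with a minimal pure-Python sketch of an online softmax denominator for a single row. This is not the TileLang kernel: the function name `online_softmax_stats` and the block layout are hypothetical, and the variable `m` only plays the role that `scores_max` plays in the example.

```python
import math

def online_softmax_stats(score_blocks):
    """Return (running max, softmax denominator) for one row processed block by block.

    Hypothetical helper for illustration only; `score_blocks` is a list of
    non-empty lists of finite floats.
    """
    m = float("-inf")  # running max, analogous to scores_max
    l = 0.0            # running sum of exp(score - m)
    for block in score_blocks:
        m_block = max(block)         # block-local max, \tilde{m}_{ij}
        m_new = max(m, m_block)      # m_i^new = max(m_i, \tilde{m}_{ij})
        # Rescale the old sum to the new reference max; exp(-inf) is 0.0 on the first block.
        l = l * math.exp(m - m_new) + sum(math.exp(s - m_new) for s in block)
        m = m_new
    return m, l
```

Because `m_new` never drops below the previous maximum, every exponent is at most zero, so neither the rescaling factor nor the per-block terms can overflow. Taking only the block-local maximum would let `m` decrease between blocks, and the rescaling factor `exp(m - m_new)` could then grow without bound.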
- Added a mapping for GEMM instruction prefixes in `gemm.h`.
- Renamed GEMM functions to include `mma_` prefix for clarity in `gemm_mma.h`, `gemm_sm70.h`, `gemm_sm90.h`, and `gemm_sp_sm80.h`.
- Updated function signatures...
### Required prerequisites - [x] I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues) that this hasn't already been reported. (comment there if it has.) ### Motivation CIBW runs natively on macOS, allowing...
Roadmap:
- [ ] Clear TODOs

# SM8x
- [x] bf16/fp16
- [x] customized metadata layout
- [x] tf32
- [ ] precision issue due to using fp32 as tf32...
### Required prerequisites - [x] I have searched the [Issue Tracker](https://github.com/tile-ai/tilelang/issues) that this hasn't already been reported. (comment there if it has.) ### Motivation From some issues #1012 we can...