axlearn
Sliding window local attention
- Added sliding window local attention to MultiheadAttention and FlashAttention.
- Added or_masks and and_masks, which are useful for composing different mask functions.
- Fixed a bug that caused TPU decoding to always fail.
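
To illustrate how mask functions can be composed, here is a minimal pure-Python sketch. It is not axlearn's implementation: the signatures of `or_masks` and `and_masks` below, the helper names `causal_mask`, `sliding_window_mask`, and `materialize`, and the convention that a window of size `w` covers the current position plus the `w` positions before it are all assumptions made for this example.

```python
from typing import Callable, List

# Assumed convention: a mask function maps (query_position, key_position)
# to True when the query is allowed to attend to the key.
MaskFn = Callable[[int, int], bool]


def causal_mask(query_position: int, key_position: int) -> bool:
    """Allow attention only to the current and earlier positions."""
    return query_position >= key_position


def sliding_window_mask(window: int) -> MaskFn:
    """Hypothetical helper: allow keys at most `window` positions behind the query."""
    def mask(query_position: int, key_position: int) -> bool:
        return query_position - key_position <= window
    return mask


def and_masks(*mask_fns: MaskFn) -> MaskFn:
    """Combine masks so that attention is allowed only if every mask allows it."""
    def mask(query_position: int, key_position: int) -> bool:
        return all(fn(query_position, key_position) for fn in mask_fns)
    return mask


def or_masks(*mask_fns: MaskFn) -> MaskFn:
    """Combine masks so that attention is allowed if any mask allows it."""
    def mask(query_position: int, key_position: int) -> bool:
        return any(fn(query_position, key_position) for fn in mask_fns)
    return mask


def materialize(mask_fn: MaskFn, seq_len: int) -> List[List[bool]]:
    """Evaluate a mask function on every (query, key) pair of a sequence."""
    return [[mask_fn(q, k) for k in range(seq_len)] for q in range(seq_len)]


# Sliding window local attention: causal AND within the window.
local_causal = and_masks(causal_mask, sliding_window_mask(window=1))

# Local attention OR a "global" first token that every query may attend to.
local_plus_global = or_masks(local_causal, lambda q, k: k == 0)
```

With `window=1`, each query in `local_causal` sees itself and one preceding position. `local_plus_global` additionally lets every query attend to position 0, which is the kind of combination that composing masks makes straightforward.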