TransformerX
New attention masking layers
Description: We need to implement several new attention masking layers in our model to improve its performance on specific tasks. The following masking layers need to be implemented:
- [ ] Global
- [ ] Block local
- [ ] Band
- [ ] #87
- [ ] Random
- [ ] Compound
- [ ] Axial
It is important to carefully consider the design and implementation of these masking layers to ensure they are effective and efficient.
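As a starting point for the discussion, here is a minimal sketch of one of these masks (a band / sliding-window mask), assuming an additive masking convention: 0 where attention is allowed and a large negative value where it is blocked. The class name `BandMask`, its `call` signature, and the Keras base class are assumptions for illustration, not part of the current TransformerX API.

```python
import tensorflow as tf


class BandMask(tf.keras.layers.Layer):
    """Hypothetical band mask: query i may only attend to keys j with |i - j| <= band_width."""

    def __init__(self, band_width: int = 3, mask_value: float = -1e9, **kwargs):
        super().__init__(**kwargs)
        self.band_width = band_width
        self.mask_value = mask_value

    def call(self, attention_scores):
        # attention_scores: (..., q_len, k_len) raw scores before softmax
        q_len = tf.shape(attention_scores)[-2]
        k_len = tf.shape(attention_scores)[-1]
        rows = tf.range(q_len)[:, None]
        cols = tf.range(k_len)[None, :]
        inside_band = tf.abs(rows - cols) <= self.band_width
        # Additive mask: 0 inside the band, a large negative value outside it
        additive_mask = tf.where(
            inside_band,
            tf.zeros([q_len, k_len], dtype=attention_scores.dtype),
            tf.fill([q_len, k_len], tf.cast(self.mask_value, attention_scores.dtype)),
        )
        return attention_scores + additive_mask


# Example usage on a (batch, heads, q_len, k_len) score tensor
scores = tf.random.normal([2, 8, 16, 16])
masked_scores = BandMask(band_width=2)(scores)
```

The other masks in the list could follow the same interface (take raw attention scores, return masked scores), which would keep them interchangeable inside the attention layers; the exact interface is open for discussion in this issue.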
Deadline for each layer: two weeks after the issue is opened. After the deadline, the issue will be closed so that it becomes available to other contributors.