TransformerX
New attention masking layers
Description: We need to implement several new attention masking layers in our model to improve its performance on specific tasks. The following masking layers need to be implemented:
- [ ] Global
- [ ] Block local
- [ ] Band
- [ ] #87
- [ ] Random
- [ ] Compound
- [ ] Axial
It is important to carefully consider the design and implementation of these masking layers to ensure they are effective and efficient.
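As a starting point for the discussion, here is a minimal sketch of one of these masks (a band / sliding-window mask), assuming an additive masking convention: 0 where attention is allowed and a large negative value where it is blocked. The class name `BandMask`, its `call` signature, and the Keras base class are assumptions for illustration, not part of the current TransformerX API.

```python
import tensorflow as tf


class BandMask(tf.keras.layers.Layer):
    """Hypothetical band mask: query i may only attend to keys j with |i - j| <= band_width."""

    def __init__(self, band_width: int = 3, mask_value: float = -1e9, **kwargs):
        super().__init__(**kwargs)
        self.band_width = band_width
        self.mask_value = mask_value

    def call(self, attention_scores):
        # attention_scores: (..., q_len, k_len) raw scores before softmax
        q_len = tf.shape(attention_scores)[-2]
        k_len = tf.shape(attention_scores)[-1]
        rows = tf.range(q_len)[:, None]
        cols = tf.range(k_len)[None, :]
        inside_band = tf.abs(rows - cols) <= self.band_width
        # Additive mask: 0 inside the band, a large negative value outside it
        additive_mask = tf.where(
            inside_band,
            tf.zeros([q_len, k_len], dtype=attention_scores.dtype),
            tf.fill([q_len, k_len], tf.cast(self.mask_value, attention_scores.dtype)),
        )
        return attention_scores + additive_mask


# Example usage on a (batch, heads, q_len, k_len) score tensor
scores = tf.random.normal([2, 8, 16, 16])
masked_scores = BandMask(band_width=2)(scores)
```

The other masks in the list could follow the same interface (take raw attention scores, return masked scores), which would keep them interchangeable inside the attention layers; the exact interface is open for discussion in this issue.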
Deadline for each layer: two weeks after the issue is opened. After the deadline, the issue will be closed so that it becomes available to other contributors.