Marc Seger
@yojayc This is for discarding the lowest attention weights. `flat` is created as a view into `attention_heads_fused`, so modifying `flat` in line 27 also modifies `attention_heads_fused`, you can...
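In case the view behaviour isn't obvious, here is a minimal sketch (not the post's exact code; the shapes and the discard ratio are made up for illustration) showing that a `view()` shares storage with the original tensor, so zeroing entries through `flat` zeroes the same entries in `attention_heads_fused`:

```python
import torch

# Hypothetical fused attention map and discard ratio, just for illustration.
attention_heads_fused = torch.rand(1, 197, 197)
discard_ratio = 0.9

# view() returns a tensor that shares storage with attention_heads_fused.
flat = attention_heads_fused.view(attention_heads_fused.size(0), -1)

# Indices of the lowest attention weights (largest=False).
_, indices = flat.topk(int(flat.size(-1) * discard_ratio), dim=-1, largest=False)

# In-place write through the view: the original tensor is modified as well.
flat[0, indices[0]] = 0

print((attention_heads_fused == 0).sum())  # many entries of the fused map are now zero
```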
Hi @feiyangsuo, I also came across the positional embedding usage. They additionally use a relative positional bias: [https://github.com/microsoft/Swin-Transformer/blob/eed077f68e0386e8cdff2e1981492699d9c190c0/models/swin_transformer.py#L89](https://github.com/microsoft/Swin-Transformer/blob/eed077f68e0386e8cdff2e1981492699d9c190c0/models/swin_transformer.py#L89), which is a learnable matrix of the size of a window, that...
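For context, here is a simplified sketch of my reading of that linked file: a learnable bias table with one row per possible relative offset inside a window, gathered into a per-head (N, N) bias that is added to the attention logits. Variable names and the example sizes are mine, not the repo's exact module:

```python
import torch
import torch.nn as nn

window_size = (7, 7)                  # Wh, Ww
num_heads = 4                         # example head count
N = window_size[0] * window_size[1]   # tokens per window

# Learnable table: one bias per possible relative offset, per head,
# i.e. (2*Wh - 1) * (2*Ww - 1) rows.
relative_position_bias_table = nn.Parameter(
    torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))

# Map every (query, key) position pair inside the window to a row of the table.
coords = torch.stack(torch.meshgrid(
    torch.arange(window_size[0]), torch.arange(window_size[1]), indexing="ij"))
coords_flat = coords.flatten(1)                                      # 2, N
relative_coords = coords_flat[:, :, None] - coords_flat[:, None, :]  # 2, N, N
relative_coords = relative_coords.permute(1, 2, 0).contiguous()      # N, N, 2
relative_coords[:, :, 0] += window_size[0] - 1   # shift offsets to start from 0
relative_coords[:, :, 1] += window_size[1] - 1
relative_coords[:, :, 0] *= 2 * window_size[1] - 1
relative_position_index = relative_coords.sum(-1)                    # N, N

# At attention time, the gathered bias (num_heads, N, N) is added to the
# attention logits before the softmax.
bias = relative_position_bias_table[relative_position_index.view(-1)]
bias = bias.view(N, N, num_heads).permute(2, 0, 1).contiguous()
print(bias.shape)  # torch.Size([4, 49, 49])
```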