Swin-Transformer
How does Spatial MLP implement multi-head?
How does Spatial MLP implement multi-head? I don't understand this. Isn't this just a conv1d? Why does it say "use group convolution to implement multi-head MLP"?
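
To make the question concrete, here is a minimal pure-Python sketch of why a grouped convolution can act as a multi-head MLP. It assumes the layer in question is a Conv1d with kernel size 1 and `groups` equal to the number of heads, applied to an input whose channel axis stacks the tokens of each head. The sizes and the helper names `grouped_conv1d_k1` and `per_head_mlp` are hypothetical, chosen only for illustration, and bias is omitted. Under those assumptions, each group only sees its own head's tokens, so the grouped convolution is the same as applying a separate token-mixing matrix per head.

```python
import random

random.seed(0)

# Hypothetical sizes for illustration only (not taken from any config).
num_heads = 2   # number of heads == number of convolution groups
tokens = 3      # tokens per window, stacked per head along the channel axis
head_dim = 2    # channels per head, used here as the Conv1d "length" axis

# Input laid out as (num_heads * tokens, head_dim).
x = [[random.random() for _ in range(head_dim)]
     for _ in range(num_heads * tokens)]

# Grouped kernel-size-1 conv weight: (out_channels, in_channels // groups).
w = [[random.random() for _ in range(tokens)]
     for _ in range(num_heads * tokens)]


def grouped_conv1d_k1(x, w, groups):
    """Kernel-size-1 grouped convolution without bias."""
    out_channels = len(w)
    in_per_group = len(w[0])
    out_per_group = out_channels // groups
    length = len(x[0])
    y = []
    for o in range(out_channels):
        g = o // out_per_group  # group (head) this output channel belongs to
        y.append([
            sum(w[o][i] * x[g * in_per_group + i][pos]
                for i in range(in_per_group))
            for pos in range(length)
        ])
    return y


def per_head_mlp(x, w, heads):
    """Apply an independent (tokens x tokens) linear map to each head."""
    n = len(w[0])
    length = len(x[0])
    y = []
    for h in range(heads):
        x_h = x[h * n:(h + 1) * n]  # this head's tokens
        w_h = w[h * n:(h + 1) * n]  # this head's own mixing matrix
        for o in range(n):
            y.append([
                sum(w_h[o][i] * x_h[i][pos] for i in range(n))
                for pos in range(length)
            ])
    return y


y_conv = grouped_conv1d_k1(x, w, groups=num_heads)
y_heads = per_head_mlp(x, w, heads=num_heads)
same = all(
    abs(a - b) < 1e-12
    for row_a, row_b in zip(y_conv, y_heads)
    for a, b in zip(row_a, row_b)
)
print(same)
```

With `groups=1` the same layer would mix tokens across all heads with one shared matrix, which is the "just a conv1d" case. Setting `groups` to the number of heads is what gives each head its own weights.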