Have you considered TokenMix in hidden layers?
In addition to TokenMix before the first Transformer block, have you considered or tried TokenMix in the middle of the model?
I did try it. If I understand correctly, it is close to Manifold Mixup. I believe this would be an interesting extension of CutMix, TokenMix, and similar methods to the feature space. Have you tried CutMix in the feature space?
Yep, I am effectively talking about Manifold Mixup for TokenMix/CutMix. The Manifold Mixup paper is very interesting, and thank you so much for bringing it up. I haven't tried the hidden-layer version of CutMix/TokenMix. If you have tried hidden-layer TokenMix, could you share the results?
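To make the idea concrete, here is a rough sketch of what I mean by mixing tokens at a hidden layer: take the token sequences of two samples after some intermediate block, replace a random subset of token positions in one with the tokens from the other, and mix the labels by the fraction of tokens kept. This is only an illustration under my own assumptions, not the repository's implementation: the function name `mix_hidden_tokens`, the `mix_ratio` parameter, the uniformly random choice of positions, and the area-based label weighting (as in CutMix) are all hypothetical, and plain Python lists stand in for tensors.

```python
import random


def mix_hidden_tokens(h_a, h_b, y_a, y_b, mix_ratio=0.5, rng=None):
    """Token-level mixing of two hidden sequences (illustrative sketch only).

    h_a, h_b: lists of tokens; each token is a list of floats (same shapes).
    y_a, y_b: soft labels as lists of floats of equal length.
    mix_ratio: fraction of token positions taken from sample B (assumed knob).
    Returns (mixed_tokens, mixed_label, mask), where mask[i] == 1 means
    position i was taken from sample B.
    """
    rng = rng or random.Random()
    n = len(h_a)
    if len(h_b) != n:
        raise ValueError("both sequences must have the same number of tokens")

    # Pick which token positions are replaced by tokens from sample B.
    n_from_b = round(mix_ratio * n)
    from_b = set(rng.sample(range(n), n_from_b))
    mask = [1 if i in from_b else 0 for i in range(n)]

    # Build the mixed sequence position by position (copying each token).
    mixed = [list(h_b[i]) if mask[i] else list(h_a[i]) for i in range(n)]

    # Weight the labels by the fraction of tokens kept from each sample.
    lam = 1 - n_from_b / n
    label = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed, label, mask


if __name__ == "__main__":
    # Four 2-dimensional tokens per sample; one-hot labels for two classes.
    h_a = [[0.0, 0.0] for _ in range(4)]
    h_b = [[1.0, 1.0] for _ in range(4)]
    mixed, label, mask = mix_hidden_tokens(
        h_a, h_b, [1.0, 0.0], [0.0, 1.0], mix_ratio=0.5, rng=random.Random(0)
    )
    print(mask, label)
```

In a real model this would run on the output of a chosen Transformer block, with the remaining blocks applied to the mixed sequence. Whether the label weights should come from the token fraction or from something else at that depth is exactly the part I am unsure about.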
OK.
Great! Do you want to share the results on this issue or somewhere else?
@jihaonew Did you get any results?