VideoMAEv2
Finetuning with more than 16 frames
I pretrained a vit_small_patch16_224 model and want to finetune it using more frames. With 32 frames, I get the following error while loading the checkpoint:
pos_tokens = pos_tokens.reshape(-1, T, P, P, C)
RuntimeError: shape '[-1, 8, 19, 19, 384]' is invalid for input of size 1204224
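For reference, the numbers in the error can be checked by hand. The sketch below is only arithmetic on the values from the message. It assumes a 14x14 spatial grid (224 input, 16x16 patches, as the model name suggests), and its guess at how a side length of 19 arises is an assumption, not something confirmed from the repository code.

```python
import math

# Values copied from the error message above.
numel = 1204224          # size of the tensor being reshaped
T, P, C = 8, 19, 384     # target shape [-1, T, P, P, C]

# Number of position tokens held by the tensor.
tokens = numel // C
assert tokens * C == numel

# Assumption: 224x224 input with 16x16 patches gives a 14x14 spatial grid.
grid = 224 // 16
temporal = tokens // (grid * grid)

# Assumption: the side length is derived by dividing the token count by T
# and truncating the square root, which would explain P = 19.
side = int(math.sqrt(tokens // T))

print("tokens:", tokens)
print("temporal positions at 14x14:", temporal)
print("truncated side length for T = 8:", side)
print("reshape possible:", numel % (T * P * P * C) == 0)
```

If these assumptions hold, the tensor carries 16 temporal positions of 14x14 tokens, while the reshape is attempted with T = 8. Splitting the tokens over 8 steps leaves 392 per step, which is not a perfect square, so the truncated side of 19 cannot match the element count.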