Finetuning with more than 16 frames

Open CSLR-research opened this issue 1 year ago • 0 comments

I pretrained a vit_small_patch16_224 model and want to finetune it with more frames. With 32 frames, loading the checkpoint fails with this error:

pos_tokens = pos_tokens.reshape(-1, T, P, P, C)
RuntimeError: shape '[-1, 8, 19, 19, 384]' is invalid for input of size 1204224
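
The numbers in the message can be checked by hand: 1204224 / 384 = 3136 tokens, while the requested shape needs 8 * 19 * 19 = 2888 tokens, so the reshape cannot succeed. The sketch below is only a diagnostic under stated assumptions, not the repository's code: it assumes the tensor holds no extra (class) tokens and that the spatial grid is square, and `infer_square_grid` is a hypothetical helper name. It searches for the spatial side P that makes a given temporal token count T fit.

```python
import math


def infer_square_grid(numel, embed_dim, temporal_tokens):
    """Return P such that numel == temporal_tokens * P * P * embed_dim, else None.

    Hypothetical helper: assumes a square spatial grid and no class token.
    """
    if numel % embed_dim:
        return None
    tokens = numel // embed_dim
    if tokens % temporal_tokens:
        return None
    per_step = tokens // temporal_tokens
    side = math.isqrt(per_step)
    # Only accept an exact square; a truncated square root is the failure mode here.
    return side if side * side == per_step else None


numel, embed_dim = 1204224, 384  # values taken from the error message
print("tokens in tensor:", numel // embed_dim)
print("tokens requested:", 8 * 19 * 19)
for t in (8, 16):
    print("T =", t, "-> P =", infer_square_grid(numel, embed_dim, t))
```

Under these assumptions, T = 8 has no valid square grid (3136 / 8 = 392 is not a perfect square, and truncating its square root gives the 19 seen in the error), while T = 16 fits a 14 x 14 grid, which is what 224-pixel inputs with 16-pixel patches produce. That suggests the temporal length used in the reshape does not match the temporal length of the tensor being reshaped, so the position embedding would probably need handling along the time axis as well as the spatial ones.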

May 28 '24 09:05 CSLR-research