
Finetune LLaVaNeXT -> ValueError: Image features and image tokens do not match

Open benjwolff opened this issue 9 months ago • 4 comments

Hi everyone,

I tried running the notebook provided here for finetuning LLaVaNeXT: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa-NeXT/Fine_tune_LLaVaNeXT_on_a_custom_dataset_(with_PyTorch_Lightning).ipynb

However, during training, I encountered the following error: ValueError: Image features and image tokens do not match: tokens: 251, features 2160

I'm using transformers==4.51.3 and did not modify the notebook. I attempted to debug this by reviewing the code around the collate function, but couldn’t find the issue. Has anyone else run into this error, or does anyone have ideas on what’s going wrong?

Thanks

benjwolff avatar May 03 '25 15:05 benjwolff

cc @zucchini-nlp

NielsRogge avatar May 03 '25 17:05 NielsRogge

@benjwolff hey, can you try increasing MAX_LENGTH to 3000 tokens? In the latest transformers versions, all image tokens count toward the max length, and starting with the next release we'll raise an error when the max length is too small to fit all image tokens. Until then, you can set a very large length so the tokens do not get truncated.
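
To illustrate why truncation leads to this error, here is a minimal pure-Python sketch. It does not call transformers at all; the placeholder id `32000`, the helper names, the 5 leading text tokens, and the limit of 256 are assumptions chosen for illustration (they happen to reproduce the counts in the error message). It only shows the idea that cutting the sequence drops image placeholder tokens while the number of image features stays the same.

```python
# Hypothetical placeholder id for the image token; assumed for illustration only.
IMAGE_TOKEN_ID = 32000


def build_input_ids(num_prefix, num_image_tokens, num_suffix):
    """Build a fake token sequence: text prefix, image placeholders, text suffix."""
    return [1] * num_prefix + [IMAGE_TOKEN_ID] * num_image_tokens + [2] * num_suffix


def truncate(input_ids, max_length):
    """Keep only the first max_length tokens, as right-side truncation would."""
    return input_ids[:max_length]


def count_image_tokens(input_ids):
    """Count how many image placeholder tokens survive in the sequence."""
    return sum(1 for token in input_ids if token == IMAGE_TOKEN_ID)


def check_match(input_ids, num_image_features):
    """Raise if the placeholder count differs from the image feature count."""
    num_tokens = count_image_tokens(input_ids)
    if num_tokens != num_image_features:
        raise ValueError(
            "Image features and image tokens do not match: "
            f"tokens: {num_tokens}, features {num_image_features}"
        )


# 2160 image features, as in the error message; prefix/suffix sizes are assumed.
NUM_IMAGE_FEATURES = 2160
input_ids = build_input_ids(num_prefix=5, num_image_tokens=NUM_IMAGE_FEATURES, num_suffix=40)

short_ids = truncate(input_ids, max_length=256)   # assumed small limit
long_ids = truncate(input_ids, max_length=3000)   # limit suggested above

print(count_image_tokens(short_ids))  # 251 placeholders survive truncation
print(count_image_tokens(long_ids))   # 2160 placeholders, nothing was cut

check_match(long_ids, NUM_IMAGE_FEATURES)  # passes without raising
```

Calling `check_match(short_ids, NUM_IMAGE_FEATURES)` in this sketch raises a `ValueError` with the same wording as the one reported, since only 251 of the 2160 placeholders are left.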

zucchini-nlp avatar May 06 '25 11:05 zucchini-nlp

@zucchini-nlp Thanks for helping out! Increasing MAX_LENGTH to 3000 resolves the mismatch issue, but now I’m running into memory problems. With an A100 (40GB), it runs out of RAM during training.

benjwolff avatar May 06 '25 12:05 benjwolff

Yeah, LLaVA-NeXT requires a huge amount of memory. I ran the script on 80GB iirc.

zucchini-nlp avatar May 06 '25 13:05 zucchini-nlp