Finetune LLaVaNeXT -> ValueError: Image features and image tokens do not match
Hi everyone,
I tried running the notebook provided here for fine-tuning LLaVA-NeXT: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LLaVa-NeXT/Fine_tune_LLaVaNeXT_on_a_custom_dataset_(with_PyTorch_Lightning).ipynb
However, during training, I encountered the following error:
ValueError: Image features and image tokens do not match: tokens: 251, features 2160
I'm using transformers==4.51.3 and did not modify the notebook. I tried to debug this by reviewing the code around the collate function, but couldn't find the issue. Has anyone else run into this error, or have ideas on what's going wrong?
Thanks
cc @zucchini-nlp
@benjwolff hey, can you try increasing MAX_LENGTH to 3000 tokens? In the latest transformers versions we include all image tokens in the max-length count, and from the next release onward we'll raise an error when max_length is too small to fit all image tokens. Until then, you can set a very large max length so the image tokens don't get truncated.
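To illustrate why a small MAX_LENGTH triggers this error, here's a toy sketch (not the actual processor code; `NUM_IMAGE_FEATURES` matches the 2160 from the error message, `TEXT_TOKENS` is made up): the processor expands the single image placeholder into one token per visual feature, so truncating the tokenized sequence afterwards drops most of the image tokens and the count no longer matches the features.

```python
# Toy model of LLaVA-NeXT tokenization: 1 = image-placeholder token, 0 = text token.
NUM_IMAGE_FEATURES = 2160   # visual features from the error message
TEXT_TOKENS = 90            # prompt + answer length (illustrative)

def count_image_tokens(max_length):
    """How many image-placeholder tokens survive truncation to max_length."""
    sequence = [1] * NUM_IMAGE_FEATURES + [0] * TEXT_TOKENS
    truncated = sequence[:max_length]
    return sum(1 for t in truncated if t == 1)

# A small MAX_LENGTH truncates image tokens away -> token/feature mismatch.
assert count_image_tokens(256) < NUM_IMAGE_FEATURES

# MAX_LENGTH = 3000 keeps all 2160 image tokens -> counts match.
assert count_image_tokens(3000) == NUM_IMAGE_FEATURES
```

In the notebook this corresponds to raising the MAX_LENGTH constant that gets passed as `max_length` to the processor in the collate function.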
@zucchini-nlp Thanks for helping out! Increasing MAX_LENGTH to 3000 resolves the mismatch, but now I'm running into memory problems: on an A100 (40GB) it runs out of GPU memory during training.
Yeah, LLaVA-NeXT training needs a lot of GPU memory. I ran the script on an 80GB card, IIRC.
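For anyone stuck on 40GB, a couple of Lightning-side knobs usually reduce memory pressure. This is just a sketch, not from the notebook: the argument values are illustrative, and `model_module` stands in for the notebook's LightningModule.

```python
import lightning as L  # the notebook uses PyTorch Lightning

trainer = L.Trainer(
    accelerator="gpu",
    devices=1,
    precision="bf16-mixed",     # mixed precision roughly halves activation memory vs fp32
    accumulate_grad_batches=8,  # keep the effective batch size while using batch_size=1
    max_epochs=1,
)
# trainer.fit(model_module)  # model_module: the notebook's LightningModule (placeholder)
# Also worth trying on the underlying HF model:
# model.gradient_checkpointing_enable()  # trades compute for activation memory
```

None of this removes the need for a large max_length; it only shrinks the activation/optimizer footprint per step.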