
[Question] Finetuning bash file

AlirezaShamsoshoara opened this issue 1 year ago • 5 comments

Question

Hey, has anyone run LoRA fine-tuning for LLaVA-Plus? I can run it with visiontower and image_data, but the results are not good. Is the fine-tuning script (bash) the same as https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_task_lora.sh and https://github.com/haotian-liu/LLaVA/blob/main/scripts/v1_5/finetune_task.sh?
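For reference, the linked v1.5 LoRA script is essentially a deepspeed launch of LLaVA's trainer. A trimmed sketch (the data and output paths are placeholders; the full script sets many more hyperparameters such as batch size, epochs, and the learning-rate schedule):

```bash
#!/bin/bash
# Trimmed sketch of scripts/v1_5/finetune_task_lora.sh from upstream LLaVA.
# data_path, image_folder, and output_dir below are placeholders.
deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path liuhaotian/llava-v1.5-13b \
    --version v1 \
    --data_path ./playground/data/your_task_data.json \
    --image_folder ./playground/data \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-13b-task-lora
```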

AlirezaShamsoshoara avatar May 22 '24 23:05 AlirezaShamsoshoara

Have you figured it out? I tried to apply LLaVA-Plus to my task, but the memory usage was too high.

ZhangJinian avatar Jul 09 '25 06:07 ZhangJinian

@ZhangJinian I basically followed the parameters shared in the LLaVA-Plus config.json on Hugging Face, and I adjusted the fine-tuning parameters to match that file. I got much better results. Hopefully this helps you.

AlirezaShamsoshoara avatar Jul 10 '25 06:07 AlirezaShamsoshoara

@ZhangJinian I think these two parameters were different, and I changed them: --mm_projector_type and --vision_tower.
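Concretely, those two training flags should match the corresponding fields in the model's config.json on Hugging Face. A quick way to check (field names follow LLaVA's config conventions; adjust the config.json path to wherever you downloaded it):

```bash
# Print the vision tower and projector type recorded in the model config,
# then pass the printed values to the training script as
#   --vision_tower <mm_vision_tower> --mm_projector_type <mm_projector_type>
grep -E '"(mm_vision_tower|mm_projector_type)"' config.json
```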

AlirezaShamsoshoara avatar Jul 10 '25 06:07 AlirezaShamsoshoara

Thank you for your reply! @AlirezaShamsoshoara

Did you get good results just by changing projector_type and vision_tower and training with the training_llava_plus_v0_7b.sh script and zero2.json? I created a test dataset with only two images and tried fine-tuning the same way, but the results were quite poor.

This is my first time trying to fine-tune a large model, and I'm wondering if the poor results are due to my dataset being too small. Do you think I should use LoRA to fine-tune on such a small dataset?

Looking forward to your reply; this would be a big help for me.

ZhangJinian avatar Jul 10 '25 10:07 ZhangJinian

I can't use zero3.json to fine-tune the model with LoRA, so I fine-tuned it with zero2.json. But when I load the model that was fine-tuned with zero2.json, it prints the following message:

```
Loading checkpoint shards: 100%|████████████████| 2/2 [00:06<00:00, 3.10s/it]
2025-07-10 19:58:07 | ERROR | stderr | Some weights of the model checkpoint at checkpoints/merged_llavaplusv0_7b_test were not used when initializing LlavaLlamaForCausalLM: ['model.vision_tower.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias', ..., 'model.vision_tower.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight']
- This IS expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlavaLlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
```

Is this expected behavior?
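(For context: a merged checkpoint like checkpoints/merged_llavaplusv0_7b_test is typically produced by merging the LoRA adapter back into the base model first. Assuming this codebase carries the same scripts/merge_lora_weights.py as upstream LLaVA, the merge looks roughly like this; all paths below are placeholders:)

```bash
# Merge LoRA adapter weights into the base model so the result loads as a
# regular checkpoint. All paths are placeholders.
python scripts/merge_lora_weights.py \
    --model-path ./checkpoints/llavaplusv0_7b_test_lora \
    --model-base ./checkpoints/llava-plus-v0-7b \
    --save-model-path ./checkpoints/merged_llavaplusv0_7b_test
```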

ZhangJinian avatar Jul 10 '25 12:07 ZhangJinian