[Bug] Cannot merge or load a checkpoint saved by LoRA fine-tuning that started from a LoRA checkpoint
Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [x] 2. The bug has not been fixed in the latest version.
- [x] 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
Describe the bug
Weights saved by a fine-tuning run that starts from a LoRA checkpoint cannot be merged or loaded.
Reproduction
I fine-tuned a model starting from a previously fine-tuned checkpoint by changing the line at https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/shell/internvl2.0/2nd_finetune/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora.sh#L32
from
--model_name_or_path "./pretrained/InternVL2-8B"
to
--model_name_or_path "/root/projects/InternVL-Epsi/internvl_chat/work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240819_075328/checkpoint-6000"
The training code runs and I can see updates in TensorBoard. I then ran merge_lora as shown below on the newly saved checkpoint-3000, which was trained starting from the checkpoint-6000 above:
python tools/merge_lora.py work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240820_064549/checkpoint-3000/ work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240820_064549/checkpoint-3000/merged_lora
I got the following error:
RuntimeError: The weights trying to be saved contained shared tensors [{... 'language_model.model.layers.31.ffn_norm.weight', 'language_model.model.layers.16.attention_norm.weight', 'language_model.model.layers.19.attention_norm.weight', 'language_model.model.layers.28.attention_norm.weight', 'language_model.model.layers.7.ffn_norm.weight', 'language_model.model.layers.26.ffn_norm.weight', 'language_model.model.layers.12.attention_norm.weight', 'language_model.model.layers.6.attention_norm.weight'}] that are mismatching the transformers base configuration. Try saving using `safe_serialization=False` or remove this tensor sharing.
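A possible way to narrow this down: the key names in the traceback further down carry the PEFT prefix `base_model.model.` twice in the checkpoint but only once in the model that merge_lora builds, which suggests the LoRA wrapper was applied a second time on top of the already wrapped checkpoint-6000. Below is a minimal sketch, in plain Python, of how the wrap depth of each key could be counted. The helper names `peft_wrap_depth`, `summarize_depths`, and `keys_from_index` are hypothetical, and reading keys from a shard index file assumes the checkpoint is sharded and the index uses the usual `weight_map` layout.

```python
import json
from collections import Counter

# Assumption: each PEFT wrap adds one "base_model.model." segment to a key.
PEFT_PREFIX = "base_model.model."
LM_PREFIX = "language_model."


def peft_wrap_depth(key: str) -> int:
    """Count consecutive PEFT prefixes that follow 'language_model.' in a key."""
    rest = key[len(LM_PREFIX):] if key.startswith(LM_PREFIX) else key
    depth = 0
    while rest.startswith(PEFT_PREFIX):
        rest = rest[len(PEFT_PREFIX):]
        depth += 1
    return depth


def summarize_depths(keys) -> Counter:
    """Map wrap depth -> number of keys with that depth."""
    return Counter(peft_wrap_depth(k) for k in keys)


def keys_from_index(index_path: str):
    """Hypothetical helper: read parameter names from a sharded-checkpoint
    index file (assumed to contain a 'weight_map' dict)."""
    with open(index_path, "r", encoding="utf-8") as f:
        return list(json.load(f)["weight_map"].keys())


if __name__ == "__main__":
    # Key names copied from the error message and traceback in this issue.
    sample = [
        "language_model.base_model.model.base_model.model.model.layers.0.attention.wo.base_layer.weight",
        "language_model.base_model.model.model.layers.0.attention.wo.lora_A.default.weight",
        "language_model.model.layers.31.ffn_norm.weight",
    ]
    for key in sample:
        print(peft_wrap_depth(key), key)
```

A depth of 2 on the checkpoint keys next to a depth of 1 on the keys the model expects would be consistent with the "were not used" / "newly initialized" warnings in the traceback, though this check alone does not prove the cause.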
Environment
CUDA Version: 12.4
python: 3.10.12
torch: 2.4.0+cu121
transformers: 4.44.0
Error traceback
root@multi:~/projects/InternVL-Epsi/internvl_chat# python tools/merge_lora.py work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240820_064549/checkpoint-3000/ work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240820_064549/checkpoint-3000/merged_lora
Loading model...
trainable params: 37,748,736 || all params: 7,775,531,008 || trainable%: 0.4855
Loading checkpoint shards: 100%|██████████| 4/4 [00:00<00:00, 6.94it/s]
Some weights of the model checkpoint at work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240820_064549/checkpoint-3000/ were not used when initializing InternVLChatModel: ['language_model.base_model.model.base_model.model.model.layers.0.attention.wo.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.0.attention.wo.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.0.attention.wo.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.0.attention.wqkv.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.0.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.0.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.0.attention_norm.weight', 'language_model.base_model.model.base_model.model.model.layers.0.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.0.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.0.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.0.feed_forward.w2.base_layer.weight', ....... 
'language_model.base_model.model.base_model.model.model.layers.8.ffn_norm.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention.wo.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention.wo.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention.wo.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention.wqkv.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.attention_norm.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.base_model.model.model.layers.9.ffn_norm.weight', 'language_model.base_model.model.base_model.model.model.norm.weight', 'language_model.base_model.model.base_model.model.model.tok_embeddings.weight', 
'language_model.base_model.model.base_model.model.output.weight']
- This IS expected if you are initializing InternVLChatModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing InternVLChatModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of InternVLChatModel were not initialized from the model checkpoint at work_dirs/internvl_chat_v2_0/internvl2_8b_internlm2_7b_dynamic_res_2nd_finetune_lora_20240820_064549/checkpoint-3000/ and are newly initialized: ['language_model.base_model.model.model.layers.0.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.0.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.0.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.0.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.0.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.0.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.1.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.1.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.1.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.1.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.1.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w1.base_layer.weight', 
'language_model.base_model.model.model.layers.1.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.1.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.10.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.10.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.10.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.10.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.10.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.10.feed_forward.w3.lora_B.default.weight', 
'language_model.base_model.model.model.layers.11.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.11.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.11.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.11.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.11.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.11.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.12.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.12.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.12.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.12.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.12.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w1.lora_B.default.weight', 
'language_model.base_model.model.model.layers.12.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.12.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.13.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.13.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.13.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.13.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.13.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.13.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.14.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.14.attention.wo.lora_B.default.weight', 
'language_model.base_model.model.model.layers.14.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.14.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.14.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.14.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.15.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.15.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.15.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.15.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.15.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w2.lora_A.default.weight', 
'language_model.base_model.model.model.layers.15.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.15.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.16.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.16.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.16.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.16.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.16.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.16.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.17.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.17.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.17.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.17.attention.wqkv.lora_A.default.weight', 
'language_model.base_model.model.model.layers.17.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.17.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.18.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.18.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.18.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.18.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.18.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w3.base_layer.weight', 
'language_model.base_model.model.model.layers.18.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.18.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.19.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.19.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.19.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.19.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.19.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.19.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.2.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.2.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.2.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.2.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.2.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w1.lora_A.default.weight', 
'language_model.base_model.model.model.layers.2.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.2.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.20.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.20.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.20.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.20.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.20.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.20.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.21.attention.wo.lora_A.default.weight', 
'language_model.base_model.model.model.layers.21.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.21.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.21.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.21.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.21.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.22.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.22.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.22.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.22.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.22.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w2.base_layer.weight', 
'language_model.base_model.model.model.layers.22.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.22.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.23.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.23.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.23.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.23.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.23.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.23.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.24.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.24.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.24.attention.wqkv.base_layer.weight', 
'language_model.base_model.model.model.layers.24.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.24.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.24.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.25.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.25.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.25.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.25.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.25.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w2.lora_B.default.weight', 
'language_model.base_model.model.model.layers.25.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.25.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.26.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.26.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.26.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.26.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.26.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.26.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.27.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.27.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.27.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.27.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.27.attention.wqkv.lora_B.default.weight', 
'language_model.base_model.model.model.layers.27.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.27.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.28.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.28.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.28.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.28.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.28.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.28.feed_forward.w3.lora_A.default.weight', 
'language_model.base_model.model.model.layers.28.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.29.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.29.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.29.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.29.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.29.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.29.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.3.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.3.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.3.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.3.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.3.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w1.lora_B.default.weight', 
'language_model.base_model.model.model.layers.3.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.3.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.30.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.30.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.30.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.30.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.30.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.30.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.31.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.31.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.31.attention.wqkv.base_layer.weight', 
'language_model.base_model.model.model.layers.31.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.31.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.31.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.4.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.4.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.4.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.4.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.4.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w3.base_layer.weight', 
'language_model.base_model.model.model.layers.4.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.4.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.5.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.5.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.5.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.5.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.5.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.5.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.6.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.6.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.6.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.6.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.6.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w1.lora_A.default.weight', 
'language_model.base_model.model.model.layers.6.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.6.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.7.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.7.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.7.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.7.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.7.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.7.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.8.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.8.attention.wo.lora_B.default.weight', 
'language_model.base_model.model.model.layers.8.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.8.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.8.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w2.lora_B.default.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.8.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.layers.9.attention.wo.lora_A.default.weight', 'language_model.base_model.model.model.layers.9.attention.wo.lora_B.default.weight', 'language_model.base_model.model.model.layers.9.attention.wqkv.base_layer.weight', 'language_model.base_model.model.model.layers.9.attention.wqkv.lora_A.default.weight', 'language_model.base_model.model.model.layers.9.attention.wqkv.lora_B.default.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w1.base_layer.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w1.lora_A.default.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w1.lora_B.default.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w2.base_layer.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w2.lora_A.default.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w2.lora_B.default.weight', 
'language_model.base_model.model.model.layers.9.feed_forward.w3.base_layer.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w3.lora_A.default.weight', 'language_model.base_model.model.model.layers.9.feed_forward.w3.lora_B.default.weight', 'language_model.base_model.model.model.tok_embeddings.weight', 'language_model.base_model.model.output.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading tokenizer...
Saving model...
[2024-08-20 20:03:53,001] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/linear.py:49: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, weight, bias=None):
/usr/local/lib/python3.10/dist-packages/deepspeed/runtime/zero/linear.py:67: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
Traceback (most recent call last):
  File "/root/projects/InternVL-Epsi/internvl_chat/tools/merge_lora.py", line 28, in <module>
    model.save_pretrained(args.output_path)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 2701, in save_pretrained
    raise RuntimeError(
RuntimeError: The weights trying to be saved contained shared tensors [{'language_model.model.layers.25.attention.wo.weight', 'language_model.model.layers.24.attention.wo.weight', 'language_model.model.layers.29.attention.wo.weight', 'language_model.model.layers.16.attention.wo.weight', 'language_model.model.layers.23.attention.wo.weight', 'language_model.model.layers.13.attention.wo.weight', 'language_model.model.layers.4.attention.wo.weight', 'language_model.model.layers.20.attention.wo.weight', 'language_model.model.layers.6.attention.wo.weight', 'language_model.model.layers.31.attention.wo.weight', 'language_model.model.layers.8.attention.wo.weight', 'language_model.model.layers.7.attention.wo.weight', 'language_model.model.layers.10.attention.wo.weight', 'language_model.model.layers.1.attention.wo.weight', 'language_model.model.layers.17.attention.wo.weight', 'language_model.model.layers.28.attention.wo.weight', 'language_model.model.layers.3.attention.wo.weight', 'language_model.model.layers.9.attention.wo.weight', 'language_model.model.layers.14.attention.wo.weight', 'language_model.model.layers.0.attention.wo.weight', 'language_model.model.layers.18.attention.wo.weight', 'language_model.model.layers.19.attention.wo.weight', 'language_model.model.layers.21.attention.wo.weight', 'language_model.model.layers.15.attention.wo.weight', 'language_model.model.layers.30.attention.wo.weight', 'language_model.model.layers.11.attention.wo.weight', 'language_model.model.layers.2.attention.wo.weight', 'language_model.model.layers.5.attention.wo.weight', 'language_model.model.layers.27.attention.wo.weight', 'language_model.model.layers.22.attention.wo.weight', 'language_model.model.layers.26.attention.wo.weight', 'language_model.model.layers.12.attention.wo.weight'}, {'language_model.model.layers.20.attention_norm.weight', 'language_model.model.layers.9.ffn_norm.weight', 'language_model.model.layers.23.ffn_norm.weight', 'language_model.model.layers.30.ffn_norm.weight', 
'language_model.model.layers.18.attention_norm.weight', 'language_model.model.layers.22.attention_norm.weight', 'language_model.model.layers.5.ffn_norm.weight', 'language_model.model.layers.0.attention_norm.weight', 'language_model.model.layers.17.ffn_norm.weight', 'language_model.model.layers.4.attention_norm.weight', 'language_model.model.layers.22.ffn_norm.weight', 'language_model.model.layers.23.attention_norm.weight', 'language_model.model.layers.21.attention_norm.weight', 'language_model.model.layers.1.attention_norm.weight', 'language_model.model.layers.2.attention_norm.weight', 'language_model.model.layers.13.ffn_norm.weight', 'language_model.model.layers.19.ffn_norm.weight', 'language_model.model.layers.15.attention_norm.weight', 'language_model.model.layers.26.attention_norm.weight', 'language_model.model.layers.24.attention_norm.weight', 'language_model.model.layers.4.ffn_norm.weight', 'language_model.model.norm.weight', 'language_model.model.layers.31.attention_norm.weight', 'language_model.model.layers.17.attention_norm.weight', 'language_model.model.layers.24.ffn_norm.weight', 'language_model.model.layers.14.ffn_norm.weight', 'language_model.model.layers.14.attention_norm.weight', 'language_model.model.layers.10.ffn_norm.weight', 'language_model.model.layers.9.attention_norm.weight', 'language_model.model.layers.13.attention_norm.weight', 'language_model.model.layers.20.ffn_norm.weight', 'language_model.model.layers.28.ffn_norm.weight', 'language_model.model.layers.5.attention_norm.weight', 'language_model.model.layers.11.attention_norm.weight', 'language_model.model.layers.10.attention_norm.weight', 'language_model.model.layers.29.attention_norm.weight', 'language_model.model.layers.27.attention_norm.weight', 'language_model.model.layers.7.attention_norm.weight', 'language_model.model.layers.30.attention_norm.weight', 'language_model.model.layers.12.ffn_norm.weight', 'language_model.model.layers.6.ffn_norm.weight', 
'language_model.model.layers.25.ffn_norm.weight', 'language_model.model.layers.29.ffn_norm.weight', 'language_model.model.layers.3.attention_norm.weight', 'language_model.model.layers.3.ffn_norm.weight', 'language_model.model.layers.27.ffn_norm.weight', 'language_model.model.layers.18.ffn_norm.weight', 'language_model.model.layers.16.ffn_norm.weight', 'language_model.model.layers.21.ffn_norm.weight', 'language_model.model.layers.8.attention_norm.weight', 'language_model.model.layers.8.ffn_norm.weight', 'language_model.model.layers.2.ffn_norm.weight', 'language_model.model.layers.0.ffn_norm.weight', 'language_model.model.layers.11.ffn_norm.weight', 'language_model.model.layers.25.attention_norm.weight', 'language_model.model.layers.15.ffn_norm.weight', 'language_model.model.layers.1.ffn_norm.weight', 'language_model.model.layers.31.ffn_norm.weight', 'language_model.model.layers.16.attention_norm.weight', 'language_model.model.layers.19.attention_norm.weight', 'language_model.model.layers.28.attention_norm.weight', 'language_model.model.layers.7.ffn_norm.weight', 'language_model.model.layers.26.ffn_norm.weight', 'language_model.model.layers.12.attention_norm.weight', 'language_model.model.layers.6.attention_norm.weight'}] that are mismatching the transformers base configuration. Try saving using `safe_serialization=False` or remove this tensor sharing.
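To narrow this down, it may help to compare the parameter names in the log above with the names actually stored in checkpoint-3000. The sketch below is only a diagnostic aid, not part of InternVL or PEFT: `PEFT_PREFIX`, `classify`, `to_base_key`, and `summarize` are hypothetical helpers, and it assumes that the `base_model.model.` prefix and the `.base_layer` / `.lora_A` / `.lora_B` parts of the names come from PEFT's LoRA wrapping. It maps wrapped names back to plain names and counts how many times each key has been wrapped.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumption: this is the prefix PEFT inserts each time it wraps a module.
PEFT_PREFIX = "base_model.model."


def classify(key: str) -> str:
    """Label a parameter name by the role it plays in a LoRA-wrapped layer."""
    if ".lora_A." in key:
        return "lora_A"
    if ".lora_B." in key:
        return "lora_B"
    if ".base_layer." in key:
        return "base_layer"
    return "plain"


def to_base_key(key: str) -> Optional[str]:
    """Return the unwrapped parameter name, or None for adapter-only weights."""
    if classify(key) in ("lora_A", "lora_B"):
        return None
    # Drop every wrapping prefix, then the ".base_layer" infix.
    return key.replace(PEFT_PREFIX, "").replace(".base_layer.", ".")


def summarize(keys: Iterable[str]) -> Counter:
    """Count keys by (number of PEFT wrappings, role)."""
    return Counter((key.count(PEFT_PREFIX), classify(key)) for key in keys)


# Three names copied from the log above.
sample_keys = [
    "language_model.base_model.model.model.layers.25.feed_forward.w3.base_layer.weight",
    "language_model.base_model.model.model.layers.25.feed_forward.w3.lora_A.default.weight",
    "language_model.base_model.model.output.weight",
]

for key in sample_keys:
    print(key, "->", to_base_key(key))
print(summarize(sample_keys))
```

If the keys stored in checkpoint-3000 show a wrap depth of 2 while the names in the log show a depth of 1, that would suggest LoRA was applied a second time on top of the already-wrapped checkpoint-6000, which is only a hypothesis at this point. Note also that saving with `safe_serialization=False`, as the error message proposes, would likely just bypass the check at save time and not recover any weights that were reported as newly initialized.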