Wrong answer in the merged model weights
Hi! Thanks for your great work!
I have two questions.
(1) When I use the following config:

```yaml
models:
  - model: /data2/model/Quantize/llama2-chat_normal
    parameters:
      weight: 0.1
  - model: /data2/model/Quantize/llama2-chat_normal
    parameters:
      weight: 1.0
merge_method: linear
dtype: float32
```
and print the detailed weights like this:

```python
print("model1:")
print(model_1.state_dict()['model.embed_tokens.weight'][0, 0:3])
print("model2:")
print(model_2.state_dict()['model.embed_tokens.weight'][0, 0:3])
print("target:")
print(model_1.state_dict()['model.embed_tokens.weight'][0, 0:3] * 0.1
      + 1.0 * model_2.state_dict()['model.embed_tokens.weight'][0, 0:3])
print("result:")
print(merged_model.state_dict()['model.embed_tokens.weight'][0, 0:3])
```
The merged model weights are different from my target, which confuses me (I also tried `normalize: false`):
```text
model1: tensor([ 1.1921e-06, -1.7881e-06, -4.2915e-06])
model2: tensor([ 1.1921e-06, -1.7881e-06, -4.2915e-06])
target: tensor([ 1.3113e-06, -1.9670e-06, -4.7207e-06])
result: tensor([ 1.1921e-06, -1.7881e-06, -4.2915e-06])
```
(2) My next question is how to merge llama-2-7b-chat and wizardmath-7b-v1.0. Although they are both fine-tuned from llama-2-7b, their architectures differ:
- wizardmath-7b-v1.0:

```text
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32001, 4096, padding_idx=0)
```

- llama-2-7b-chat:

```text
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
```
Therefore, when I run inference with the merged model, it returns an error:

```text
ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([32001, 4096])), this look incorrect.
```
However, I found that sometimes I can successfully merge two models; it seems like a random event. Does this mean the merging process is unstable?
Read about tokenizer merging here.

Or, as a quick fix, try:

```yaml
tokenizer_source: union
```
Hi! Thanks for your help.

I printed the detailed weights, and the answer is wrong when using `linear`. However, when I use `task_arithmetic`, it does not return errors like `ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([32001, 4096])), this looks incorrect.`, and the detailed weights are also correct. So I guess there is an error in the implementation of the `linear` method.
I cannot reproduce the first issue. Please double-check that `normalize` is set to `false`: with the default `normalize: true`, the `linear` method rescales the weights to sum to 1 (here 0.1/1.1 and 1.0/1.1), so merging two identical models simply returns the original weights, which is exactly the `result` you printed. With `normalize: false`, it should produce the exact tensor you expected.
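This is not mergekit's actual code, just a minimal sketch of the arithmetic of the `linear` method under the two settings:

```python
import torch

def linear_merge(tensors, weights, normalize=True):
    """Weighted sum of matching tensors; with normalize=True the weights
    are first rescaled to sum to 1 (the default behavior described above)."""
    w = torch.tensor(weights, dtype=tensors[0].dtype)
    if normalize:
        w = w / w.sum()
    return sum(wi * t for wi, t in zip(w, tensors))

a = torch.tensor([1.1921e-06, -1.7881e-06, -4.2915e-06])
print(linear_merge([a, a], [0.1, 1.0]))                   # == a, the "result" above
print(linear_merge([a, a], [0.1, 1.0], normalize=False))  # == 1.1 * a, the "target"
```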
Regarding the second issue, when `tokenizer_source` is empty, you get the legacy behavior:

- The merged model will always use the first (base) model's `vocab_size`, which is 32001 if `wizardmath-7b-v1.0` is your first model. https://github.com/arcee-ai/mergekit/blob/4ecb205d191a9d76c50ab166ae05712619709277/mergekit/merge.py#L154-L157
- The embedding layers will be truncated to the smallest size present in the merge (i.e., 32000).

The inconsistency between `vocab_size` and the shape of the embedding layers prevents you from loading the merged model.
To solve this, you can either:

- Change `vocab_size` in the `config.json` of your merged model to 32000, or
- Specify `tokenizer_source: model:meta/llama-2-7b-chat` and merge your models again. (Using `union` will not work because the length of the unioned tokenizer is 32001, while linear merging will always truncate the embedding layers to 32000.) https://github.com/arcee-ai/mergekit/blob/4ecb205d191a9d76c50ab166ae05712619709277/mergekit/merge.py#L94-L95
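For the second option, a minimal sketch of what such a config could look like (the paths and weights are placeholders, not taken from this thread):

```yaml
models:
  - model: /path/to/llama-2-7b-chat      # vocab 32000; listed first so its config wins
    parameters:
      weight: 0.5
  - model: /path/to/wizardmath-7b-v1.0   # vocab 32001; the extra embedding row is dropped
    parameters:
      weight: 0.5
merge_method: linear
dtype: float32
tokenizer_source: model:/path/to/llama-2-7b-chat
```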
@eggry I am facing the same issue, and I already tried what you suggested by changing the `vocab_size` in the `config.json`, but that gives another error:

```text
self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/sparse.py", line 134, in __init__
    assert padding_idx < self.num_embeddings, 'Padding_idx must be within num_embeddings'
AssertionError: Padding_idx must be within num_embeddings
```
Hello, @monk1337,

After looking into the LLaMA-2 model, I suspect your merged tokenizer explicitly specifies a `pad_token`. To confirm this, please check whether your `tokenizer_config.json` sets `pad_token` to a non-null value AND this token is present in your `added_tokens.json`.

If that is the case, resetting it to null could resolve your problem. If this pad token is necessary for your workflow, you can manually add it back after the tokenizer is loaded, as suggested in the LLaMA-2 model's documentation.
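A minimal sketch of adding the pad token back after loading, using the standard transformers API (the path and token string are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/merged-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

# Register the pad token again, then grow the embedding matrix so the
# new token id stays within num_embeddings.
tokenizer.add_special_tokens({"pad_token": "<|pad_0|>"})
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
```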
@eggry Yes, I just checked: `tokenizer_config.json` sets

```json
"pad_token": "<|end_of_turn|>",
```

and `added_tokens.json` looks like this:

```json
{
  "<|end_of_turn|>": 32000,
  "<|pad_0|>": 32001
}
```

Now `tokenizer_config.json` sets `"pad_token": null`, `config.json` sets `"vocab_size": 32000`, and I deleted the `added_tokens.json` content. The error is the same.
Hello, @monk1337,

Maybe your model's `config.json` also specifies a `pad_token_id`. If that is the case, removing this entry or changing its value to -1 may resolve the error.
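For example, a sketch of clearing it programmatically rather than by hand-editing (the path is a placeholder):

```python
from transformers import AutoConfig

path = "/path/to/merged-model"  # placeholder
config = AutoConfig.from_pretrained(path)
config.pad_token_id = None  # or -1, per the suggestion above
config.save_pretrained(path)  # rewrites config.json in place
```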
@eggry, it worked. Which solution is better?

- Changing the vocab and padding in an already-merged model, or
- Specifying a tokenizer with the larger vocab in `tokenizer_source:` during merging?
@monk1337, in my personal opinion, I prefer to:

- Carefully specify the model order in `base_model` and `model` to ensure that the model with the smallest embedding size is the first one (so that the configuration of the merged model will be based on this model), and
- Point `tokenizer_source` at this same model.

However, I suspect there is no universal solution in the existing implementation for merging models whose tokenizers/embedding layers differ: the model configuration, embedding layer, tokenizer, and lm_head all matter.
@eggry This recipe makes sense. I'm going to try it. Thank you, it was quite helpful!
@eggry Sorry to bug you again. I am trying to merge a Llama-3 model and a Starling model, following what you suggested, but I'm getting an error.
```yaml
slices:
  - sources:
      - model: Nexusflow/Starling-LM-7B-beta
        layer_range: [0, 32]
      - model: meta-llama/Meta-Llama-3-8B
        layer_range: [0, 32]
merge_method: slerp
base_model: Nexusflow/Starling-LM-7B-beta
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: float16
tokenizer_source: model:Nexusflow/Starling-LM-7B-beta
```
This is the error:

```text
File "/workspace/axolotl/out/mergekit/mergekit/merge_methods/tokenizer_permute.py", line 88, in execute
    torch.tensor(weights, dtype=expanded.dtype, device=expanded.device)
TypeError: must be real number, not NoneType
```
I was facing the same problem with a linear merge. I solved it with the tokenizer merge suggested by @NeonBohdan; it works now.