Error when I try to evaluate pretrained Qwen 2.5 0.5B model
Hi,
I pretrained the Qwen 2.5 0.5B base model with a single layer (on purpose). When I chat with the model, it "works."
However, when I try to evaluate the model, it fails:
litgpt evaluate \
--tasks 'leaderboard' \
--out_dir 'evaluate/pretrain-core/leaderboard/' \
--batch_size 4 \
--dtype 'bfloat16' \
'out/pretrain-core/final'
Error:
{'access_token': None,
'batch_size': 4,
'checkpoint_dir': PosixPath('checkpoints/../out/pretrain-core/final'),
'device': None,
'dtype': 'bfloat16',
'force_conversion': False,
'limit': None,
'num_fewshot': None,
'out_dir': PosixPath('../evaluate/pretrain-core/leaderboard'),
'save_filepath': None,
'seed': 1234,
'tasks': 'leaderboard'}
{'checkpoint_dir': PosixPath('checkpoints/../out/pretrain-core/final'),
'output_dir': PosixPath('../evaluate/pretrain-core/leaderboard')}
Traceback (most recent call last):
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/bin/litgpt", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
CLI(parser_data)
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
return _run_component(component, init.get(subcommand))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
return component(**cfg)
^^^^^^^^^^^^^^^^
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/litgpt/eval/evaluate.py", line 95, in convert_and_evaluate
convert_lit_checkpoint(checkpoint_dir=checkpoint_dir, output_dir=out_dir)
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/litgpt/scripts/convert_lit_checkpoint.py", line 398, in convert_lit_checkpoint
copy_fn(sd, lit_weights, saver=saver)
File "/home/tangled/tangled-1.0-0.5b-base/scripts/venv/lib/python3.12/site-packages/litgpt/scripts/convert_lit_checkpoint.py", line 160, in copy_weights_llama
to_names = (weight_map[name_template].format(*ids),)
~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 'transformer.h.{}.attn.qkv.bias'
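The failing lookup can be reproduced in isolation: the weight map used by the Llama copier has a template for the fused qkv weight but none for a qkv bias (Llama models don't use attention bias, while Qwen 2.5 does). This is a minimal sketch, not litgpt's actual map — the target name is a placeholder:

```python
# Minimal reproduction of the failing lookup: the map has a template for the
# qkv weight but none for a qkv bias, so formatting the bias name raises
# KeyError. (The target value is a placeholder, not litgpt's real mapping.)
weight_map = {
    "transformer.h.{}.attn.qkv.weight": "model.layers.{}.placeholder",
}
name_template = "transformer.h.{}.attn.qkv.bias"
try:
    weight_map[name_template].format(0)
except KeyError as exc:
    print("KeyError:", exc)
```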
Hi,
can you post your model_config.yaml, please?
The traceback points to the Llama conversion, but you should have gotten the Qwen one if your model name starts with qwen2.5 or qwq:
https://github.com/Lightning-AI/litgpt/blob/f6031e3a88e272ec86ad8f412573699589f4d41b/litgpt/scripts/convert_lit_checkpoint.py#L384
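Roughly, the dispatch works like this (a simplified sketch — the real prefix checks and copy functions live in the linked file, and the names here are illustrative):

```python
# Simplified sketch of how convert_lit_checkpoint picks a weight-copy function
# from the configured model name. Prefixes and function names are illustrative;
# see litgpt/scripts/convert_lit_checkpoint.py for the actual logic.
def pick_copy_fn(model_name: str) -> str:
    name = model_name.lower()
    if name.startswith(("qwen2.5", "qwq")):
        return "copy_weights_qwen_2_5"
    # ...checks for other model families elided...
    # Fallback copier, whose weight map has no attn.qkv.bias entry.
    return "copy_weights_llama"

# An empty model name matches no prefix and falls through to the Llama copier.
print(pick_copy_fn(""))              # copy_weights_llama
print(pick_copy_fn("Qwen2.5-0.5B"))  # copy_weights_qwen_2_5
```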
model_config.yaml
attention_logit_softcapping: null
attention_scores_scalar: null
attn_bias: true
bias: false
block_size: 32768
final_logit_softcapping: null
gelu_approximate: none
head_size: 64
hf_config: {}
intermediate_size: 4864
lm_head_bias: false
mlp_class_name: LLaMAMLP
n_embd: 896
n_expert: 0
n_expert_per_token: 0
n_head: 14
n_layer: 1
n_query_groups: 2
name: ''
norm_class_name: RMSNorm
norm_eps: 1.0e-06
norm_qk: false
padded_vocab_size: 151936
padding_multiple: 512
parallel_residual: false
post_attention_norm: false
post_mlp_norm: false
rope_adjustments: null
rope_base: 1000000
rope_condense_ratio: 1
rotary_percentage: 1.0
scale_embeddings: false
shared_attention_norm: false
sliding_window_layer_placing: null
sliding_window_size: null
vocab_size: 151643
pretrain-core-model.yaml
model_name: "Qwen2.5-0.5B"
model_config:
block_size: 32768
vocab_size: 151643
padded_vocab_size: 151936
n_layer: 1
n_head: 14
n_embd: 896
n_query_groups: 2
rotary_percentage: 1.0
parallel_residual: False
bias: False
attn_bias: True
norm_class_name: "RMSNorm"
mlp_class_name: "LLaMAMLP"
intermediate_size: 4864
norm_eps: 1e-6
rope_base: 1000000
# head_size: 64 # n_embd / n_head
I set the "name" field so the model name has the "qwen2.5" prefix, and it worked. Thanks @t-vi!
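For reference, the fix amounts to one line in model_config.yaml (the exact value here is an assumption — what matters is the "qwen2.5" prefix, since the model_config.yaml above had name: ''):

```yaml
# model_config.yaml — give the model a name the converter can recognize.
# The specific value is illustrative; only the "qwen2.5" prefix matters.
name: 'qwen2.5-custom-0.5b'
```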