Llama 3.2 checkpoint conversion fails
```bash
git clone https://github.com/NVIDIA/TensorRT-LLM/
cd TensorRT-LLM/examples/llama
pip install -r requirements.txt
git clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
python convert_checkpoint.py --model_dir Llama-3.1-8B-Instruct \
    --output_dir Llama-3.1-8B-Instruct_2gpu_tp2 \
    --dtype float16 \
    --tp_size 2
```
```text
[TensorRT-LLM] TensorRT-LLM version: 0.14.0.dev2024100800
0.14.0.dev2024100800
[10/09/2024-02:45:28] [TRT-LLM] [W] AutoConfig cannot load the huggingface config.
Traceback (most recent call last):
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 518, in <module>
    main()
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 510, in main
    convert_and_save_hf(args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 452, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 459, in execute
    f(args, rank)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 438, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 322, in from_hugging_face
    config = LLaMAConfig.from_hugging_face(hf_config_or_dir,
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/models/llama/config.py", line 101, in from_hugging_face
    hf_config = transformers.AutoConfig.from_pretrained(
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`,
```
This doc (https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md#llama-v3-updates) says v3 is supported, but clearly it's not. Perhaps it meant 3.0 rather than 3.x? Not sure.
If you copied the HF llama modeling code, you need to update it to the latest version for this to work.
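For context on the error: the installed transformers release validates `rope_scaling` as a dict with exactly the two fields `type` and `factor`, while Llama 3.1 checkpoints ship an extended `rope_scaling` block (a `rope_type` of `llama3` plus extra frequency fields), which is what trips the `ValueError` above. Below is a minimal sketch to inspect which format a local checkpoint uses, assuming it was cloned to `./Llama-3.1-8B-Instruct` as in the repro:

```python
import json
from pathlib import Path

# Adjust the path if the checkpoint was cloned elsewhere.
config_path = Path("Llama-3.1-8B-Instruct") / "config.json"
rope_scaling = json.loads(config_path.read_text()).get("rope_scaling")
print("rope_scaling:", rope_scaling)

# Older transformers releases only accept {"type", "factor"} here;
# Llama 3.1 adds keys such as "rope_type", "low_freq_factor",
# "high_freq_factor" and "original_max_position_embeddings".
extra_keys = set(rope_scaling or {}) - {"type", "factor"}
if extra_keys:
    print("extended rope_scaling keys found:", sorted(extra_keys),
          "-> needs a newer transformers to parse")
```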
I am also facing the same issue.
pip install transformers==4.43.2
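Before re-running the converter, a quick sanity check that the upgraded transformers can parse the config (a minimal sketch, assuming the checkpoint sits in `./Llama-3.1-8B-Instruct` as in the repro above):

```python
import transformers
from transformers import AutoConfig

print(transformers.__version__)  # expect 4.43.2 after the upgrade

# Same call that failed inside convert_checkpoint.py; with a recent
# transformers it should parse the Llama 3.1 rope_scaling block.
cfg = AutoConfig.from_pretrained("Llama-3.1-8B-Instruct")
print(cfg.rope_scaling)
```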
Closing since there has been no recent update; please feel free to reopen this issue if needed.
Support for Llama 3.2 is not ready yet; please wait for upcoming releases.
As the OP noted, your documentation says that v3 is supported, so you probably need to make it more specific, i.e. state that only v3.0 is supported and not v3.x.
And I also don't understand why you closed this issue.
> Closing since there has been no recent update; please feel free to reopen this issue if needed.

Update from whom? You are sweeping the issue under the carpet.
The user can't reopen the issue, so your suggestion can't work.
Hi @stas00, thank you for raising this issue!
TensorRT-LLM doesn't support Llama 3.2 (yet -- coming soon!), though I suspect from the code snippet shared that the question is about Llama 3.1, which is supported.
To run Llama 3.1, you can manually upgrade the transformers version after installing TensorRT-LLM: pip install transformers==4.43.2 (thanks for sharing the workaround @Superjomn)
Please let me know if you have further issues. We are working on upgrading the default transformers dependency to remove this manual step.
I will update the documentation to specify which Llama 3.x versions are supported, and I'll figure out why you don't have permissions to re-open an issue.
Thank you for the follow up, @laikhtewari
I see the confusion now: I think I tested with both 3.1 and 3.2 and both were failing, but the issue I created said 3.2 while the repro I listed used 3.1. My bad!
And as @jinxiangshi suggested (not @Superjomn), the 3.1 issue is fixable by a manual transformers update. You said that v3.2 isn't supported yet and that you will update the documentation; now it feels like you care. I appreciate that.
I will update the OP.
Oops copied the wrong username, thanks @jinxiangshi !
Sorry for closing the issue; we will amend the dependency requirements and update the documentation for Llama 3 and Llama 3.1. @stas00
Thank you, @Superjomn
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
@stas00, although I recall your comment about having moved on to another LLM engine, I still wanted to leave a quick update with the latest resources. Llama v3.2 updates are available here.
Closing the issue as stale.
Since the 1.0 release, TensorRT-LLM has adopted the PyTorch workflow as the recommended approach, which no longer relies on trtllm-build or the checkpoint converter. If you get a chance to try the latest release and the issue persists, please feel free to open a new one.
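For anyone landing here later, a rough sketch of that PyTorch-workflow path using the high-level `LLM` API; the model name, `tensor_parallel_size`, and sampling arguments below are illustrative, so check the quick-start docs of your installed release for the exact interface:

```python
from tensorrt_llm import LLM, SamplingParams

# Loads the HF checkpoint directly; no convert_checkpoint.py or trtllm-build step.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    tensor_parallel_size=2,  # same 2-GPU TP split as the original repro
)

outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(max_tokens=32),
)
print(outputs[0].outputs[0].text)
```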