
llama 3.2 checkpoint conversion fails

stas00 opened this issue on Oct 09 '24

git clone https://github.com/NVIDIA/TensorRT-LLM/
cd TensorRT-LLM/examples/llama
pip install -r requirements.txt
git clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
python convert_checkpoint.py --model_dir Llama-3.1-8B-Instruct \
>                             --output_dir Llama-3.1-8B-Instruct_2gpu_tp2 \
>                             --dtype float16 \
>                             --tp_size 2
[TensorRT-LLM] TensorRT-LLM version: 0.14.0.dev2024100800
0.14.0.dev2024100800
[10/09/2024-02:45:28] [TRT-LLM] [W] AutoConfig cannot load the huggingface config.
Traceback (most recent call last):
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 518, in <module>
    main()
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 510, in main
    convert_and_save_hf(args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 452, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 459, in execute
    f(args, rank)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 438, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 322, in from_hugging_face
    config = LLaMAConfig.from_hugging_face(hf_config_or_dir,
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/models/llama/config.py", line 101, in from_hugging_face
    hf_config = transformers.AutoConfig.from_pretrained(
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, 

The docs at https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md#llama-v3-updates say that v3 is supported, but clearly it isn't. Perhaps that meant 3.0 rather than 3.x? Not sure.
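
For context, my understanding (not verified against the TensorRT-LLM code, just inferred from the traceback above) is that the Llama 3.1/3.2 checkpoints ship the extended rope_scaling format, which the older transformers release pinned by TensorRT-LLM rejects in _rope_scaling_validation because it only accepts the two fields `type` and `factor`. Roughly, the offending block of config.json looks like this (values reproduced from memory from the Llama-3.1-8B-Instruct config on the Hub, so double-check your local copy):

# rope_scaling as shipped with Llama 3.1 -- the extra "llama3" fields are what
# trip the `type`/`factor`-only validation in older transformers releases.
rope_scaling = {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}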

If you copied the HF llama modeling code, you need to update it to the latest version for this to work.

stas00 avatar Oct 09 '24 02:10 stas00

I am also facing the same issue.

aabbhishekksr avatar Oct 09 '24 16:10 aabbhishekksr

pip install transformers==4.43.2

jinxiangshi avatar Oct 10 '24 05:10 jinxiangshi

Closing since there has been no recent update; please feel free to reopen this issue if needed.

Superjomn avatar Oct 16 '24 06:10 Superjomn

Support for Llama 3.2 is not ready yet; please wait for the upcoming releases.

Superjomn avatar Oct 16 '24 08:10 Superjomn

As the OP notes, your documentation says that v3 is supported, so you probably need to make it more specific, i.e. state that only v3.0 is supported and not v3.x.

stas00 avatar Oct 16 '24 16:10 stas00

And I also don't understand why you closed this.

Closing since there has been no recent update; please feel free to reopen this issue if needed.

An update from whom? You are sweeping the issue under the carpet.

The user can't reopen the issue, so your suggestion can't work.

stas00 avatar Oct 16 '24 17:10 stas00

Hi @stas00, thank you for raising this issue!

TensorRT-LLM doesn't support Llama 3.2 (yet -- coming soon!), though I suspect from the code snippet shared that the question is about Llama 3.1, which is supported.

To run Llama 3.1, you can manually upgrade the transformers version after installing TensorRT-LLM: pip install transformers==4.43.2 (thanks for sharing the workaround @Superjomn)
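
As a quick sanity check (a sketch of my own, not a required step), you can confirm that the upgraded transformers parses the Llama 3.1 config before re-running convert_checkpoint.py:

import transformers

print(transformers.__version__)  # expect 4.43.2 or newer
# This is the call that raised the ValueError with the older transformers;
# with 4.43+ it should return the config without complaint.
cfg = transformers.AutoConfig.from_pretrained("Llama-3.1-8B-Instruct")
print(cfg.rope_scaling)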

Please let me know if you have further issues. We are working on upgrading the default transformers dependency to remove this manual step.

I will update the documentation to specify which Llama 3.x versions are supported, and I'll figure out why you don't have permissions to re-open an issue.

laikhtewari avatar Oct 16 '24 17:10 laikhtewari

Thank you for the follow up, @laikhtewari

I see the confusion now - I think I tested with both 3.1 and 3.2 and both were failing, but the issue I created said 3.2 while the repro I listed used 3.1 - my bad!

And as @jinxiangshi suggested (not @Superjomn), the 3.1 issue is fixable by a manual transformers update. You said that v3.2 isn't supported yet and that you will update the documentation - now it feels like you care. I appreciate that.

I will update the OP.

stas00 avatar Oct 16 '24 17:10 stas00

Oops copied the wrong username, thanks @jinxiangshi !

laikhtewari avatar Oct 16 '24 18:10 laikhtewari

Sorry for closing the issue; we will amend the dependency requirements and update the documentation for Llama 3 and Llama 3.1. @stas00

Superjomn avatar Oct 17 '24 02:10 Superjomn

Thank you, @Superjomn

stas00 avatar Oct 17 '24 02:10 stas00

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Nov 17 '24 02:11 github-actions[bot]

@stas00, although I recall your comment about having moved on to another LLM engine, I still wanted to leave a quick update with the latest resources. Llama v3.2 updates are available here.

karljang avatar Sep 06 '25 18:09 karljang

Closing the issue as stale.
Since the 1.0 release, TensorRT-LLM has adopted the PyTorch workflow as the recommended approach, which no longer relies on trtllm-build and the checkpoint converter (see the sketch below). If you get a chance to try the latest release and the issue persists, please feel free to open a new one.
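
For anyone landing here later, a minimal sketch of that PyTorch-workflow LLM API, based on the quickstart in the TensorRT-LLM docs; exact argument names may differ between releases, so treat this as illustrative rather than definitive:

from tensorrt_llm import LLM, SamplingParams

# Checkpoint conversion and engine building happen under the hood;
# there is no separate convert_checkpoint.py / trtllm-build step.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct",
          tensor_parallel_size=2)  # same 2-GPU TP split as the original repro

params = SamplingParams(max_tokens=64, temperature=0.8)
for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)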

karljang avatar Nov 14 '25 20:11 karljang