llama convert: add rotary_scaling param to cli_args
In llama's convert_checkpoint.py, when the arguments are built from the command line there is currently no way to pass rotary_scaling, so models that rely on it cannot be converted correctly; the rotary_scaling param should be added to cover these cases.
For example, deepseek-coder-6.7b-base needs the rotary_scaling param:
"rope_scaling": {
"factor": 4.0,
"type": "linear"
}
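A minimal sketch of the flag I have in mind for convert_checkpoint.py (the argument name and parsing below are illustrative, not the final API):

import argparse

parser = argparse.ArgumentParser()
# Proposed flag mirroring the HF "rope_scaling" entry, e.g. --rotary_scaling linear 4.0
parser.add_argument('--rotary_scaling', nargs=2, metavar=('TYPE', 'FACTOR'),
                    default=None,
                    help='Rotary scaling type and factor, e.g. "linear 4.0"')
args = parser.parse_args()

if args.rotary_scaling is not None:
    # Convert the CLI pair into the same dict shape used in the HF config
    rotary_scaling = {'type': args.rotary_scaling[0],
                      'factor': float(args.rotary_scaling[1])}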
Otherwise, inference produces a lot of repeated tokens, for example:
curl -X POST localhost:8620/v2/models/ensemble/generate -d '{"text_input": "def quick_sort", "max_tokens": 10, "bad_words": "", "stop_words": "", "stream": true, "temperature": 0.2, "return_log_probs": true, "top_p": 0.75, "end_id": [32022]}'
"text_output":"sortsortsortsortsortsortsortsortsortsort"
Hi @activezhao, thanks for contributing to the TRT-LLM project.
TRT-LLM refactored the checkpoint generation logic over the past months. The latest code now reads the rope_scaling field from the HF config.json automatically (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/llama/convert.py#L1196), so users are no longer allowed to set this field manually.
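Roughly, the current convert path does something like this (a simplified sketch for illustration; the real code works from the HF config object rather than reading the file directly):

import json

# Read rope_scaling straight from the HF checkpoint's config.json
with open('deepseek-coder-6.7b-base/config.json') as f:
    hf_config = json.load(f)

rope_scaling = hf_config.get('rope_scaling')  # e.g. {"type": "linear", "factor": 4.0}
if rope_scaling is not None:
    rotary_scaling_type = rope_scaling['type']      # "linear"
    rotary_scaling_factor = rope_scaling['factor']  # 4.0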
Would you please retry deepseek-coder-6.7b-base with the latest TRT-LLM?
@nv-guomingz OK, I will try it. Thanks.
PR has not received an update in over 14 days. Adding stale label.