llama convert: add rotary_scaling param to cli_args
In llama's convert_checkpoint.py, when the arguments are built from the command line there is currently no way to pass rotary_scaling, so models that rely on it cannot be converted correctly; the rotary_scaling param should be added to cover these cases.
For example, deepseek-coder-6.7b-base needs the rotary_scaling param:
"rope_scaling": {
"factor": 4.0,
"type": "linear"
}
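A minimal sketch of the flag I have in mind for convert_checkpoint.py (the argument name and parsing below are illustrative, not the final API):

import argparse

parser = argparse.ArgumentParser()
# Proposed flag mirroring the HF "rope_scaling" entry, e.g. --rotary_scaling linear 4.0
parser.add_argument('--rotary_scaling', nargs=2, metavar=('TYPE', 'FACTOR'),
                    default=None,
                    help='Rotary scaling type and factor, e.g. "linear 4.0"')
args = parser.parse_args()

if args.rotary_scaling is not None:
    # Convert the CLI pair into the same dict shape used in the HF config
    rotary_scaling = {'type': args.rotary_scaling[0],
                      'factor': float(args.rotary_scaling[1])}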
Otherwise, inference produces a lot of repeated tokens, for example:
curl -X POST localhost:8620/v2/models/ensemble/generate -d '{"text_input": "def quick_sort", "max_tokens": 10, "bad_words": "", "stop_words": "", "stream": true, "temperature": 0.2, "return_log_probs": true, "top_p": 0.75, "end_id": [32022]}'
"text_output":"sortsortsortsortsortsortsortsortsortsort"
Hi @activezhao, thanks for contributing to the TRT-LLM project.
TRT-LLM refactored the checkpoint generation logic over the past months. The latest code now reads the rope_scaling field from the HF config.json automatically (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/models/llama/convert.py#L1196), so users are no longer allowed to set this field manually.
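Roughly, the current convert path does something like this (a simplified sketch for illustration; the real code works from the HF config object rather than reading the file directly):

import json

# Read rope_scaling straight from the HF checkpoint's config.json
with open('deepseek-coder-6.7b-base/config.json') as f:
    hf_config = json.load(f)

rope_scaling = hf_config.get('rope_scaling')  # e.g. {"type": "linear", "factor": 4.0}
if rope_scaling is not None:
    rotary_scaling_type = rope_scaling['type']      # "linear"
    rotary_scaling_factor = rope_scaling['factor']  # 4.0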
Would you please retry deepseek-coder-6.7b-base with the latest TRT-LLM?
@nv-guomingz OK, I will try it. Thanks.
PR has not received an update in over 14 days. Adding stale label.