
Dynamic scaling not working on RoPe / rotary_scaling

Open TheCodeWrangler opened this issue 1 year ago • 5 comments

          @byshiue can you try to see if dynamic scaling works? Linear scaling works fine. If dynamic scaling doesn't work at all, then this is indeed a bug.

Originally posted by @avianion in https://github.com/NVIDIA/TensorRT-LLM/issues/1595#issuecomment-2112786968

Several users have experienced errors when running engine files that were built to use "dynamic" rotary_scaling.

Is dynamic scaling supported at this time?

TheCodeWrangler avatar May 15 '24 16:05 TheCodeWrangler

Seconding this. I would like to try dynamic scaling but at the moment only linear scaling works.

Dynamic scaling supposedly provides better results. But this isn't possible to try at the moment.
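For context, "dynamic" here refers to dynamic NTK-aware RoPE scaling, which rescales the rotary base as the sequence length grows past the trained context, rather than dividing positions by a fixed factor as linear scaling does. A minimal sketch of the commonly used formula (as in the Hugging Face Transformers implementation; the function name is illustrative):

```python
import math


def dynamic_ntk_base(base, dim, seq_len, max_pos, factor):
    """Return the RoPE base adjusted by dynamic NTK-aware scaling.

    base:    original rotary base (e.g. 10000.0, or 500000.0 for Llama 3)
    dim:     per-head rotary dimension
    seq_len: current sequence length
    max_pos: max position embeddings the model was trained with
    factor:  the rotary_scaling "factor" from config.json
    """
    if seq_len <= max_pos:
        # Within the trained context, the base is left unchanged.
        return base
    # Grow the base so the lowest frequencies stretch to cover seq_len.
    return base * ((factor * seq_len / max_pos) - (factor - 1)) ** (dim / (dim - 2))
```

Because the base only changes once the input exceeds the trained context, short prompts behave identically to the unscaled model, which is why dynamic scaling is expected to give better results than linear scaling.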

avianion avatar May 15 '24 16:05 avianion

@byshiue @kaiyux

avianion avatar May 15 '24 21:05 avianion

Hi @avianion and @TheCodeWrangler, could you please provide the steps to reproduce?

nekorobov avatar May 17 '24 07:05 nekorobov

@nekorobov

Hi.

Please follow these steps:

  1. Install the Llama 3 8B or 70B Instruct model, and convert it to a TensorRT-LLM checkpoint with the following command:

python3 convert_checkpoint.py --model_dir llama370b \
    --output_dir llama-3-70b-ckpt \
    --dtype float16 \
    --workers 2

  2. cd into the checkpoint directory, and modify the config.json file to have the following rotary scaling settings:

"rotary_scaling": {
    "type": "dynamic",
    "factor": 2.0
},

  3. Build the engine from the checkpoint:

trtllm-build --checkpoint_dir llama-3-70b-ckpt \
    --output_dir ./llama-3-70b-engine \
    --gemm_plugin float16 \
    --gpt_attention_plugin float16 \
    --max_batch_size 8 \
    --workers 2

  4. Once the engine is built, simply run inference on it. You can try a basic command like this:

mpirun --allow-run-as-root -n 2 python3 ../run.py \
    --engine_dir ./llama-3-70b-engine \
    --max_output_len 4096 \
    --tokenizer_dir meta-llama/Meta-Llama-3-70B-Instruct \
    --input_text "<|start_header_id|>user<|end_header_id|>Tell me how to count to nine in French<|eot_id|>"

Observe that you will either get no output or an error regarding the encoding, no matter what settings you use.
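As an aside, step 2 can be scripted instead of hand-editing config.json; a minimal sketch (the path and the top-level field location follow the snippet above and may need adjusting for your checkpoint layout):

```python
import json


def set_rotary_scaling(cfg_path, scaling_type="dynamic", factor=2.0):
    """Patch a converted checkpoint's config.json to request rotary scaling.

    Writes the rotary_scaling entry shown in step 2 above; the field is
    assumed to live at the top level of config.json.
    """
    with open(cfg_path) as f:
        cfg = json.load(f)
    cfg["rotary_scaling"] = {"type": scaling_type, "factor": factor}
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg


if __name__ == "__main__":
    # Example path from step 1 above.
    set_rotary_scaling("llama-3-70b-ckpt/config.json")
```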

avianion avatar May 17 '24 09:05 avianion

I was not able to reproduce this with either the 8B or the 8B-Instruct version and dynamic scaling. Could you please share what output you get? Note that I was using a single-GPU setup.

nekorobov avatar May 17 '24 22:05 nekorobov

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] avatar Jun 23 '24 01:06 github-actions[bot]