
Parameter --load_model_on_cpu is ignored by llama convert_checkpoint.py (version 0.9.0)

Open · fedem96 opened this issue

System Info

  • OS: Ubuntu 22.04.4 LTS
  • Nvidia driver version: 545.23.08
  • CPU architecture: x86
  • RAM size: ~500GB
  • GPUs: 2xL40s 48GB
  • Docker container image: manually compiled tensorrtllm_backend v0.9.0
  • TensorRT-LLM version: 0.9.0

Who can help?

No response

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

cd tensorrt_llm/examples/llama
python convert_checkpoint.py \
    --model_dir mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --output_dir /output/mixtral-w4a16-tp2/ \
    --dtype bfloat16 \
    --tp_size 2 \
    --use_weight_only \
    --weight_only_precision int4 \
    --moe_num_experts 8 \
    --moe_top_k 2 \
    --load_model_on_cpu
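To confirm whether the conversion allocates GPU memory, nvidia-smi can be polled in a second terminal while the command above runs:

# Print per-GPU memory usage every second
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 1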

Expected behavior

The script should load and convert the model entirely on the CPU, without allocating any GPU memory.

Actual behavior

The script loads the model onto the GPUs, which also leads to out-of-memory (OOM) errors.
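A possible interim workaround on 0.9.0 (a sketch, not verified here: if convert_checkpoint.py initializes CUDA unconditionally, it may error out instead of falling back to the CPU) is to hide the GPUs from the process via the environment:

# Mask all GPUs so PyTorch falls back to the CPU
CUDA_VISIBLE_DEVICES="" python convert_checkpoint.py \
    --model_dir mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --output_dir /output/mixtral-w4a16-tp2/ \
    --dtype bfloat16 --tp_size 2 \
    --use_weight_only --weight_only_precision int4 \
    --moe_num_experts 8 --moe_top_k 2 \
    --load_model_on_cpu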

Additional notes

On the same system, version 0.8.0 did not have this problem.

fedem96 commented on Apr 26 '24

Same issue here.

ChristianPala commented on May 17 '24

Sorry for the late reply. @fedem96 @ChristianPala, this bug is fixed in later versions; could you please try a newer release? Thanks!

Barry-Delaney commented on Aug 22 '24
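For anyone still on 0.9.0, upgrading to a later release is typically done via pip from NVIDIA's package index (choosing an exact version pin is left to the reader):

# Install the latest published TensorRT-LLM wheel
pip install --upgrade tensorrt_llm --extra-index-url https://pypi.nvidia.com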