Invalid MIT-MAGIC-COOKIE-1 key
System Info
- OS: Ubuntu 20.04
- GPU: RTX 2080TI
Who can help?
@byshiue @ncomly-nvidia
Information
- [x] The official example scripts
- [ ] My own modified scripts
Tasks
- [x] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
I ran the following build script in a terminal on Ubuntu 20.04 (connected via SSH; the machine has a virtual screen provided by Xorg).
python xxx/TensorRT-LLM/examples/enc_dec/convert_checkpoint.py --model_type bart \
--model_dir xxx/hub/models--facebook--nougat-small \
--output_dir nougat-small-trt/bfloat16 \
--tp_size 1 \
--pp_size 1 \
--dtype bfloat16 \
--nougat
And the log:
[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
Invalid MIT-MAGIC-COOKIE-1 key
I did a lot of searching on the web. It looks like this problem is caused by MPI. But why would converting a checkpoint need a screen?
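A hedged workaround, assuming the warning comes from a library (via MPI/hwloc) probing the X server advertised by `DISPLAY`: run the conversion with `DISPLAY` stripped from the environment, since checkpoint conversion itself should not need a display. The snippet below only demonstrates that `env -u` removes the variable from a child process; the same `env -u DISPLAY` prefix can be applied to the `convert_checkpoint.py` invocation above.

```shell
# Demonstrate stripping DISPLAY from a child process environment.
# With DISPLAY unset, nothing in the child can attempt X11 authentication,
# so the "Invalid MIT-MAGIC-COOKIE-1 key" probe should never happen.
env -u DISPLAY sh -c 'echo "DISPLAY=${DISPLAY:-<unset>}"'
```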
Expected behavior
Runs successfully.
Actual behavior
See the log above.
Additional notes
None.
I am getting the same error trying to build Mistral for ChatRTX on Linux using:

python build.py --model_dir './model/mistral/mistral7b_hf' \
    --quant_ckpt_path './model/mistral/mistral7b_int4_quant_weights/mistral_tp1_rank0.npz' \
    --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 \
    --enable_context_fmha --use_gemm_plugin float16 --use_weight_only \
    --weight_only_precision int4_awq --per_group \
    --output_dir './model/mistral/mistral7b_int4_engine' \
    --world_size 1 --tp_size 1 --parallel_build \
    --max_input_len 7168 --max_batch_size 1 --max_output_len 1024
According to this.
I cannot reproduce this issue locally. Could you try the latest main branch, and follow the install doc to make sure TensorRT-LLM is correctly installed?
Did you use a local PC or a remote server without a screen? Is there any command to check whether TRT-LLM is correctly installed?
Remote server.
To check the installation:
python3 -c "import tensorrt_llm"
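Beyond the bare import, a small sketch that reports whether the package is importable without crashing when it is missing (the package name `tensorrt_llm` is taken from the command above; everything else is standard library):

```python
# Check whether tensorrt_llm is importable without raising ImportError.
import importlib.util

spec = importlib.util.find_spec("tensorrt_llm")
if spec is None:
    print("tensorrt_llm is NOT installed")
else:
    print("tensorrt_llm found at", spec.origin)
```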
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
This issue was closed because it has been stalled for 15 days with no activity.
Have you solved it? I met the same problem again.
Have you solved it? I got the same error.