TensorRT-LLM Invalid MIT-MAGIC-COOKIE-1 key

System Info

OS: Ubuntu 20.04
GPU: RTX 2080TI

Who can help?

@byshiue @ncomly-nvidia

Information

[x] The official example scripts
[ ] My own modified scripts

Tasks

[x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

I ran the following build script in a terminal on Ubuntu 20.04 (connected via SSH; the machine has a virtual screen provided by Xorg).

python xxx/TensorRT-LLM/examples/enc_dec/convert_checkpoint.py --model_type bart \
    --model_dir xxx/hub/models--facebook--nougat-small \
    --output_dir nougat-small-trt/bfloat16 \
    --tp_size 1 \
    --pp_size 1 \
    --dtype bfloat16 \
    --nougat

And the log:

[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
Invalid MIT-MAGIC-COOKIE-1 key

I searched the web extensively. The problem appears to be caused by MPI. But why does converting a checkpoint need a screen?
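"Invalid MIT-MAGIC-COOKIE-1 key" is the message an X11 client prints when the X server rejects its authentication cookie. One possibility, not confirmed anywhere in this thread, is that something loaded during MPI initialization tries to open the display named by DISPLAY over the SSH session. The sketch below is only a way to test that guess: it runs a command with the X11-related environment variables removed. The helper names env_without_x11 and run_headless are hypothetical and are not part of TensorRT-LLM.

```python
import os
import subprocess

# Environment variables that X11 client libraries read to locate and
# authenticate to a display. Removing them is an assumption-driven
# experiment, not a confirmed fix for this issue.
X11_VARS = ("DISPLAY", "XAUTHORITY")


def env_without_x11(base_env=None):
    """Return a copy of the environment with the X11 variables removed."""
    env = dict(os.environ if base_env is None else base_env)
    for name in X11_VARS:
        env.pop(name, None)
    return env


def run_headless(command):
    """Run `command` (a list of arguments) with no X11 display configured."""
    return subprocess.run(command, env=env_without_x11(), check=False)
```

For example, run_headless(["python", "xxx/TensorRT-LLM/examples/enc_dec/convert_checkpoint.py", "--model_type", "bart", ...]) with the same arguments as above. If the message disappears, the display connection is the likely trigger; if it persists, the cause lies elsewhere.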

Expected behavior

The script runs without errors.

actual behavior

See above.

additional notes

No

Sep 23 '24 08:09 sherlcok314159

I am getting the same error trying to build Mistral for ChatRTX on Linux using

python build.py --model_dir './model/mistral/mistral7b_hf' \
    --quant_ckpt_path './model/mistral/mistral7b_int4_quant_weights/mistral_tp1_rank0.npz' \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4_awq \
    --per_group \
    --output_dir './model/mistral/mistral7b_int4_engine' \
    --world_size 1 \
    --tp_size 1 \
    --parallel_build \
    --max_input_len 7168 \
    --max_batch_size 1 \
    --max_output_len 1024

According to this.

Sep 24 '24 11:09 hweiske

I cannot reproduce this issue locally. Could you try the latest main branch, and follow the install doc to make sure TensorRT-LLM is installed correctly?

Sep 27 '24 10:09 lfr-0531

Did you use a local PC or a remote server without a screen? Is there a command to check whether TRT-LLM is correctly installed?

Sep 27 '24 11:09 sherlcok314159

Did you use a local PC or a remote server without a screen? Is there a command to check whether TRT-LLM is correctly installed?

Remote server.

To check installation

python3 -c "import tensorrt_llm"
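The one-liner above both locates and imports the package, so a failure does not tell you which step went wrong. As a rough sketch, the two can be separated with the standard library alone: find_spec locates the package without importing it, and importlib.metadata reports the installed version if one is registered. The function name check_installed is made up for illustration, and using "tensorrt_llm" as the distribution name is an assumption.

```python
import importlib.util
from importlib import metadata


def check_installed(module_name, dist_name=None):
    """Return (found, version) for a top-level package without importing it.

    `version` is None when no installed distribution metadata matches.
    """
    if importlib.util.find_spec(module_name) is None:
        return False, None
    try:
        return True, metadata.version(dist_name or module_name)
    except metadata.PackageNotFoundError:
        return True, None


found, version = check_installed("tensorrt_llm")
print(f"tensorrt_llm found: {found}, version: {version}")
```

If this reports the package as found but the import one-liner still fails, the problem is in what happens at import time rather than in the installation itself.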

Sep 27 '24 11:09 lfr-0531

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

Oct 28 '24 02:10 github-actions[bot]

This issue was closed because it has been stalled for 15 days with no activity.

Nov 13 '24 02:11 github-actions[bot]

Did you use a local PC or a remote server without a screen? Is there a command to check whether TRT-LLM is correctly installed?

Remote server.

To check installation

python3 -c "import tensorrt_llm"

Have you solved it? I met the same problem again.

Mar 05 '25 01:03 corkiyao

Have you solved it? I got the same error.

Mar 05 '25 01:03 corkiyao