Invalid MIT-MAGIC-COOKIE-1 key

Open sherlcok314159 opened this issue 1 year ago • 4 comments

System Info

  • OS: Ubuntu 20.04
  • GPU: RTX 2080TI

Who can help?

@byshiue @ncomly-nvidia

Information

  • [x] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

I ran the following build script in a terminal on Ubuntu 20.04 (connected via SSH; the machine has a virtual screen provided by Xorg).

python xxx/TensorRT-LLM/examples/enc_dec/convert_checkpoint.py --model_type bart \
    --model_dir xxx/hub/models--facebook--nougat-small \
    --output_dir nougat-small-trt/bfloat16 \
    --tp_size 1 \
    --pp_size 1 \
    --dtype bfloat16 \
    --nougat

And the log:

[TensorRT-LLM][INFO] Initializing MPI with thread mode 3
Invalid MIT-MAGIC-COOKIE-1 key

I searched the web extensively. It looks like this problem is caused by MPI. But why would converting a checkpoint need a screen?

Expected behavior

The conversion runs without errors.

Actual behavior

See the log above.

Additional notes

None.

sherlcok314159 · Sep 23 '24 08:09
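
The message `Invalid MIT-MAGIC-COOKIE-1 key` is an X11 authentication error: an X client tried to connect to the display named by `DISPLAY` with a cookie the server rejected. Which component opens that connection during MPI initialization is not established in this thread. As a diagnostic sketch only, the Python helper below reports the X11-related environment the process sees and builds a copy of the environment with `DISPLAY` removed, so the command can be retried with no display at all. The helper names `x11_env_report` and `env_without_display` are hypothetical, and the idea that unsetting `DISPLAY` avoids the message is an assumption, not a confirmed fix.

```python
import os


def x11_env_report(env=None):
    """Summarize the X11-related settings visible to this process."""
    env = os.environ if env is None else env
    # X clients fall back to ~/.Xauthority when XAUTHORITY is not set.
    xauth = env.get("XAUTHORITY") or os.path.join(
        os.path.expanduser("~"), ".Xauthority"
    )
    return {
        "DISPLAY": env.get("DISPLAY"),
        "XAUTHORITY": xauth,
        "xauthority_exists": os.path.exists(xauth),
    }


def env_without_display(env=None):
    """Return a copy of the environment with DISPLAY removed.

    Assumption: without DISPLAY, no X connection is attempted.
    """
    env = dict(os.environ if env is None else env)
    env.pop("DISPLAY", None)
    return env


if __name__ == "__main__":
    print(x11_env_report())
```

One possible use is `subprocess.run(cmd, env=env_without_display())`, where `cmd` is the original `convert_checkpoint.py` command line split into a list of arguments.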

I am getting the same error trying to build Mistral for ChatRTX on Linux using:

python build.py --model_dir './model/mistral/mistral7b_hf' \
    --quant_ckpt_path './model/mistral/mistral7b_int4_quant_weights/mistral_tp1_rank0.npz' \
    --dtype float16 \
    --remove_input_padding \
    --use_gpt_attention_plugin float16 \
    --enable_context_fmha \
    --use_gemm_plugin float16 \
    --use_weight_only \
    --weight_only_precision int4_awq \
    --per_group \
    --output_dir './model/mistral/mistral7b_int4_engine' \
    --world_size 1 \
    --tp_size 1 \
    --parallel_build \
    --max_input_len 7168 \
    --max_batch_size 1 \
    --max_output_len 1024

According to this.

hweiske · Sep 24 '24 11:09

I cannot reproduce this issue locally. Could you try the latest main branch, and follow the install doc to make sure TensorRT-LLM is installed correctly?

lfr-0531 · Sep 27 '24 10:09

Did you use a local PC or a remote server without a screen? Is there a command to check whether TRT-LLM is correctly installed?

sherlcok314159 · Sep 27 '24 11:09

Did you use a local PC or a remote server without a screen? Is there a command to check whether TRT-LLM is correctly installed?

Remote server.

To check the installation:

python3 -c "import tensorrt_llm"

lfr-0531 · Sep 27 '24 11:09
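
The one-line import check above can be expanded slightly so that a failure reports its cause instead of a bare traceback. This is a minimal sketch: the function name `check_import` is hypothetical, and reading `__version__` assumes the installed package exposes that attribute (the code falls back to "unknown" if it does not). A clean import only shows that the package loads; it does not by itself rule out the MPI or X11 problem discussed in this issue.

```python
import importlib


def check_import(module_name="tensorrt_llm"):
    """Try to import a module; return (ok, detail).

    detail is the module's __version__ (or "unknown") on success,
    and the exception type and message on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, or e.g. OSError from a missing shared library
        return False, f"{type(exc).__name__}: {exc}"
    return True, str(getattr(module, "__version__", "unknown"))


if __name__ == "__main__":
    ok, detail = check_import()
    print("import ok:" if ok else "import failed:", detail)
```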

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] · Oct 28 '24 02:10

This issue was closed because it has been stalled for 15 days with no activity.

github-actions[bot] · Nov 13 '24 02:11

Have you solved it? I ran into the same problem again.

corkiyao · Mar 05 '25 01:03

Have you solved it? I got the same error.

corkiyao · Mar 05 '25 01:03