[quantization][Mixtral-8x22B-v0.1] It will hang after [TensorRT-LLM][INFO] Initializing MPI with thread mode 1

Open Godlovecui opened this issue 1 year ago • 3 comments

System Info

L20, 8 cards, 8x48G memory, TensorRT-LLM version: 0.11.0.dev2024051400

Who can help?

@Tra

Information

  • [X] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

mpirun -n 8 python3 ../run.py --engine_dir ./engine_outputs_mixtral_8x22B_8gpu --tokenizer_dir /root/models/Mixtral-8x22B-v0.1 --max_output_len 200 --input_text "I love french quiche"

Expected behavior

It should run successfully on the L20 GPUs.

Actual behavior

When I execute:

mpirun -n 8 python3 ../run.py --engine_dir ./engine_outputs_mixtral_8x22B_8gpu --tokenizer_dir /root/models/Mixtral-8x22B-v0.1 --max_output_len 200 --input_text "I love french quiche"

it raises the error below:

[image]

Then I set the environment variable TOKENIZERS_PARALLELISM=True or TOKENIZERS_PARALLELISM=False, and it hangs as shown below:

[image]

How can I fix it? Thank you~
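For anyone retrying this, one thing worth ruling out is that the variable is not reaching every MPI rank. The sketch below builds the same reproduction command with TOKENIZERS_PARALLELISM set explicitly. It assumes Open MPI is the launcher (its `-x` option exports a variable to all ranks); `build_launch` is a hypothetical helper name, not part of TensorRT-LLM, and nothing here is confirmed to resolve the hang.

```python
import os
import shlex


def build_launch(n_ranks, tokenizers_parallelism):
    """Return (argv, env) for the reproduction command.

    Hypothetical helper: it only assembles the command line and the
    environment, it does not start anything.
    """
    env = dict(os.environ)
    # Hugging Face tokenizers reads this variable; "true"/"false" are accepted.
    env["TOKENIZERS_PARALLELISM"] = "true" if tokenizers_parallelism else "false"
    argv = [
        "mpirun", "-n", str(n_ranks),
        # Assumption: Open MPI, where -x forwards the variable to every rank.
        "-x", "TOKENIZERS_PARALLELISM",
        "python3", "../run.py",
        "--engine_dir", "./engine_outputs_mixtral_8x22B_8gpu",
        "--tokenizer_dir", "/root/models/Mixtral-8x22B-v0.1",
        "--max_output_len", "200",
        "--input_text", "I love french quiche",
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = build_launch(8, tokenizers_parallelism=False)
    # Print the command for inspection; pass argv and env to subprocess.run to launch it.
    print(shlex.join(argv))
```

Passing `argv` and `env` to `subprocess.run(argv, env=env, timeout=...)` would also make a hang surface as a `TimeoutExpired` exception instead of a stuck terminal.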

Additional notes

image

Godlovecui avatar May 16 '24 08:05 Godlovecui

What docker image do you use? Also, could you share your transformers version? I cannot reproduce such issue on my side.

byshiue avatar May 22 '24 11:05 byshiue

  1. I built an environment on AutoDL cloud (https://www.autodl.com/market/list). Because it doesn't support external images, I selected a basic built-in image and followed the installation instructions:

apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com
python3 -c "import tensorrt_llm"

  2. The transformers version is: transformers==4.38.2. Thank you~
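When an issue cannot be reproduced, it helps to report the exact installed versions in one go. The following is a minimal sketch using only the standard library; `collect_versions` is a hypothetical helper, and the package list (tensorrt_llm, transformers, tokenizers) is an assumption about what is relevant here.

```python
from importlib import metadata


def collect_versions(packages):
    """Map each distribution name to its installed version string."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep missing packages in the report instead of failing.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed list of packages worth reporting for this issue.
    report = collect_versions(["tensorrt_llm", "transformers", "tokenizers"])
    for name, version in report.items():
        print(f"{name}: {version}")
```

The output can be pasted directly into the issue alongside the Docker image name.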

Godlovecui avatar May 23 '24 07:05 Godlovecui

Could you try the official docker image on another machine first? It is hard to help if we cannot reproduce the issue on our side.

byshiue avatar May 27 '24 08:05 byshiue

OK, I will try the official docker image in the future. I am closing this issue for now and will reopen it if I encounter the same problem again. Thank you~~~

Godlovecui avatar Jun 02 '24 02:06 Godlovecui