yejingfu
Appending "--use_custom_all_reduce disable" to the trtllm-build command fixes it. This works for me on 8x 4090.
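For anyone scripting the build, here is a minimal sketch of appending that flag to an existing trtllm-build argument list. It only builds and prints the command; it does not run it. The helper name and the ./ckpt and ./engine paths are made up for illustration, and whether your trtllm-build version still accepts --use_custom_all_reduce is an assumption you should check with trtllm-build --help.

```python
import shlex


def with_custom_all_reduce_disabled(build_args):
    """Return a copy of the argument list with '--use_custom_all_reduce disable' set.

    Hypothetical helper: the flag name comes from the comment above; nothing
    here checks that the installed trtllm-build actually supports it.
    """
    flag = "--use_custom_all_reduce"
    args = list(build_args)
    if flag in args:
        # Overwrite the existing value instead of passing the flag twice.
        idx = args.index(flag)
        args[idx + 1 : idx + 2] = ["disable"]
    else:
        args += [flag, "disable"]
    return args


# Hypothetical checkpoint and engine directories; substitute your own.
base = ["trtllm-build", "--checkpoint_dir", "./ckpt", "--output_dir", "./engine"]
cmd = with_custom_all_reduce_disabled(base)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the paths are real.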
I get the same issue when --workers is larger than 1. Is there any update on a fix?
I got the same issue when running DeepSeek-R1 by following NVIDIA's official instructions: https://huggingface.co/nvidia/DeepSeek-R1-FP4#deploy-with-tensorrt-llm Should I build TensorRT-LLM manually from scratch? Why doesn't NVIDIA release an official wheel or docker...
I tried the latest wheel (0.19.0) from https://pypi.nvidia.com/tensorrt-llm/. It can load the DeepSeekV3 model, but it fails on binding:
Traceback (most recent call last):
  File "/codes/mlops/scripts/trtllm/offline.py", line 2, in <module>
    from tensorrt_llm import SamplingParams...
PR #9034 does not fix the issue. I applied the patch from that PR and can still reproduce it.