yejingfu comments

Results: 5 comments of yejingfu

"peer access is not supported between these two devices" when using multiple GPUs

Appending "--use_custom_all_reduce disable" to the trtllm-build command fixes it. This works for me on 8x4090.
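
For illustration, here is a minimal Python sketch of how that workaround might be wired into a build script. The flag and its value come from the comment above. The helper name `build_trtllm_command` and the two paths are hypothetical. Whether `--use_custom_all_reduce` is accepted depends on the installed TensorRT-LLM version, so treat this as an assumption to check against `trtllm-build --help`. The sketch only assembles the argument list and does not run the build.

```python
import shlex


def build_trtllm_command(checkpoint_dir, output_dir, disable_custom_all_reduce=True):
    """Assemble a trtllm-build argument list without executing it.

    checkpoint_dir and output_dir are placeholder paths supplied by the caller.
    """
    cmd = [
        "trtllm-build",
        "--checkpoint_dir", checkpoint_dir,
        "--output_dir", output_dir,
    ]
    if disable_custom_all_reduce:
        # Workaround from the comment above; assumed to be supported by the
        # installed TensorRT-LLM version.
        cmd += ["--use_custom_all_reduce", "disable"]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the resulting command line.
    print(shlex.join(build_trtllm_command("./ckpt_tp8", "./engine_tp8")))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.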

add '--workers' args for tools/api_server.py,i cannot run it

I hit the same issue when --workers is larger than 1. Is there any update on a fix?

【Error】ValueError: Unknown architecture for AutoModelForCausalLM: DeepseekV3ForCausalLM

I got the same issue when using DeepSeek-R1 by following NVIDIA's official instructions: https://huggingface.co/nvidia/DeepSeek-R1-FP4#deploy-with-tensorrt-llm Should I manually build TensorRT-LLM from scratch? Why doesn't NVIDIA release an official wheel or docker...

【Error】ValueError: Unknown architecture for AutoModelForCausalLM: DeepseekV3ForCausalLM

I tried the latest wheel (0.19.0) from https://pypi.nvidia.com/tensorrt-llm/. It can load the DeepSeekV3 model, but fails on binding: Traceback (most recent call last): File "/codes/mlops/scripts/trtllm/offline.py", line 2, in <module> from tensorrt_llm import SamplingParams...

[Bug]: AssertionError when using automatic prefix caching and prompt_logprobs

#9034 does not fix the issue. I applied that PR as a patch and can still reproduce it.