Shakhizat Nurgaliyev

Results 20 comments of Shakhizat Nurgaliyev

Hi @WangFengtu1996, I highly recommend checking out the LLM Jetson projects by @dusty-nv, especially the implementation of inference via MLC LLM (much faster than llama.cpp). You can find...

Hello @pavanimajety, that was very informative. I can share an alternative method using the Docker image from Nvidia: nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3 (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver). I've tested it on a machine with a 5090 GPU,...
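For reference, a minimal sketch of how that container can be launched; the model-repository path is a placeholder and the port mappings assume Triton's defaults, so adjust both to your setup:

```shell
# Pull the Triton image with the vLLM backend from NGC
docker pull nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3

# Run Triton with GPU access; /path/to/model_repository is a placeholder
docker run --gpus all --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3 \
  tritonserver --model-repository=/models
```

Ports 8000, 8001, and 8002 are Triton's default HTTP, gRPC, and metrics endpoints.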

Hi @rubentorresbonet, vLLM is already installed with CUDA 12.8 support. Simply use the `vllm serve MODEL_NAME` command to start the inference engine.
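A minimal sketch of starting the server and verifying it responds; `MODEL_NAME` is a placeholder for the model you actually want to serve:

```shell
# Start the OpenAI-compatible inference server (listens on port 8000 by default)
vllm serve MODEL_NAME

# From another terminal, send a test request to confirm it is up
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "MODEL_NAME", "messages": [{"role": "user", "content": "Hello"}]}'
```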

Hello @pavanimajety, thanks for your reply. If possible, could you please share the expected release date of P2P support for the 5090 GPUs? I am currently unable to perform...

Thank you, @SmileIsThinking, for the suggestion. I attempted to build NCCL from source, but the issue persists. I also tried `export NCCL_P2P_DISABLE=1`, which did not resolve the problem....
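For anyone debugging a similar NCCL failure, a sketch of the knobs tried here plus NCCL's own logging, reproduced outside vLLM with the official nccl-tests suite (the `-g 2` flag assumes a two-GPU machine):

```shell
# Surface NCCL's transport selection and errors in the logs
export NCCL_DEBUG=INFO

# Disable the P2P transport (tried here; it did not resolve the issue)
export NCCL_P2P_DISABLE=1

# Reproduce outside vLLM with the nccl-tests benchmark suite
git clone https://github.com/NVIDIA/nccl-tests && cd nccl-tests && make
./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2   # -g 2: run across two GPUs
```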

@rubentorresbonet please use this: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
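Condensed from the linked guide (the apt path; the repository-setup step from the guide is omitted here, so follow the full instructions for your distro):

```shell
# Install the NVIDIA Container Toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```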

Hello everyone, the Nvidia NCCL team released a new version of NCCL yesterday (https://github.com/NVIDIA/nccl/commit/f44ac759fee12ecb3cc6891e9e739a000f66fd70). Please update it or apply the patch. This update allows a TP of 2 to...
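A sketch of picking up a freshly built NCCL and verifying tensor parallelism, assuming a vLLM deployment; the NCCL build path and model name are placeholders:

```shell
# Point the loader at the patched NCCL build before launching
export LD_LIBRARY_PATH=/path/to/nccl/build/lib:$LD_LIBRARY_PATH

# Placeholder model; --tensor-parallel-size 2 exercises the multi-GPU path
vllm serve MODEL_NAME --tensor-parallel-size 2
```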

@jayavanth, single-GPU inference on an RTX 5090 via vLLM should work regardless. Please update NCCL for multi-GPU inference on the RTX 50-series GPUs. vLLM v1's issue is likely...

Hello Nvidia team, if possible, could you please suggest the correct build command for setting up S3 support? I also noticed that the NCCL package is not included....

Hi @oandreeva-nv, thanks for your reply. I've finally been able to build it. Here is a link to my blog post: https://www.hackster.io/shahizat/triton-inference-server-on-nvidia-jetson-using-k3s-and-minio-cbcfe3