Shangyan Zhou

132 comments by Shangyan Zhou

@Wenhan-Tan I just encountered the same issue. The reason I ran into this problem was that I had enabled hugepages on the physical machine, and UCX triggered a SIGBUS when...
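As a rough way to check whether a Linux host has explicit hugepages reserved, one can read `HugePages_Total` from `/proc/meminfo`. The sketch below is an illustration only: the helper name `hugepages_total` is hypothetical, and it reports the reserved hugepage pool only. It does not detect transparent hugepages or confirm that hugepages are the cause of a given SIGBUS.

```python
from pathlib import Path


def hugepages_total(meminfo_text: str) -> int:
    """Return HugePages_Total from /proc/meminfo-style text (0 if the key is absent)."""
    for line in meminfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "HugePages_Total":
            return int(value.strip())
    return 0


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # Linux-only path
    if meminfo.exists():
        total = hugepages_total(meminfo.read_text())
        state = "reserved" if total > 0 else "not reserved"
        print(f"HugePages_Total = {total} (explicit hugepages {state})")
    else:
        print("/proc/meminfo not available on this system")
```

A non-zero value means a hugepage pool has been reserved on the machine, which is the situation described in the comment above.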

> @sphish Thank you! I saw another similar issue here ([NVIDIA/TensorRT-LLM#674](https://github.com/NVIDIA/TensorRT-LLM/issues/674)) which uses TRT-LLM instead of FT. But in that issue, huge pages need to be enabled. I'll try disabling huge...

What is your network hardware configuration? Could you please run `nvidia-smi topo -mp` and `ibv_devinfo` and share the results?
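The two commands requested above can be collected in one pass with a small script. This is a minimal sketch, assuming `nvidia-smi` and `ibv_devinfo` are on `PATH`; the helper name `run_diagnostic` is hypothetical, and a missing tool is reported rather than raising an error.

```python
import shutil
import subprocess


def run_diagnostic(cmd: list[str]) -> str:
    """Run a command and return its combined stdout/stderr, or a note if it is missing."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]}: not found on PATH"
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # The two commands asked for in the comment above.
    for cmd in (["nvidia-smi", "topo", "-mp"], ["ibv_devinfo"]):
        print("$ " + " ".join(cmd))
        print(run_diagnostic(cmd))
```

The printed output can then be pasted into the issue as-is.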

> I'm seeing a similar issue:
>
> ```
> root@22f186c3783d:/workspace#
> root@22f186c3783d:/workspace# nvidia-smi topo -mp
> GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4...
> ```

> [@sphish](https://github.com/sphish) Same issue. Any help?

@liusy58 Can you run NVSHMEM's `shmem_put_bw` test and check whether you hit the same issue?

> [@sphish](https://github.com/sphish) Hi, output of `shmem_put_bw` is shown below. I cannot resolve this, could you please give me some guidance?
>
> ```
> /opt/nvshmem/bin/perftest/device/pt-to-pt/shmem_put_bw
> Runtime options after parsing...
> ```

> > > [@sphish](https://github.com/sphish) Same issue. Any help?
> > >
> > > [@liusy58](https://github.com/liusy58) Can you run the NVSHMEM's `shmem_put_bw` test, and will you encounter the same issue?
>
>...

@koanho Can you check if the nvidia-peermem module is correctly installed and loaded?
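A quick way to see whether the module is loaded is to look for it in `/proc/modules`, which is the same data `lsmod` prints. The sketch below is an illustration with a hypothetical helper, `is_module_loaded`. It only checks that the module is currently loaded, not that the installation is otherwise correct.

```python
from pathlib import Path


def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` appears as a loaded module in /proc/modules-style text."""
    # The kernel lists module names with underscores, so normalize hyphens.
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


if __name__ == "__main__":
    modules = Path("/proc/modules")  # Linux-only path
    if modules.exists():
        loaded = is_module_loaded("nvidia-peermem", modules.read_text())
        print("nvidia_peermem loaded" if loaded else "nvidia_peermem NOT loaded")
    else:
        print("/proc/modules not available on this system")
```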

@koanho Have you modified the driver config? https://github.com/deepseek-ai/DeepEP/tree/main/third-party#4-configure-nvidia-driver

> Is IBGDA necessary to use DeepEP, right?

@koanho If you want to use low-latency mode, yes. If you only want to use the normal mode for training, you...