bdf comments

error when testing test_internode.sh deep_ep.cpp:83 'an illegal memory access was encountered'

> [@defei-coder](https://github.com/defei-coder) The latest code has changed the transport of normal kernel from IBRC to IBGDA, and these logs show that IBGDA is not functioning properly in your environment >...

Confused about the estimated bandwidth on tests/test_internode.py

Hi @LyricZhao, I used tests/test_internode.py. The IB bandwidth measured with the original code under EP64 is 45 GB/s, which is close to the performance reported on GitHub, while the bandwidth tested with the...

Confused about the estimated bandwidth on tests/test_internode.py

> If the tokens are evenly distributed, I guess it should be `45 * 7/8 = 39.375`? Anyway, there is still some space for optimization, we will refactor the code...
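A quick way to reproduce the `45 * 7/8` estimate, assuming (this is not stated in the snippet) that EP64 means 8 nodes with 8 GPUs each, and that with evenly distributed tokens the 1/8 of traffic destined for the local node never crosses RDMA. The helper name `expected_rdma_bandwidth` is hypothetical, not part of DeepEP.

```python
def expected_rdma_bandwidth(measured_gbps: float, num_nodes: int) -> float:
    """Hypothetical helper: scale a measured bandwidth by the fraction of
    traffic that leaves the node, assuming tokens are spread evenly across
    num_nodes nodes (so 1/num_nodes of the traffic stays local)."""
    remote_fraction = (num_nodes - 1) / num_nodes
    return measured_gbps * remote_fraction


# Assumption: EP64 = 64 GPUs at 8 GPUs per node -> 8 nodes.
num_nodes = 64 // 8
print(expected_rdma_bandwidth(45.0, num_nodes))  # 39.375
```

This is only back-of-the-envelope arithmetic; real token routing is rarely perfectly even, so measured numbers will deviate.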

NCCL timeout while different ranks execute DeepEP and NCCL communications in different order

> DeepEP’s low-latency kernels use cooperative launch to attempt launching a large number of SMs simultaneously. If NCCL occupies some of the SMs, it may prevent DeepEP’s kernels from being...
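A toy model of why mismatched ordering hangs instead of just running slower. It assumes each rank can only make progress on one communication op at a time (roughly the situation where an NCCL kernel holds the SMs, so a cooperatively launched DeepEP kernel cannot start alongside it), and that every op completes only once all ranks have reached it. The function `simulate` and the op names are hypothetical illustrations, not DeepEP or NCCL APIs.

```python
from typing import Dict, List, Optional


def simulate(schedules: Dict[int, List[str]]) -> bool:
    """Toy model: return True if all ranks finish, False on deadlock.

    Each rank runs its ops strictly in order, one at a time. An op only
    completes when every rank is currently blocked on that same op.
    """
    pos = {rank: 0 for rank in schedules}
    while True:
        if all(pos[r] == len(ops) for r, ops in schedules.items()):
            return True  # every rank ran its whole schedule
        current: Dict[int, Optional[str]] = {
            r: ops[pos[r]] if pos[r] < len(ops) else None
            for r, ops in schedules.items()
        }
        heads = set(current.values())
        if len(heads) == 1 and None not in heads:
            for r in pos:
                pos[r] += 1  # all ranks agree on the op, so it completes
        else:
            # Ranks wait on different ops: nobody can progress. In a real
            # job this shows up as a hang that ends in an NCCL timeout.
            return False


same = {0: ["deepep_dispatch", "nccl_all_gather"],
        1: ["deepep_dispatch", "nccl_all_gather"]}
swapped = {0: ["deepep_dispatch", "nccl_all_gather"],
           1: ["nccl_all_gather", "deepep_dispatch"]}
print(simulate(same))     # True
print(simulate(swapped))  # False
```

The model deliberately ignores streams and partial SM sharing; it only shows that once neither rank can run both ops concurrently, DeepEP cannot simply "wait for NCCL to finish", because that NCCL op is itself waiting for a peer stuck in DeepEP.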

NCCL timeout while different ranks execute DeepEP and NCCL communications in different order

A question about "If NCCL occupies some of the SMs, it may prevent DeepEP's kernels from being launched": why can't DeepEP wait for NCCL to finish?

NCCL timeout while different ranks execute DeepEP and NCCL communications in different order

Thanks for your reply, @xiaofanl-nvidia. In my view, low_latency_dispatch differs from all_gather in that low_latency_dispatch accomplishes communication waiting through hook functions, eliminating the need for all ranks to execute...