YangYuxuan

5 comments by YangYuxuan

Hi, thanks for the work. I have a problem in no-peer-access environments here. ``` (WorkerDict pid=2890081) [2025-03-13 15:00:53 TP0] Scheduler hit an exception: Traceback (most recent call last): (WorkerDict pid=2890081)...

@fzyzcjy Thanks for the explanation, I guess that is the root of the issue. I disabled the overrides of CUDA_VISIBLE_DEVICES, and the error is resolved. I think the problem...

@fzyzcjy #### 1. Disabling Overrides and this PR https://github.com/pytorch/pytorch/pull/149248 This is what I meant by disabling the overrides of CUDA_VISIBLE_DEVICES. I do not actually alter Ray's device isolation. And this current fix...

@fzyzcjy BTW, the peer access error appears regardless of which devices are used, so I guess it is not a device issue.
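
For anyone trying to confirm the same thing, a minimal sketch of how one might list the device pairs that lack peer access is below. It assumes PyTorch is installed and uses `torch.cuda.can_device_access_peer`; the helper name `peer_access_report` is hypothetical and not part of any library discussed here. The check only sees the devices left visible by `CUDA_VISIBLE_DEVICES`, so the result can differ inside and outside a Ray worker.

```python
import itertools
import os


def peer_access_report(device_count, can_access):
    """Return the ordered (src, dst) device pairs for which peer access is NOT available.

    `can_access` is any callable taking (src, dst) and returning a bool;
    it is a parameter so the helper can be exercised without a GPU.
    """
    blocked = []
    for src, dst in itertools.permutations(range(device_count), 2):
        if not can_access(src, dst):
            blocked.append((src, dst))
    return blocked


if __name__ == "__main__":
    # Show which devices this process is allowed to see.
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    try:
        import torch  # assumption: PyTorch is available in this environment
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        n = torch.cuda.device_count()
        print("pairs without peer access:",
              peer_access_report(n, torch.cuda.can_device_access_peer))
    else:
        print("no CUDA devices visible to this process")
```

If every pair is reported as blocked no matter which physical GPUs are selected, that would be consistent with the observation above that the error does not depend on the particular devices.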

@fzyzcjy Thanks!! The script runs just fine in the container. I managed to reproduce and fix the problem in my environment. It turns out that torch-memory-saver was wrongly installed...