Installing the environment with the commands provided by the author fails with an error. Does anyone know how to fix it?
Open · FreeGeans opened this issue 3 months ago · 0 comments
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train_stage2.py FAILED
Failures:
[1]:
time : 2025-10-13_16:22:09
host : tyut-PowerEdge-R750
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 163472)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2025-10-13_16:22:09
host : tyut-PowerEdge-R750
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 163471)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
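The summary above only says that both ranks exited with code 1; with `error_file: <N/A>` the underlying Python exception is not included, so the real cause should be in the console output printed before this block. The page linked in the log describes decorating the training entrypoint with `record` so that the launcher can capture the traceback. A minimal sketch of that, assuming `train_stage2.py` has a single entry function (the name `main` and its body below are hypothetical, not taken from the repository):

```python
# Sketch: let torchrun record the real traceback of a failing worker.
# `main` is a hypothetical stand-in for the entrypoint in train_stage2.py.
try:
    from torch.distributed.elastic.multiprocessing.errors import record
except ImportError:
    # torch is not importable here; use a no-op decorator so the sketch still runs.
    def record(fn):
        return fn


@record
def main():
    # Placeholder for the real training logic (hypothetical).
    return "ok"


if __name__ == "__main__":
    main()
```

With the decorator in place, rerunning the same launch command should replace `error_file: <N/A>` and the generic `traceback` hint with the actual exception from the first failing rank, which is the information needed to tell whether this is an environment problem or something else.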