FunCodec

NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so)

Open huyuelin opened this issue 1 year ago • 0 comments

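Hello, when I try to train on multiple GPUs, NCCL reports that libnccl-net.so cannot be found, even though I have confirmed that NCCL version 2.19.3 is installed. When I train on a single GPU instead, the process crashes with a core dump. I am currently using the train-other-500, dev-other, and test-other datasets. Do you have any idea what might be causing these errors?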
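To help narrow this down, a check along the following lines could show whether the dynamic loader can open the NCCL libraries at all. This is only a minimal sketch: `try_load` is a hypothetical helper written for this report, not part of FunCodec or NCCL, and the name `libnccl.so.2` is an assumption about how NCCL 2.x is installed on this machine.

```python
import ctypes


def try_load(lib_name):
    """Ask the dynamic loader to open lib_name.

    Returns (True, "") on success, or (False, error_text) when the
    loader raises OSError (for example, "cannot open shared object file").
    """
    try:
        ctypes.CDLL(lib_name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # Library names are assumptions: libnccl.so.2 is the usual name for
    # NCCL 2.x, and libnccl-net.so is the plugin named in the log line.
    for name in ("libnccl.so.2", "libnccl-net.so"):
        ok, err = try_load(name)
        status = "loadable" if ok else "not loadable"
        print(f"{name}: {status} {err}")
```

If `libnccl.so.2` loads while only `libnccl-net.so` does not, the missing plugin may be unrelated to the single-GPU core dump, which would then need separate investigation.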

huyuelin, May 07 '24 07:05