Runze Ma
Hey, here are my system environment settings for NCCL:

```bash
NCCL_SOCKET_IFNAME=en,eth,em,bond
NCCL_P2P_DISABLE=0
LD_LABRARY_PATH=:/usr/local/cuda/lib64
NCCL_DEBUG=INFO
```

My PyTorch and CUDA version is `torch 1.8.0+cu111`. After disabling the CPU group, I can...
I guess something is wrong with `GLOO_SOCKET_IFNAME`. Could you share your OS environment settings for `GLOO_SOCKET_IFNAME` and `NCCL_P2P_DISABLE`?
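To make comparing settings easier, here is a minimal sketch that prints every `NCCL_*` and `GLOO_*` variable visible to the current process. The helper name `collect_dist_env` and the choice of prefixes are my own assumptions for illustration; they are not part of PyTorch, NCCL, or Gloo.

```python
import os

# Assumption: these two prefixes cover the variables discussed in this thread.
PREFIXES = ("NCCL_", "GLOO_")


def collect_dist_env(environ=None):
    """Return the NCCL_*/GLOO_* variables from `environ`, sorted by name.

    `environ` defaults to the current process environment; passing a plain
    dict makes the helper easy to try without touching real settings.
    """
    env = os.environ if environ is None else environ
    return {name: env[name] for name in sorted(env) if name.startswith(PREFIXES)}


if __name__ == "__main__":
    # Print in KEY=value form so the output can be pasted into an issue.
    for name, value in collect_dist_env().items():
        print(f"{name}={value}")
```

Running it on each node and diffing the output should show quickly whether the interface names differ between machines.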
OK, I may open a pull request soon. Besides, I found that the file written by `save_pickle` takes about 44 GB of memory once loaded. The memory usage is about...
I checked on macOS 14.4 and the issue does not exist.
My PyTorch version is `1.12.1` with `cuda 11.6`. As far as I know, PyTorch supports the [CSR](https://pytorch.org/docs/stable/generated/torch.sparse_csr_tensor.html) format in recent versions.
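For anyone unfamiliar with the layout, here is a plain-Python sketch of how a dense matrix maps to the three CSR arrays. The function name `dense_to_csr` is hypothetical and this is not PyTorch's implementation; it only illustrates the `crow_indices`, `col_indices`, and `values` arguments that `torch.sparse_csr_tensor` expects.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of equal-length rows) to CSR arrays.

    Returns (crow_indices, col_indices, values):
      - values:       the nonzero entries in row-major order
      - col_indices:  the column of each entry in `values`
      - crow_indices: crow_indices[i]..crow_indices[i + 1] is the slice of
                      `values` that belongs to row i
    """
    crow_indices = [0]
    col_indices = []
    values = []
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                col_indices.append(col)
                values.append(value)
        # After each row, record how many nonzeros have been seen so far.
        crow_indices.append(len(values))
    return crow_indices, col_indices, values


dense = [
    [1, 0, 2],
    [0, 0, 3],
    [4, 0, 0],
]
crow, col, val = dense_to_csr(dense)
print(crow)  # [0, 2, 3, 4]
print(col)   # [0, 2, 2, 0]
print(val)   # [1, 2, 3, 4]
```

With PyTorch installed, the three lists could then be passed to `torch.sparse_csr_tensor(crow, col, val, size=(3, 3))`.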
I have the same issue here. I can't find the package in the selected virtual environment unless I specify a particular environment variable in `pyproject.toml`.