Symbol not defined on Debian VM
Describe the bug
I'm following the examples/dreambooth README, and after installing the dependencies I cannot configure accelerate. When I run accelerate config or accelerate config default, it gives the following error:
OSError: /opt/conda/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
and then, while handling that exception, a second one occurs:
OSError: /opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
Reproduction
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
cd examples/dreambooth
pip install -r requirements.txt
accelerate config
Logs
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 172, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/opt/conda/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/accelerate", line 5, in <module>
from accelerate.commands.accelerate_cli import main
File "/opt/conda/lib/python3.7/site-packages/accelerate/__init__.py", line 7, in <module>
from .accelerator import Accelerator
File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 25, in <module>
import torch
File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
_load_global_deps()
File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
_preload_cuda_deps()
File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
ctypes.CDLL(cublas_path)
File "/opt/conda/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
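For context, this kind of "symbol ... not defined" OSError usually means two different copies of libcublas/libcublasLt are being mixed at load time (for example, the pip-installed nvidia wheel under site-packages next to another CUDA installation). A small sketch to list every copy visible under a set of roots; the roots below are assumptions taken from the paths in the traceback, adjust for your machine:

```python
# List every libcublas/libcublasLt file under the given roots; copies in
# more than one location are a common cause of the OSError above.
import glob
import os

def find_libs(roots, pattern="libcublas*"):
    """Return sorted paths of files matching `pattern` anywhere under each root."""
    hits = set()
    for root in roots:
        hits.update(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    return sorted(hits)

if __name__ == "__main__":
    # Roots are guesses based on this report's traceback.
    for path in find_libs(["/opt/conda/lib/python3.7/site-packages",
                           "/usr/local/cuda"]):
        print(path)
```

If this prints matching libraries from two unrelated locations, the loader is likely resolving the symbol against the wrong copy.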
System Info
(I get the same error when running diffusers-cli env)
This is my system info, on Google Cloud Platform, with CUDA 11.3 installed:
Static hostname: debian
Icon name: computer-vm
Chassis: vm
Virtualization: kvm
Operating System: Debian GNU/Linux 10 (buster)
Kernel: Linux 4.19.0-22-cloud-amd64
Architecture: x86-64
Similar issue: https://github.com/huggingface/diffusers/issues/1271
Resources
https://github.com/huggingface/accelerate/issues
https://huggingface.co/docs/accelerate/basic_tutorials/install
If diffusers-cli env is not working you can use pip list to generate a list of installed packages to check what is available in your active environment.
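The same check can be done without shelling out to pip, which helps when CLI entry points in the environment are themselves broken. A sketch (uses importlib.metadata, available on Python 3.8+; on 3.7 the importlib_metadata backport provides the same API):

```python
# Filter the installed distributions the way `pip list | grep` would,
# to see which torch/CUDA-related packages are in the active environment.
from importlib import metadata  # Python 3.8+; importlib_metadata backport on 3.7

def find_packages(keywords):
    """Return sorted (name, version) pairs whose name contains any keyword."""
    hits = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""  # guard against broken metadata
        if any(k in name.lower() for k in keywords):
            hits.append((name, dist.version))
    return sorted(hits)

if __name__ == "__main__":
    for name, version in find_packages(["torch", "nvidia", "cublas", "diffusers"]):
        print(name, version)
```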
Hey @balintdecsi,
It looks a bit like your PyTorch version is incorrect, could you maybe check whether you can import it?
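Since the traceback fails inside import torch itself, a minimal import check like the following would confirm it (the helper is just an illustration, not part of any library):

```python
# Try to import a module and report either its version or the load error.
# A broken CUDA wheel typically surfaces here as an OSError, not ImportError.
import importlib

def try_import(module_name):
    """Return (ok, detail): version string on success, error text on failure."""
    try:
        mod = importlib.import_module(module_name)
        return True, getattr(mod, "__version__", "unknown version")
    except (ImportError, OSError) as exc:
        return False, str(exc)

if __name__ == "__main__":
    ok, detail = try_import("torch")
    print("torch import ok:", ok, "-", detail)
```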
Sorry @averad for not responding sooner. I now get the following output for huggingface-cli env:
- huggingface_hub version: 0.11.1
- Platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-debian-10.13
- Python version: 3.7.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/balint_decsi/.huggingface/token
- Has saved token ?: False
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
Does this tell you anything? Thanks.
@patrickvonplaten you are right, I'm getting the same OSError as when running accelerate config. What can I do to fix this? Thanks.
I would recommend reinstalling your PyTorch environment, or opening an issue on the PyTorch repository: https://github.com/pytorch/pytorch
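One workaround that has been reported for this family of errors, offered here as a hedged sketch rather than a confirmed fix (the package name, version pin, and index URL are assumptions for a torch 1.13.1 / CUDA 11.x setup), is to remove the pip-shipped cuBLAS wheel so only one copy of the library is on the loader path, then force-reinstall torch against a matching CUDA index. The snippet only prints the commands bound to the active interpreter, so the right environment is targeted; run them by hand after checking them:

```python
# Print (do not execute) reinstall commands for the interpreter that is
# actually broken, so pip operates on the correct environment.
import sys

def pip_cmd(*args):
    """Build a `python -m pip ...` argument list for the active interpreter."""
    return [sys.executable, "-m", "pip", *args]

steps = [
    # Drop the wheel whose libcublas copy conflicts (assumed package name).
    pip_cmd("uninstall", "-y", "nvidia-cublas-cu11"),
    # Reinstall torch from a CUDA 11.x wheel index (assumed pin and URL).
    pip_cmd("install", "--force-reinstall", "torch==1.13.1",
            "--extra-index-url", "https://download.pytorch.org/whl/cu117"),
]

for cmd in steps:
    print(" ".join(cmd))
```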
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@balintdecsi I get the same issue on GCP, have you found a workaround or a fix? I've tried with different PyTorch environments as well.