diffusers

Symbol not defined on Debian VM

Open balintdecsi opened this issue 3 years ago • 3 comments

Describe the bug

I'm following the examples/dreambooth README, and after installing the dependencies I cannot configure accelerate. When I run accelerate config or accelerate config default, it gives the following error: OSError: /opt/conda/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference. While that exception is being handled, a second one occurs: OSError: /opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference.

Reproduction

git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
cd examples/dreambooth
pip install -r requirements.txt
accelerate config

Logs

Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/conda/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/opt/conda/lib/python3.7/site-packages/accelerate/__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "/opt/conda/lib/python3.7/site-packages/accelerate/accelerator.py", line 25, in <module>
    import torch
  File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/opt/conda/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/opt/conda/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: symbol cublasLtGetStatusString version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

System Info

(I get the same error when running diffusers-cli env.) This is my system info, on Google Cloud Platform, with CUDA 11.3 installed:

Static hostname: debian
Icon name: computer-vm
Chassis: vm
Virtualization: kvm
Operating System: Debian GNU/Linux 10 (buster)
Kernel: Linux 4.19.0-22-cloud-amd64
Architecture: x86-64

balintdecsi avatar Dec 18 '22 20:12 balintdecsi

Similar issue: https://github.com/huggingface/diffusers/issues/1271

balintdecsi avatar Dec 18 '22 20:12 balintdecsi

Resources

https://github.com/huggingface/accelerate/issues https://huggingface.co/docs/accelerate/basic_tutorials/install

If diffusers-cli env is not working you can use pip list to generate a list of installed packages to check what is available in your active environment.

averad avatar Dec 18 '22 22:12 averad
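
The pip list check suggested above can also be done from Python. This is a minimal sketch rather than an official tool: the function name installed_packages and the default prefix list are hypothetical choices made for this thread, and on Python 3.7 it assumes the importlib_metadata backport is installed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is available on Python 3.7.
    import importlib_metadata as metadata


def installed_packages(prefixes=("torch", "nvidia", "accelerate", "diffusers")):
    """Return sorted (name, version) pairs for installed distributions
    whose name starts with one of the given prefixes (case-insensitive)."""
    rows = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Broken installs can report no name; skip them.
        if name and name.lower().startswith(tuple(prefixes)):
            rows.add((name, str(dist.version)))
    return sorted(rows)


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

Filtering on torch and nvidia keeps the output short and shows at a glance which CUDA-related wheels sit next to PyTorch in the active environment.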

Hey @balintdecsi,

It looks a bit like your PyTorch version is incorrect, could you maybe check whether you can import it?

patrickvonplaten avatar Dec 19 '22 23:12 patrickvonplaten
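
One way to run the import check asked for above without crashing the current session is to attempt the import in a child interpreter. This is only a sketch; the helper name try_import is hypothetical and not part of PyTorch or accelerate.

```python
import subprocess
import sys


def try_import(module_name):
    """Try `import module_name` in a fresh interpreter.

    Returns (ok, message), where message is the last line of stderr
    (normally the exception line) or an empty string on success.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    lines = result.stderr.strip().splitlines()
    return result.returncode == 0, (lines[-1] if lines else "")


if __name__ == "__main__":
    ok, message = try_import("torch")
    print("torch imports cleanly" if ok else f"torch import failed: {message}")
```

If the import fails with the same OSError as accelerate config, the problem lies in the PyTorch installation itself rather than in accelerate or diffusers.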

Sorry @averad for not having responded. Now I get the following output for huggingface-cli env:

- huggingface_hub version: 0.11.1
- Platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-debian-10.13
- Python version: 3.7.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /home/balint_decsi/.huggingface/token
- Has saved token ?: False
- Configured git credential helpers: 
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A

Does this say anything to you? Thanks.

balintdecsi avatar Jan 08 '23 17:01 balintdecsi

@patrickvonplaten you are right, I'm getting the same OSError as when running accelerate config. What can I do to fix this? Thanks.

balintdecsi avatar Jan 08 '23 17:01 balintdecsi

I would recommend reinstalling your PyTorch environment, or posting an issue on the PyTorch repository: https://github.com/pytorch/pytorch

patrickvonplaten avatar Jan 13 '23 11:01 patrickvonplaten
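
Before reinstalling, it may help to see which copies of libcublasLt.so.11 the loader could pick up. The error text says that libcublas.so.11 resolved a libcublasLt.so.11 that lacks the cublasLtGetStatusString symbol. One possibility, not confirmed anywhere in this thread, is that a different copy (for example from the system CUDA install) is found through LD_LIBRARY_PATH before the one shipped in site-packages. The sketch below only lists candidates; the function names are hypothetical.

```python
import os
from pathlib import Path


def ld_library_path_dirs(env=None):
    """Split LD_LIBRARY_PATH into its non-empty directory entries."""
    env = os.environ if env is None else env
    return [d for d in env.get("LD_LIBRARY_PATH", "").split(":") if d]


def find_library_copies(filename, search_dirs):
    """Return the paths in search_dirs (not searched recursively)
    that contain a file called filename."""
    found = []
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.exists():
            found.append(str(candidate))
    return found


if __name__ == "__main__":
    # Assumption: the pip-installed copy lives at the path shown in the traceback.
    dirs = ld_library_path_dirs() + [
        "/opt/conda/lib/python3.7/site-packages/nvidia/cublas/lib"
    ]
    for path in find_library_copies("libcublasLt.so.11", dirs):
        print(path)
```

More than one hit would suggest that two CUDA library sets are competing. That would be worth mentioning in any issue filed with PyTorch.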

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Feb 06 '23 15:02 github-actions[bot]

@balintdecsi I get the same issue on GCP. Have you found a workaround or a fix? I've tried different PyTorch environments as well.

StateGovernment avatar Mar 17 '23 06:03 StateGovernment