
RHEL9(plow)/poetry not setting ld library paths correctly when installing PyTorch along with not installing a requirement(numpy)

Open neonine2 opened this issue 1 year ago • 6 comments

Description

When I add PyTorch and then import torch, I keep getting an error saying that libcudnn.so.8 cannot be found. Here is how to reproduce the error:

poetry new torch-newest
cd torch-newest
poetry add torch  # defaults to the newest version, 2.2.2
poetry shell
python -c "import torch"

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/central/home/zwang2/torch-newest/.venv/lib64/python3.9/site-packages/torch/__init__.py", line 237, in <module>
from torch._C import * # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
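One way to check whether the cuDNN library actually landed in the environment is to search the interpreter's site-packages directories for it. The following is a minimal sketch, not something posted in the thread; the function names `site_dirs` and `find_shared_libs` are made up for illustration. Run it with the environment's own Python.

```python
import sysconfig
from pathlib import Path


def site_dirs():
    """Return the purelib and platlib site-packages directories of this interpreter."""
    paths = sysconfig.get_paths()
    return sorted({Path(paths["purelib"]), Path(paths["platlib"])})


def find_shared_libs(roots, pattern="libcudnn.so*"):
    """Recursively search each existing root for files whose name matches pattern."""
    hits = []
    for root in roots:
        if root.is_dir():
            hits.extend(sorted(root.rglob(pattern)))
    return hits


if __name__ == "__main__":
    # Print every matching library file, one per line (prints nothing if none exist).
    for hit in find_shared_libs(site_dirs()):
        print(hit)
```

If nothing is printed, the library was never installed into this environment. If a path is printed, the file exists and the problem lies in how it is being looked up.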

Workarounds

poetry new torch-newest
cd torch-newest
poetry add torch
poetry add numpy
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/$USER/.cache/pypoetry/virtualenvs/torch-newest-bPtN3B6m-py3.9/lib/python3.9/site-packages/nvidia/cudnn/lib/:/home/$USER/.cache/pypoetry/virtualenvs/torch-newest-bPtN3B6m-py3.9/lib/python3.9/site-packages/nvidia/nccl/lib
poetry shell
python -c "import torch"
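The export line above hard-codes a virtualenv name. As a rough sketch of the same workaround without the hard-coded path, the script below collects every `nvidia/<component>/lib` directory under site-packages and prints an equivalent `export` line. The helper names `nvidia_lib_dirs` and `export_line` are hypothetical, and the script assumes the NVIDIA wheels live under the `purelib` directory.

```python
import os
import shlex
import sysconfig
from pathlib import Path


def nvidia_lib_dirs(site_packages):
    """List every nvidia/<component>/lib directory under site_packages."""
    return sorted(p for p in Path(site_packages).glob("nvidia/*/lib") if p.is_dir())


def export_line(lib_dirs, existing=""):
    """Build an `export LD_LIBRARY_PATH=...` line, keeping any existing value first."""
    parts = [existing] if existing else []
    parts.extend(str(d) for d in lib_dirs)
    return "export LD_LIBRARY_PATH=" + shlex.quote(":".join(parts))


if __name__ == "__main__":
    # Assumption: the nvidia packages were installed into purelib.
    dirs = nvidia_lib_dirs(sysconfig.get_paths()["purelib"])
    print(export_line(dirs, os.environ.get("LD_LIBRARY_PATH", "")))
```

The line has to be evaluated in the shell before Python starts, because the dynamic loader reads `LD_LIBRARY_PATH` when the process launches. Changing it from inside a running interpreter does not help.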

Poetry Installation Method

pip

Operating System

RHEL9

Poetry Version

1.8.2

Poetry Configuration

cache-dir = "/home/zwang2/.cache/pypoetry"
experimental.system-git-client = false
installer.max-workers = null
installer.modern-installation = true
installer.no-binary = null
installer.parallel = true
keyring.enabled = true
solver.lazy-wheel = true
virtualenvs.create = true
virtualenvs.in-project = null
virtualenvs.options.always-copy = false
virtualenvs.options.no-pip = false
virtualenvs.options.no-setuptools = false
virtualenvs.options.system-site-packages = false
virtualenvs.path = "{cache-dir}/virtualenvs"  # /home/zwang2/.cache/pypoetry/virtualenvs
virtualenvs.prefer-active-python = false
virtualenvs.prompt = "{project_name}-py{python_version}"
warnings.export = true

Python Sysconfig

No response

Example pyproject.toml

No response

Poetry Runtime Logs

Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/central/home/zwang2/torch-newest/.venv/lib64/python3.9/site-packages/torch/__init__.py", line 237, in <module>
from torch._C import * # noqa: F403
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory

neonine2 avatar Apr 10 '24 07:04 neonine2

it is not poetry's responsibility to set your environment variables. Actually it is not even within poetry's power.

dimbleby avatar Apr 10 '24 08:04 dimbleby

then could you explain why when i do not use poetry and just pip3 install torch, I don't see this issue?

neonine2 avatar Apr 10 '24 17:04 neonine2

no idea, but it is not to do with setting the LD_LIBRARY_PATH.

if you hope for someone to help you debug this then providing a way to reproduce it would be best, eg in a docker-ized form.

but maybe now that you know that that environment variable is a red herring you will have better luck digging into it yourself.

dimbleby avatar Apr 10 '24 17:04 dimbleby

I am not affiliated with Poetry, but I use it daily. This doesn't feel like a Poetry issue. I am also a little curious about the order in which you ran your commands. I am going to spin up a VM on my Prox host and test this to see what is happening, so this can hopefully be closed.

swills1 avatar Apr 13 '24 00:04 swills1

From what I gather, you want to create a new Poetry environment and install torch, which should install numpy and set the library path. I don't understand your order of operations. You create your environment, cd into it, then run poetry add, but you're not in your shell and you didn't use poetry -c. Later you go into a shell and then use poetry -c.

I think these steps will achieve the same thing you're going for, and they work without issue in RHEL 9.3 for me as far as Poetry goes.

Update after OS install:

sudo dnf update -y

Install pip:

sudo dnf install pip -y

I did get a warning here related to pip, but that is a different issue.

WARNING: Value for scheme.platlib does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
  distutils: /home/zorro/.local/lib/python3.9/site-packages
  sysconfig: /home/zorro/.local/lib64/python3.9/site-packages
  WARNING: Additional context:
  user = True
  home = None
  root = None
  prefix = None
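The warning above shows pip getting two different answers for where platform-specific packages go: `lib` from distutils and `lib64` from sysconfig. Whether that split has anything to do with the import error is not established in this thread, but it is cheap to check whether the two locations are really the same directory in a given environment. This is only a sketch; `same_directory` and `describe_site_layout` are names invented here.

```python
import sysconfig
from pathlib import Path


def same_directory(a, b):
    """True when two paths point at the same directory after resolving symlinks."""
    return Path(a).resolve() == Path(b).resolve()


def describe_site_layout(paths=None):
    """Report the purelib and platlib directories and whether they coincide."""
    paths = paths or sysconfig.get_paths()
    purelib, platlib = paths["purelib"], paths["platlib"]
    return {
        "purelib": purelib,
        "platlib": platlib,
        "same": same_directory(purelib, platlib),
    }


if __name__ == "__main__":
    for key, value in describe_site_layout().items():
        print(f"{key}: {value}")
```

If `same` is reported as false, pure-Python and compiled packages are installed into two separate trees in that environment.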

Install Poetry, create environment, install torch

python -m pip install poetry
poetry new torchster
cd torchster
poetry shell
poetry add torch torchvision

That installed all dependencies including numpy. If you're having trouble with the env variable after following these, it isn't going to be related to Poetry. Poetry's part in this is fine from my testing.
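To confirm from inside the environment that the packages were installed, something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`; the function name `installed_versions` is made up for this example.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "torchvision", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

This only reads package metadata, so it works even when `import torch` itself fails.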

swills1 avatar Apr 13 '24 02:04 swills1

I am not trying to step on toes, but there hasn't been a response from the person who opened this in three weeks. It's hard to convey tone online. I say this with no ill-intent or condescension. I just feel like this can be put to bed and Poetry devs focus their time on other issues.

It seems clear to me from the original post that this is not a Poetry issue. It is an order-of-operations issue. If @neonine2 would simply create the project, cd into it, run poetry shell, and then begin running poetry add, it would solve their problem.

I tested on RHEL 9.3 (see above), and Poetry worked without any issues. I genuinely feel this can be closed. I apologize if I am overstepping.

swills1 avatar May 03 '24 19:05 swills1

Sorry, I just haven't had the chance to test your solution. I will close it now.

neonine2 avatar May 08 '24 23:05 neonine2

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions[bot] avatar Jun 08 '24 00:06 github-actions[bot]