Open-Sora

Instructions about installation

Open LYK-love opened this issue 1 year ago • 3 comments

Problem when installing apex

The installation instructions here will cause trouble for users whose CUDA version is not 12.1.

That is because building apex requires the CUDA version that PyTorch was built with, which is usually 12.1, to exactly match the system CUDA version.

More precisely, if you install PyTorch via https://pytorch.org/get-started/locally/, you will see: (image)

This means the PyTorch package is built for CUDA 12.1. That is my case; your PyTorch may be built for a different CUDA version, but I use 12.1 as the example here.

In my case, PyTorch is built for CUDA 12.1, but my server's CUDA version is 12.3, so I immediately got an error when executing:
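To see which CUDA version your own PyTorch was built with, and whether it lines up with the system toolkit, something like the following may help. This is only a sketch: `major_minor`, `cuda_versions_match`, and `torch_cuda_version` are hypothetical helper names, and comparing only major and minor is my assumption about what the apex check looks at. `torch.version.cuda` is the real PyTorch attribute.

```python
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '12.1' or '12.1.105'."""
    parts = version.strip().split(".")
    return int(parts[0]), int(parts[1])


def cuda_versions_match(torch_cuda: str, system_cuda: str) -> bool:
    """True when both versions agree on major and minor (the patch level is ignored)."""
    return major_minor(torch_cuda) == major_minor(system_cuda)


def torch_cuda_version() -> Optional[str]:
    """CUDA version the installed PyTorch was built with; None for CPU-only builds."""
    import torch  # imported lazily so the helpers above work without PyTorch installed
    return torch.version.cuda


# The mismatch described in this issue: PyTorch built for 12.1, system CUDA 12.3.
print(cuda_versions_match("12.1", "12.3"))  # prints False
```

On a machine with PyTorch installed, `torch_cuda_version()` gives the first argument; the second comes from the system toolkit.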

pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" git+https://github.com/NVIDIA/apex.git

The error message looks like this:

  In some cases, a minor-version mismatch will not cause later errors:  https://github.com/NVIDIA/apex/pull/323#discussion_r287021798.  You can try commenting out this check (at your own risk).
  error: subprocess-exited-with-error

  × Building wheel for apex (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/lyk/miniconda3/envs/opensora/bin/python /home/lyk/miniconda3/envs/opensora/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp5kyv1i65
  cwd: /home/lyk/Projects/apex
  Building wheel for apex (pyproject.toml) ... error
  ERROR: Failed building wheel for apex
Failed to build apex
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects

Solution 1

To solve this, install CUDA 12.1. I omit the process since it is easy to find online. The installation succeeded if nvcc -V outputs:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
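If you would rather check this from a script than read the output by eye, here is a rough sketch that pulls the release number out of the `nvcc -V` text. The function names `parse_nvcc_release` and `read_nvcc_output` are made up for illustration, and the regular expression assumes the "release X.Y" wording shown above.

```python
import re
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract 'X.Y' from the 'release X.Y' part of `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def read_nvcc_output() -> str:
    """Run `nvcc -V`; raises FileNotFoundError if nvcc is not on PATH."""
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Parsing the line from the output shown above, without needing nvcc installed.
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(parse_nvcc_release(sample))  # prints 12.1
```

On the server itself, `parse_nvcc_release(read_nvcc_output())` should return the same string as the CUDA version your PyTorch build reports.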

Solution 2

However, there is a simpler way: as minounou said, we can comment out lines 39-48 of ./setup.py so that the compatibility check is skipped.
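For anyone who prefers to script the edit, a minimal sketch of commenting out a line range is below. `comment_out` and `comment_out_file` are hypothetical helpers, not part of apex. The range 39-48 only applies to the revision of setup.py I had, so check what those lines contain in your clone first, and make sure commenting them out does not leave an `if` or `def` without a body.

```python
from typing import List


def comment_out(lines: List[str], first: int, last: int) -> List[str]:
    """Prefix lines first..last (1-indexed, inclusive) with '# '."""
    return [
        "# " + line if first <= number <= last else line
        for number, line in enumerate(lines, start=1)
    ]


def comment_out_file(path: str, first: int, last: int) -> None:
    """Rewrite the file at `path` with the given line range commented out."""
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    patched = comment_out(lines, first, last)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(patched) + "\n")
```

In a local clone of apex this would be called as `comment_out_file("setup.py", 39, 48)`, followed by running the install from that clone rather than from the git URL. As the error message says, skipping the check is at your own risk.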

LYK-love, Mar 19 '24 01:03

I encountered the same error when I was running other models, but that project did not have setup.py. Do you have any good suggestions?

ifredom, Mar 19 '24 10:03

> I encountered the same error when I was running other models, but that project did not have setup.py. Do you have any good suggestions?

In this case, I think you can try Solution 1 and install another version of CUDA.

LYK-love, Mar 19 '24 21:03

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot], Mar 27 '24 01:03