ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects
Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
git version 2.34.1
torch.version = 2.2.1+cu121
Compiling cuda extensions with
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
from /usr/local/cuda/bin
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in
× Building wheel for apex (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python3 /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpvipwq2mw
cwd: /tmp/pip-req-build-isqlmxnv
Building wheel for apex (pyproject.toml) ... error
ERROR: Failed building wheel for apex
Failed to build apex
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects
When running the inference code, I then get: ModuleNotFoundError: No module named 'fused_layer_norm_cuda'
I still could not resolve this by following the method in https://github.com/NVIDIA/apex/issues/1653. I tried on both a server and Colab.
This is a common error caused by a mismatch between your system's global NVIDIA CUDA version (12.2, as reported by nvcc) and the CUDA version your PyTorch build targets (12.1). You should comment out this check.
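To make the mismatch concrete, here is a minimal sketch that compares the two versions from the log above. The helper names (`nvcc_release`, `torch_cuda_release`, `describe_mismatch`) are made up for illustration and are not part of apex or PyTorch. In a real session the inputs would come from `nvcc --version` and `torch.version.cuda`; here they are hard-coded from the log so the snippet runs without a GPU.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def torch_cuda_release(torch_cuda_version):
    """Turn a string such as '12.1' (the format of torch.version.cuda) into (12, 1)."""
    major, minor = torch_cuda_version.split(".")[:2]
    return int(major), int(minor)


def describe_mismatch(nvcc_output, torch_cuda_version):
    """Classify how the system toolkit and the PyTorch build differ."""
    toolkit = nvcc_release(nvcc_output)
    binary = torch_cuda_release(torch_cuda_version)
    if toolkit == binary:
        return "match"
    if toolkit[0] == binary[0]:
        return "minor mismatch"
    return "major mismatch"


# Values copied from the log above: nvcc release 12.2, torch 2.2.1+cu121.
NVCC_OUTPUT = "Cuda compilation tools, release 12.2, V12.2.140"
print(describe_mismatch(NVCC_OUTPUT, "12.1"))  # prints "minor mismatch"
```

A minor mismatch such as 12.2 against 12.1 is the case reported in this thread.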
After doing that, I get a different error:
[1/1] c++ -MMD -MF /data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/csrc/flatten_unflatten.o.d -pthread -B /data1/ouyangtianjian/.conda/envs/opensora/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /data1/ouyangtianjian/.conda/envs/opensora/include -fPIC -O2 -isystem /data1/ouyangtianjian/.conda/envs/opensora/include -fPIC -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/TH -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/data1/ouyangtianjian/.conda/envs/opensora/include/python3.10 -c -c /data1/ouyangtianjian/apex-22.04-dev/csrc/flatten_unflatten.cpp -o /data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/csrc/flatten_unflatten.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=apex_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17 g++ -pthread -B /data1/ouyangtianjian/.conda/envs/opensora/compiler_compat -shared -Wl,-rpath,/data1/ouyangtianjian/.conda/envs/opensora/lib -Wl,-rpath-link,/data1/ouyangtianjian/.conda/envs/opensora/lib -L/data1/ouyangtianjian/.conda/envs/opensora/lib -Wl,-rpath,/data1/ouyangtianjian/.conda/envs/opensora/lib -Wl,-rpath-link,/data1/ouyangtianjian/.conda/envs/opensora/lib -L/data1/ouyangtianjian/.conda/envs/opensora/lib /data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/csrc/flatten_unflatten.o -L/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/apex_C.cpython-310-x86_64-linux-gnu.so building 'amp_C' extension Emitting ninja build file 
/data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/build.ninja... Compiling objects... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/14] /data1/ouyangtianjian/.conda/envs/opensora/bin/nvcc -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/TH -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/data1/ouyangtianjian/.conda/envs/opensora/include -I/data1/ouyangtianjian/.conda/envs/opensora/include/python3.10 -c -c /data1/ouyangtianjian/apex-22.04-dev/csrc/multi_tensor_novograd.cu -o /data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/csrc/multi_tensor_novograd.o -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="gcc"' '-DPYBIND11_STDLIB="libstdcpp"' '-DPYBIND11_BUILD_ABI="cxxabi1011"' -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 FAILED: /data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/csrc/multi_tensor_novograd.o /data1/ouyangtianjian/.conda/envs/opensora/bin/nvcc -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/TH 
-I/data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/THC -I/data1/ouyangtianjian/.conda/envs/opensora/include -I/data1/ouyangtianjian/.conda/envs/opensora/include/python3.10 -c -c /data1/ouyangtianjian/apex-22.04-dev/csrc/multi_tensor_novograd.cu -o /data1/ouyangtianjian/apex-22.04-dev/build/temp.linux-x86_64-cpython-310/csrc/multi_tensor_novograd.o -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -lineinfo -O3 --use_fast_math -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=amp_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 -std=c++17 In file included from /data1/ouyangtianjian/apex-22.04-dev/csrc/multi_tensor_novograd.cu:3: /data1/ouyangtianjian/.conda/envs/opensora/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:6:10: fatal error: cusparse.h: No such file or directory 6 | #include <cusparse.h> | ^~~~~~~~~~~~ compilation terminated.
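The failing line is `fatal error: cusparse.h: No such file or directory`, so one quick diagnostic is to check which include directories actually contain that header. The sketch below is only an illustration: `find_header` is a hypothetical helper, and the candidate directories are assumptions based on the paths that appear in the logs (the `/usr/local/cuda` toolkit location and the conda environment's `include` directory).

```python
import os


def find_header(header_name, include_dirs):
    """Return the directories from include_dirs that contain header_name."""
    return [
        directory
        for directory in include_dirs
        if os.path.isfile(os.path.join(directory, header_name))
    ]


# Assumed locations; adjust them to the -I paths in your own build log.
candidate_dirs = ["/usr/local/cuda/include"]
conda_prefix = os.environ.get("CONDA_PREFIX")
if conda_prefix:
    candidate_dirs.append(os.path.join(conda_prefix, "include"))

# An empty list means none of the candidate directories has the header.
print(find_header("cusparse.h", candidate_dirs))
```

If the list is empty for the directories passed to nvcc with `-I`, the compiler in use has no access to the cuSPARSE headers, which would be consistent with the error above.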
Could you try reinstalling your system NVIDIA driver so that the two versions match?
Unfortunately, the server administrator refuses to change the NVIDIA driver version (currently 12.2) because many people are using the GPUs, and it seems that a PyTorch build for CUDA 12.2 hasn't been released. Anyway, thank you for your help.
I finally found the solution: https://github.com/NVIDIA/apex/pull/323#discussion_r287021798
I should not have commented out the whole function. Only the "if (bare_metal_major != torch_binary_major) or (bare_metal_minor != torch_binary_minor):" check needs to be deleted.
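For readers who would rather relax the check than delete it, here is a rough sketch of one alternative. It is not apex's actual code: `check_cuda_versions` is a hypothetical stand-in that reuses the variable names from the condition quoted above. Unlike the fix described here, it still raises on a major-version mismatch and only downgrades a minor-version mismatch to a warning.

```python
import warnings


def check_cuda_versions(bare_metal, torch_binary):
    """Compare (major, minor) tuples for the system toolkit and the PyTorch build.

    Raises on a major mismatch, warns on a minor mismatch, returns True otherwise.
    """
    bare_metal_major, bare_metal_minor = bare_metal
    torch_binary_major, torch_binary_minor = torch_binary

    if bare_metal_major != torch_binary_major:
        raise RuntimeError(
            f"CUDA major versions differ: system {bare_metal_major}, "
            f"PyTorch {torch_binary_major}"
        )
    if bare_metal_minor != torch_binary_minor:
        # Same major version: let the build continue, but tell the user.
        warnings.warn(
            f"CUDA minor versions differ: system {bare_metal_major}.{bare_metal_minor}, "
            f"PyTorch {torch_binary_major}.{torch_binary_minor}"
        )
    return True


# The case from this thread: system CUDA 12.2, PyTorch built for CUDA 12.1.
print(check_cuda_versions((12, 2), (12, 1)))  # warns, then prints True
```

Whether a minor mismatch is safe depends on the toolkit and PyTorch versions involved, so treat the warning as a prompt to verify the build rather than a guarantee.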
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.

