
Failed to build transformer-engine

Open sfwu2003 opened this issue 11 months ago • 8 comments

Python: 3.12.7
PyTorch: 2.6.0+cu126
CUDA: 12.6
cuDNN: 9.3.0.75
GCC: 13.3.0
GPU: RTX 4090
OS: Ubuntu

I have already exported the path.

pip install transformer_engine[pytorch]
Defaulting to user installation because normal site-packages is not writeable
Collecting transformer_engine[pytorch]
  Using cached transformer_engine-1.13.0-py3-none-any.whl.metadata (16 kB)
Collecting transformer_engine_cu12==1.13.0 (from transformer_engine[pytorch])
  Using cached transformer_engine_cu12-1.13.0-py3-none-manylinux_2_28_x86_64.whl.metadata (16 kB)
Collecting transformer_engine_torch==1.13.0 (from transformer_engine[pytorch])
  Downloading transformer_engine_torch-1.13.0.tar.gz (121 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /workspace/shared/anaconda3/lib/python3.12/site-packages (from transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (2.8.2)
Requirement already satisfied: importlib-metadata>=1.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (7.0.1)
Requirement already satisfied: packaging in /workspace/shared/anaconda3/lib/python3.12/site-packages (from transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (24.1)
Requirement already satisfied: torch in ./.local/lib/python3.12/site-packages (from transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2.6.0+cu126)
Requirement already satisfied: zipp>=0.5 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from importlib-metadata>=1.0->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (3.17.0)
Requirement already satisfied: annotated-types>=0.4.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from pydantic->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (0.6.0)
Requirement already satisfied: pydantic-core==2.20.1 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from pydantic->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (2.20.1)
Requirement already satisfied: typing-extensions>=4.6.1 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from pydantic->transformer_engine_cu12==1.13.0->transformer_engine[pytorch]) (4.11.0)
Requirement already satisfied: filelock in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.13.1)
Requirement already satisfied: setuptools in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (75.1.0)
Requirement already satisfied: sympy==1.13.1 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (1.13.1)
Requirement already satisfied: networkx in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.3)
Requirement already satisfied: jinja2 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.1.4)
Requirement already satisfied: fsspec in /workspace/shared/anaconda3/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2024.6.1)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.77)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.77)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.80)
Requirement already satisfied: nvidia-cudnn-cu12==9.5.1.17 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (9.5.1.17)
Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (11.3.0.4)
Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (10.3.7.77)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (11.7.1.2)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.5.4.2)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.3 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (0.6.3)
Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2.21.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.77)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (12.6.85)
Requirement already satisfied: triton==3.2.0 in ./.local/lib/python3.12/site-packages (from torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (3.2.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from sympy==1.13.1->torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (from jinja2->torch->transformer_engine_torch==1.13.0->transformer_engine[pytorch]) (2.1.3)
Using cached transformer_engine_cu12-1.13.0-py3-none-manylinux_2_28_x86_64.whl (125.2 MB)
Using cached transformer_engine-1.13.0-py3-none-any.whl (459 kB)
Building wheels for collected packages: transformer_engine_torch
  Building wheel for transformer_engine_torch (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      /workspace/shared/anaconda3/lib/python3.12/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
        warnings.warn(msg)
      running bdist_wheel
      /workspace/jmwang/.local/lib/python3.12/site-packages/torch/utils/cpp_extension.py:529: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_ext
      /workspace/jmwang/.local/lib/python3.12/site-packages/torch/utils/cpp_extension.py:458: UserWarning: There are no g++ version bounds defined for CUDA version 12.6
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
      building 'transformer_engine_torch' extension
      creating build/temp.linux-x86_64-cpython-312/csrc
      creating build/temp.linux-x86_64-cpython-312/csrc/extensions
      creating build/temp.linux-x86_64-cpython-312/csrc/extensions/multi_tensor
      g++ -pthread -B /workspace/shared/anaconda3/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /workspace/shared/anaconda3/include -fPIC -O2 -isystem /workspace/shared/anaconda3/include -fPIC -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers/common -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers/common/include -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/csrc -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/TH -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/THC -I/usr/local/cuda/include -I/workspace/shared/anaconda3/include/python3.12 -c csrc/common.cpp -o build/temp.linux-x86_64-cpython-312/csrc/common.o -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1016" -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
      In file included from /workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/Handle.h:4,
                       from csrc/common.h:14,
                       from csrc/common.cpp:7:
      /workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory
          3 | #include <cudnn.h>
            |          ^~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/g++' failed with exit code 1
      [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine_torch
Running setup.py clean for transformer_engine_torch
Failed to build transformer_engine_torch
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine_torch)

sfwu2003 avatar Feb 25 '25 17:02 sfwu2003

Hi @sfwu2003, how did you install cuDNN? @ksivaman for visibility: this is an installation from wheels.
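For anyone else hitting this, a quick way to check whether any cuDNN headers are actually visible on the machine is a short script like the one below. This is a hypothetical diagnostic helper (not part of TE); it only inspects the CUDA toolkit directory and a pip-installed nvidia-cudnn wheel, which are the two layouts discussed in this thread.

```python
import os
from pathlib import Path


def find_cudnn_headers():
    """Return paths of cudnn.h found in a few common install locations."""
    candidates = []
    # CUDA toolkit include directory (CUDA_HOME, or the usual default path)
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    candidates.append(Path(cuda_home) / "include")
    try:
        # The nvidia-cudnn-cu12 wheel ships headers next to the package
        import nvidia.cudnn
        candidates.append(Path(nvidia.cudnn.__file__).resolve().parent / "include")
    except ImportError:
        pass
    return [str(d / "cudnn.h") for d in candidates if (d / "cudnn.h").is_file()]


print(find_cudnn_headers())  # an empty list means no headers were found
```

If this prints an empty list, the compiler has no chance of finding `cudnn.h` either, regardless of how TE's build is configured.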

ptrendx avatar Feb 25 '25 22:02 ptrendx

Same issue when installing.

OS: Ubuntu 22.04
GPU: RTX 3060
Python: 3.12.9

conda list | grep cudnn
nvidia-cudnn-cu12         9.1.0.70        pypi_0    pypi
torch                     2.6.0           pypi_0    pypi

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
pip install transformer_engine[pytorch]

      self.run_command(cmd_name)
    File "/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 339, in run_command
      self.distribution.run_command(command)
    File "/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/setuptools/dist.py", line 999, in run_command
      super().run_command(command)
    File "/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-nehutrdg/build_tools/build_ext.py", line 119, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-nehutrdg/build_tools/build_ext.py", line 91, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-nehutrdg/transformer_engine/common', '-B', '/tmp/pip-req-build-nehutrdg/build/cmake', '-DPython_EXECUTABLE=/home/skr/miniconda3/envs/cosmos/bin/python', '-DPython_INCLUDE_DIR=/home/skr/miniconda3/envs/cosmos/include/python3.12', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-nehutrdg/build/lib.linux-x86_64-cpython-312', '-DCMAKE_CUDA_ARCHITECTURES=70;80;89;90;100;120', '-Dpybind11_DIR=/tmp/pip-req-build-nehutrdg/.eggs/pybind11-2.13.6-py3.12.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)

def _load_library():
    """Load shared library with Transformer Engine C extensions"""
    module_name = "transformer_engine_torch"

    if is_package_installed(module_name):
        assert is_package_installed("transformer_engine"), "Could not find transformer-engine."
        assert is_package_installed(
            "transformer_engine_cu12"
        ), "Could not find transformer-engine-cu12."
        assert (
            version(module_name)
            == version("transformer-engine")
            == version("transformer-engine-cu12")
        ), (
            "TransformerEngine package version mismatch. Found"
            f" {module_name} v{version(module_name)}, transformer-engine"
            f" v{version('transformer-engine')}, and transformer-engine-cu12"
            f" v{version('transformer-engine-cu12')}. Install transformer-engine using 'pip install"
            " transformer-engine[pytorch]==VERSION'"
        )

    if is_package_installed("transformer-engine-cu12"):
        if not is_package_installed(module_name):
            logging.info(
                "Could not find package %s. Install transformer-engine using 'pip"
                " install transformer-engine[pytorch]==VERSION'",
                module_name,
            )

    extension = _get_sys_extension()
    try:
        so_dir = get_te_path() / "transformer_engine"
        so_path = next(so_dir.glob(f"{module_name}.*.{extension}"))
    except StopIteration:
        so_dir = get_te_path()
        so_path = next(so_dir.glob(f"{module_name}.*.{extension}"))

    spec = importlib.util.spec_from_file_location(module_name, so_path)
    solib = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = solib
    spec.loader.exec_module(solib)

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0

nvidia-smi
Sun Mar  2 00:32:30 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.10              Driver Version: 570.86.10      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
|  0%   48C    P8             24W / 170W  |    592MiB / 12288MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                          GPU Memory     |
|        ID   ID                                                           Usage          |
|=========================================================================================|
|    0   N/A  N/A      1626      G   /usr/lib/xorg/Xorg                          269MiB   |
|    0   N/A  N/A      1900      G   /usr/bin/gnome-shell                        118MiB   |
|    0   N/A  N/A      5148      G   ...ess --variations-seed-version            144MiB   |
+-----------------------------------------------------------------------------------------+

skr3178 avatar Mar 01 '25 23:03 skr3178

The error in the original issue indicates a problem with finding the cuDNN headers:

/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory

@skr3178 could you confirm that you are seeing the same error (it should be above the lines you posted)?

ptrendx avatar Mar 04 '25 00:03 ptrendx

@ptrendx I'm having the same issue as the OP with the cuDNN headers. Here's a reproducible example:

conda create --name tr_engine \
 python=3.10 \
 nvidia/label/cuda-12.6.3::cuda \
 nvidia::cudnn

conda activate tr_engine

pip install torch --index-url https://download.pytorch.org/whl/cu126

export CUDA_HOME=$CONDA_PREFIX
export NVTE_FRAMEWORK=pytorch
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable --verbose

This gives:

  ~/miniconda3/envs/tr_engine/bin/x86_64-conda-linux-gnu-c++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/.. -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/include -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-_h4f8bra/build/cmake/string_headers -isystem /lustre/fsw/portfolios/llmservice/users/cmccarthy/miniconda3/envs/tr_engine/targets/x86_64-linux/include -Wl,--version-script=/tmp/pip-req-build-_h4f8bra/transformer_engine/common/libtransformer_engine.version -O3 -DNDEBUG -std=gnu++17 -fPIC -MD -MT CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o -MF CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o.d -o CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o -c /tmp/pip-req-build-_h4f8bra/transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp
  In file included from /tmp/pip-req-build-_h4f8bra/transformer_engine/common/normalization/common.cpp:9:
  /tmp/pip-req-build-_h4f8bra/transformer_engine/common/normalization/common.h:10:10: fatal error: cudnn.h: No such file or directory
     10 | #include <cudnn.h>
        |          ^~~~~~~~~
  In file included from /tmp/pip-req-build-_h4f8bra/transformer_engine/common/cudnn_utils.cpp:7:
  /tmp/pip-req-build-_h4f8bra/transformer_engine/common/cudnn_utils.h:10:10: fatal error: cudnn.h: No such file or directory
     10 | #include <cudnn.h>
        |          ^~~~~~~~~
  compilation terminated.
  compilation terminated.

But the file exists at the standard path (I'm assuming $CUDA_HOME/include is standard):

(tr_engine) ~/$ cd $CUDA_HOME && find . -name cudnn.h
./lib/python3.10/site-packages/nvidia/cudnn/include/cudnn.h
./include/cudnn.h

Explicitly setting export NVTE_CUDA_INCLUDE_PATH=$CUDA_HOME/include doesn't work either.

I've attached the full pip install / build output.

Thanks for taking a look.

transformer_engine_build_out.txt

collinmccarthy avatar Mar 09 '25 01:03 collinmccarthy

At the moment, TE relies on the CMake file of its third-party cudnn-frontend module to look for cuDNN. It takes a few HINTs, but maybe those hints don't cover all the situations above. Could you please try adding CUDNN_PATH=<path/to/cudnn/include> pip install ... to your installation command?

https://github.com/NVIDIA/cudnn-frontend/blob/5040925e9450c399a66240b485b38564226e1212/cmake/cuDNN.cmake#L5
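The search that CMake file performs roughly boils down to trying a list of hint directories and looking for the header beneath each one. As a Python sketch of that logic (an illustration of the search order only, not the actual CMake code), with CUDNN_PATH treated as the cuDNN root so that the header is expected under <root>/include:

```python
import os
from pathlib import Path


def resolve_cudnn_include(hints=None):
    """Roughly mimic how a CMake find-module resolves cudnn.h from hints."""
    hints = list(hints or [])
    env_root = os.environ.get("CUDNN_PATH")
    if env_root:
        hints.insert(0, env_root)  # the environment hint is tried first
    for root in hints:
        header = Path(root) / "include" / "cudnn.h"
        if header.is_file():
            return header
    return None  # header not found under any hint
```

Under this reading, CUDNN_PATH should point at the cuDNN root (e.g. the conda env prefix, or the wheel's nvidia/cudnn directory), not at the include directory itself.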

cyanguwa avatar Mar 11 '25 20:03 cyanguwa

@cyanguwa I appreciate the help but no luck...

collinmccarthy avatar Mar 12 '25 02:03 collinmccarthy


@collinmccarthy FYI, setting CUDNN_PATH worked for me (on a Slurm server, if you also happen to be using one), as mentioned in "Compiling on Slurm cluster: fatal error: cudnn.h: No such file or directory".

export CUDNN_PATH=/path/to/cudnn

xavhl avatar Apr 05 '25 20:04 xavhl


Hello, I'd like to ask: why hint at cuDNN via CUDNN_PATH instead of CUDNN_PATH/include? Should CUDNN_PATH be xxx/envs/torch27py312/lib/python3.1/site-packages/nvidia/cudnn/ or xxx/envs/torch27py312/lib/python3.1/site-packages/nvidia/cudnn/include?

wplf avatar Aug 13 '25 13:08 wplf

Maybe export CPLUS_INCLUDE_PATH=/share/home/xxxxx/envs/torch27py312/lib/python3.1/site-packages/nvidia/cudnn/include will help you.
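When the headers come from the pip wheel, the include path in that export can also be discovered programmatically instead of hard-coded. A small sketch (a hypothetical helper, assuming the nvidia-cudnn-cu12 wheel layout, where headers live in an include/ directory inside the package):

```python
from pathlib import Path


def pip_cudnn_include_dir():
    """Return the include dir of a pip-installed cuDNN wheel, or None."""
    try:
        import nvidia.cudnn  # provided by the nvidia-cudnn-cu12 wheel
    except ImportError:
        return None  # no wheel-installed cuDNN in this environment
    inc = Path(nvidia.cudnn.__file__).resolve().parent / "include"
    return inc if inc.is_dir() else None
```

The returned path could then feed the export above (or CUDNN_PATH, using its parent directory) without guessing the Python minor version in the site-packages path.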

wplf avatar Aug 13 '25 13:08 wplf

Thanks @wplf

rupaut98 avatar Sep 22 '25 04:09 rupaut98