Failed to build transformer-engine
Environment: Python 3.12.7, PyTorch 2.6.0+cu126, CUDA 12.6, cuDNN 9.3.0.75, GCC 13.3.0, RTX 4090, Ubuntu.
I have already exported the path.
```
pip install transformer_engine[pytorch]
Defaulting to user installation because normal site-packages is not writeable
Collecting transformer_engine[pytorch]
  Using cached transformer_engine-1.13.0-py3-none-any.whl.metadata (16 kB)
Collecting transformer_engine_cu12==1.13.0 (from transformer_engine[pytorch])
  Using cached transformer_engine_cu12-1.13.0-py3-none-manylinux_2_28_x86_64.whl.metadata (16 kB)
Collecting transformer_engine_torch==1.13.0 (from transformer_engine[pytorch])
  Downloading transformer_engine_torch-1.13.0.tar.gz (121 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /workspace/shared/anaconda3/lib/python3.12/site-packages (2.8.2)
Requirement already satisfied: importlib-metadata>=1.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (7.0.1)
Requirement already satisfied: packaging in /workspace/shared/anaconda3/lib/python3.12/site-packages (24.1)
Requirement already satisfied: torch in ./.local/lib/python3.12/site-packages (2.6.0+cu126)
Requirement already satisfied: zipp>=0.5 in /workspace/shared/anaconda3/lib/python3.12/site-packages (3.17.0)
Requirement already satisfied: annotated-types>=0.4.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (0.6.0)
Requirement already satisfied: pydantic-core==2.20.1 in /workspace/shared/anaconda3/lib/python3.12/site-packages (2.20.1)
Requirement already satisfied: typing-extensions>=4.6.1 in /workspace/shared/anaconda3/lib/python3.12/site-packages (4.11.0)
Requirement already satisfied: filelock in /workspace/shared/anaconda3/lib/python3.12/site-packages (3.13.1)
Requirement already satisfied: setuptools in /workspace/shared/anaconda3/lib/python3.12/site-packages (75.1.0)
Requirement already satisfied: sympy==1.13.1 in ./.local/lib/python3.12/site-packages (1.13.1)
Requirement already satisfied: networkx in /workspace/shared/anaconda3/lib/python3.12/site-packages (3.3)
Requirement already satisfied: jinja2 in /workspace/shared/anaconda3/lib/python3.12/site-packages (3.1.4)
Requirement already satisfied: fsspec in /workspace/shared/anaconda3/lib/python3.12/site-packages (2024.6.1)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (12.6.77)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (12.6.77)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.6.80 in ./.local/lib/python3.12/site-packages (12.6.80)
Requirement already satisfied: nvidia-cudnn-cu12==9.5.1.17 in ./.local/lib/python3.12/site-packages (9.5.1.17)
Requirement already satisfied: nvidia-cublas-cu12==12.6.4.1 in ./.local/lib/python3.12/site-packages (12.6.4.1)
Requirement already satisfied: nvidia-cufft-cu12==11.3.0.4 in ./.local/lib/python3.12/site-packages (11.3.0.4)
Requirement already satisfied: nvidia-curand-cu12==10.3.7.77 in ./.local/lib/python3.12/site-packages (10.3.7.77)
Requirement already satisfied: nvidia-cusolver-cu12==11.7.1.2 in ./.local/lib/python3.12/site-packages (11.7.1.2)
Requirement already satisfied: nvidia-cusparse-cu12==12.5.4.2 in ./.local/lib/python3.12/site-packages (12.5.4.2)
Requirement already satisfied: nvidia-cusparselt-cu12==0.6.3 in ./.local/lib/python3.12/site-packages (0.6.3)
Requirement already satisfied: nvidia-nccl-cu12==2.21.5 in ./.local/lib/python3.12/site-packages (2.21.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.6.77 in ./.local/lib/python3.12/site-packages (12.6.77)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.6.85 in ./.local/lib/python3.12/site-packages (12.6.85)
Requirement already satisfied: triton==3.2.0 in ./.local/lib/python3.12/site-packages (3.2.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /workspace/shared/anaconda3/lib/python3.12/site-packages (2.1.3)
Using cached transformer_engine_cu12-1.13.0-py3-none-manylinux_2_28_x86_64.whl (125.2 MB)
Using cached transformer_engine-1.13.0-py3-none-any.whl (459 kB)
Building wheels for collected packages: transformer_engine_torch
  Building wheel for transformer_engine_torch (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      /workspace/shared/anaconda3/lib/python3.12/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
        warnings.warn(msg)
      running bdist_wheel
      /workspace/jmwang/.local/lib/python3.12/site-packages/torch/utils/cpp_extension.py:529: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_ext
      /workspace/jmwang/.local/lib/python3.12/site-packages/torch/utils/cpp_extension.py:458: UserWarning: There are no g++ version bounds defined for CUDA version 12.6
        warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
      building 'transformer_engine_torch' extension
      creating build/temp.linux-x86_64-cpython-312/csrc
      creating build/temp.linux-x86_64-cpython-312/csrc/extensions
      creating build/temp.linux-x86_64-cpython-312/csrc/extensions/multi_tensor
      g++ -pthread -B /workspace/shared/anaconda3/compiler_compat -fno-strict-overflow -Wsign-compare -DNDEBUG -O2 -Wall -fPIC -O2 -isystem /workspace/shared/anaconda3/include -fPIC -O2 -isystem /workspace/shared/anaconda3/include -fPIC -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers/common -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/common_headers/common/include -I/tmp/pip-install-d8mpwx1x/transformer-engine-torch_84f4d864065842a4a131c88cea3e6872/csrc -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/TH -I/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/THC -I/usr/local/cuda/include -I/workspace/shared/anaconda3/include/python3.12 -c csrc/common.cpp -o build/temp.linux-x86_64-cpython-312/csrc/common.o -O3 -fvisibility=hidden -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1016" -DTORCH_EXTENSION_NAME=transformer_engine_torch -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
      In file included from /workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/Handle.h:4,
                       from csrc/common.h:14,
                       from csrc/common.cpp:7:
      /workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory
          3 | #include <cudnn.h>
            |          ^~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/g++' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer_engine_torch
  Running setup.py clean for transformer_engine_torch
Failed to build transformer_engine_torch
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine_torch)
```
Hi @sfwu2003, how did you install cuDNN? @ksivaman for visibility: this is an installation from wheels.
Same issue when installing.
OS: Ubuntu 22.04
GPU: RTX 3060
Python: 3.12.9

```
conda list | grep cudnn
nvidia-cudnn-cu12         9.1.0.70                 pypi_0    pypi
torch                     2.6.0                    pypi_0    pypi
```
```
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
pip install transformer_engine[pytorch]
```
```
        self.run_command(cmd_name)
      File "/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/setuptools/_distutils/cmd.py", line 339, in run_command
        self.distribution.run_command(command)
      File "/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/setuptools/dist.py", line 999, in run_command
        super().run_command(command)
      File "/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/setuptools/_distutils/dist.py", line 1002, in run_command
        cmd_obj.run()
      File "/tmp/pip-req-build-nehutrdg/build_tools/build_ext.py", line 119, in run
        ext._build_cmake(
      File "/tmp/pip-req-build-nehutrdg/build_tools/build_ext.py", line 91, in _build_cmake
        raise RuntimeError(f"Error when running CMake: {e}")
    RuntimeError: Error when running CMake: Command '['/home/skr/miniconda3/envs/cosmos/lib/python3.12/site-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-nehutrdg/transformer_engine/common', '-B', '/tmp/pip-req-build-nehutrdg/build/cmake', '-DPython_EXECUTABLE=/home/skr/miniconda3/envs/cosmos/bin/python', '-DPython_INCLUDE_DIR=/home/skr/miniconda3/envs/cosmos/include/python3.12', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-nehutrdg/build/lib.linux-x86_64-cpython-312', '-DCMAKE_CUDA_ARCHITECTURES=70;80;89;90;100;120', '-Dpybind11_DIR=/tmp/pip-req-build-nehutrdg/.eggs/pybind11-2.13.6-py3.12.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
    [end of output]
```
```
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)
```
```python
def _load_library():
    """Load shared library with Transformer Engine C extensions"""
    module_name = "transformer_engine_torch"

    if is_package_installed(module_name):
        assert is_package_installed("transformer_engine"), "Could not find transformer-engine."
        assert is_package_installed(
            "transformer_engine_cu12"
        ), "Could not find transformer-engine-cu12."
        assert (
            version(module_name)
            == version("transformer-engine")
            == version("transformer-engine-cu12")
        ), (
            "TransformerEngine package version mismatch. Found"
            f" {module_name} v{version(module_name)}, transformer-engine"
            f" v{version('transformer-engine')}, and transformer-engine-cu12"
            f" v{version('transformer-engine-cu12')}. Install transformer-engine using 'pip install"
            " transformer-engine[pytorch]==VERSION'"
        )

    if is_package_installed("transformer-engine-cu12"):
        if not is_package_installed(module_name):
            logging.info(
                "Could not find package %s. Install transformer-engine using 'pip"
                " install transformer-engine[pytorch]==VERSION'",
                module_name,
            )

    extension = _get_sys_extension()
    try:
        so_dir = get_te_path() / "transformer_engine"
        so_path = next(so_dir.glob(f"{module_name}.*.{extension}"))
    except StopIteration:
        so_dir = get_te_path()
        so_path = next(so_dir.glob(f"{module_name}.*.{extension}"))

    spec = importlib.util.spec_from_file_location(module_name, so_path)
    solib = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = solib
    spec.loader.exec_module(solib)
```
```
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Wed_Jan_15_19:20:09_PST_2025
Cuda compilation tools, release 12.8, V12.8.61
Build cuda_12.8.r12.8/compiler.35404655_0
```
```
nvidia-smi
Sun Mar  2 00:32:30 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.10              Driver Version: 570.86.10      CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:01:00.0  On |                  N/A |
|  0%   48C    P8             24W /  170W |     592MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1626    G   /usr/lib/xorg/Xorg                        269MiB |
|    0   N/A  N/A            1900    G   /usr/bin/gnome-shell                      118MiB |
|    0   N/A  N/A            5148    G   ...ess --variations-seed-version          144MiB |
+-----------------------------------------------------------------------------------------+
```
The error in the original issue indicates a problem with finding the cuDNN headers:
```
/workspace/jmwang/.local/lib/python3.12/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory
```
@skr3178 could you confirm that you are seeing the same error (it should be above the lines you posted)?
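For anyone hitting this, a quick way to see whether the compiler could ever have found the header is to check the usual locations yourself. This is a minimal sketch, assuming typical layouts; the helper name and the candidate paths are mine, not part of TE or PyTorch:

```python
import os

def find_cudnn_header(extra_dirs=()):
    """Return the first cudnn.h found in common locations, else None."""
    candidates = list(extra_dirs)
    # CUDA_HOME default is an assumption; adjust for your install.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    candidates.append(os.path.join(cuda_home, "include"))
    try:
        # Present only if the nvidia-cudnn-cu12 wheel is installed.
        import nvidia.cudnn
        candidates.append(os.path.join(os.path.dirname(nvidia.cudnn.__file__), "include"))
    except ImportError:
        pass
    for d in candidates:
        path = os.path.join(d, "cudnn.h")
        if os.path.isfile(path):
            return path
    return None

print(find_cudnn_header() or "cudnn.h not found in the usual locations")
```

If this prints the fallback message, the build failure is expected: no `-I` flag in the g++ command points at a directory containing `cudnn.h`.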
@ptrendx I'm having the same issue as the OP with the cuDNN headers. Here's a reproducible example:
```
conda create --name tr_engine \
    python=3.10 \
    nvidia/label/cuda-12.6.3::cuda \
    nvidia::cudnn
conda activate tr_engine
pip install torch --index-url https://download.pytorch.org/whl/cu126
export CUDA_HOME=$CONDA_PREFIX
export NVTE_FRAMEWORK=pytorch
pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable --verbose
```
This gives:
```
~/miniconda3/envs/tr_engine/bin/x86_64-conda-linux-gnu-c++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/.. -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/include -I/tmp/pip-req-build-_h4f8bra/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-_h4f8bra/build/cmake/string_headers -isystem /lustre/fsw/portfolios/llmservice/users/cmccarthy/miniconda3/envs/tr_engine/targets/x86_64-linux/include -Wl,--version-script=/tmp/pip-req-build-_h4f8bra/transformer_engine/common/libtransformer_engine.version -O3 -DNDEBUG -std=gnu++17 -fPIC -MD -MT CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o -MF CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o.d -o CMakeFiles/transformer_engine.dir/comm_gemm_overlap/comm_gemm_overlap.cpp.o -c /tmp/pip-req-build-_h4f8bra/transformer_engine/common/comm_gemm_overlap/comm_gemm_overlap.cpp
In file included from /tmp/pip-req-build-_h4f8bra/transformer_engine/common/normalization/common.cpp:9:
/tmp/pip-req-build-_h4f8bra/transformer_engine/common/normalization/common.h:10:10: fatal error: cudnn.h: No such file or directory
   10 | #include <cudnn.h>
      |          ^~~~~~~~~
In file included from /tmp/pip-req-build-_h4f8bra/transformer_engine/common/cudnn_utils.cpp:7:
/tmp/pip-req-build-_h4f8bra/transformer_engine/common/cudnn_utils.h:10:10: fatal error: cudnn.h: No such file or directory
   10 | #include <cudnn.h>
      |          ^~~~~~~~~
compilation terminated.
compilation terminated.
```
But the file exists at the standard path (I'm assuming $CUDA_HOME/include is standard):
```
(tr_engine) ~/$ cd $CUDA_HOME && find . -name cudnn.h
./lib/python3.10/site-packages/nvidia/cudnn/include/cudnn.h
./include/cudnn.h
```
Explicitly adding `export NVTE_CUDA_INCLUDE_PATH=$CUDA_HOME/include` doesn't work either.
I've attached the full pip install / build output.
Thanks for taking a look.
At the moment, TE relies on the CMake file of its third-party module cudnn-frontend to look for cuDNN. It takes a few HINTs, but maybe those hints don't cover all the situations above. Could you please try adding `CUDNN_PATH=<path to/cudnn/include> pip install ...` to your installation command?
https://github.com/NVIDIA/cudnn-frontend/blob/5040925e9450c399a66240b485b38564226e1212/cmake/cuDNN.cmake#L5
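To make the suggestion concrete, here is a minimal dry-run sketch of that invocation. The helper name, the `dry_run` flag, and the `/usr/local/cuda` path are my assumptions, not TE's API; all it demonstrates is running pip with `CUDNN_PATH` (and `NVTE_FRAMEWORK`) set in the child environment so the CMake hints can pick them up:

```python
import os
import subprocess
import sys

# Hypothetical helper -- not part of TransformerEngine. It shows how the
# environment variables reach the pip subprocess (and hence CMake).
def install_te_with_cudnn(cudnn_root, dry_run=True):
    env = dict(os.environ, CUDNN_PATH=cudnn_root, NVTE_FRAMEWORK="pytorch")
    cmd = [
        sys.executable, "-m", "pip", "install",
        "git+https://github.com/NVIDIA/TransformerEngine.git@stable",
    ]
    if dry_run:
        # Report what would run instead of actually building TE.
        return env["CUDNN_PATH"], cmd
    return subprocess.run(cmd, env=env, check=True)

# Assumed location; substitute the directory that actually contains include/cudnn.h.
print(install_te_with_cudnn("/usr/local/cuda"))
```

The shell equivalent is simply `CUDNN_PATH=/path/to/cudnn pip install ...` on one line, which scopes the variable to that single pip invocation.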
@cyanguwa I appreciate the help, but no luck: neither `CUDNN_PATH` nor `NVTE_CUDA_INCLUDE_PATH` worked for me.
@collinmccarthy FYI, setting `CUDNN_PATH` worked for me (on a Slurm server, if you also happen to be using one), as mentioned in "Compiling on Slurm cluster: fatal error: cudnn.h: No such file or directory":

```
export CUDNN_PATH=/path/to/cudnn
```
> At the moment, TE relies on its third-party module cudnn-frontend's CMake file to look for cuDNN. It takes a few HINTs but maybe those hints don't cover all the situations above. Could you please try adding `CUDNN_PATH=<path to/cudnn/include> pip install ...` to your installation command?
> https://github.com/NVIDIA/cudnn-frontend/blob/5040925e9450c399a66240b485b38564226e1212/cmake/cuDNN.cmake#L5
Hello, I'd like to ask: why does the hint point `CUDNN_PATH` at the cudnn directory rather than at `CUDNN_PATH/include`?
Should `CUDNN_PATH` be `xxx/envs/torch27py312/lib/python3.1/site-packages/nvidia/cudnn/` or `xxx/envs/torch27py312/lib/python3.1/site-packages/nvidia/cudnn/include`?
Maybe `export CPLUS_INCLUDE_PATH=/share/home/xxxxx/envs/torch27py312/lib/python3.1/site-packages/nvidia/cudnn/include` will help you.
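For the pip-wheel case specifically, this small sketch (the helper name is mine) resolves where the nvidia-cudnn-cu12 wheel actually lives, printing both the root directory and its `include/` subdirectory, so you can try each as the value of `CUDNN_PATH` or `CPLUS_INCLUDE_PATH` without hand-typing the site-packages path:

```python
import importlib.util
import os

# Hypothetical helper: locate the nvidia-cudnn-cu12 wheel's directory and its
# include/ subdirectory. Returns None when the wheel is not installed.
def cudnn_dirs():
    try:
        spec = importlib.util.find_spec("nvidia.cudnn")
    except ModuleNotFoundError:
        return None  # the parent "nvidia" namespace package is absent
    if spec is None or not spec.submodule_search_locations:
        return None
    root = list(spec.submodule_search_locations)[0]
    return root, os.path.join(root, "include")

print(cudnn_dirs())
```

If the CMake hints append `/include` themselves, the first printed path is the one to use; otherwise use the second.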
Thanks @wplf