Pre-built wheels
Describe the bug
Hi,
Are there any plans to publish pre-built wheels? Right now, during pip install, the pybind modules are built via CMake in a brittle manner: the build either reaches into nvidia.__file__ or expects the CUDA toolkit to be installed on the system path (non-hermetic behaviour).
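To illustrate what I mean by non-hermetic, the build roughly falls back on one of two discovery paths (a sketch only; the exact lookups in the CMake/setup scripts may differ):

```bash
# Illustrative only -- the two discovery paths the build can fall back on.

# 1. Derive CUDA include/library paths from the location of the nvidia pip packages:
python -c "import nvidia; print(nvidia.__file__)"

# 2. Or assume a system-wide CUDA toolkit is discoverable:
echo "$CUDA_HOME"   # e.g. /usr/local/cuda
which nvcc
```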
Steps/Code to reproduce bug
pip install -vvv transformer_engine[jax]
This invokes the compiler to build the C++ files.
Expected behavior
No compilation during pip install. Wheels should be released for all supported platform/CUDA/Python version combinations.
Environment overview (please complete the following information)
- Environment location: Local Ubuntu machine
- Method of Transformer Engine install: [pip install]
Environment details
- Python version = 3.12
- Transformer Engine version = 2.8.0
This behavior is intentional, not a bug. We provide pre-built wheels only for the core library, since it accounts for the majority of the compilation time. The framework integrations (such as JAX) are distributed as source packages (sdists) and compiled locally during installation. This approach is necessary because JAX does not maintain a stable ABI, even across minor or patch releases. Consequently, pre-built wheels for the JAX bindings would only function correctly if the user’s installed JAX version matched almost exactly the one used to build the wheel. That requirement would make distribution overly restrictive and impractical, so shipping source code is the more reasonable approach.
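Concretely, the split at install time looks roughly like this (package names are approximate, e.g. the pre-built core wheel is published per CUDA version; verify the exact names on PyPI):

```bash
# Installing the JAX extra pulls the pre-built core wheel plus a small sdist
# for the JAX bindings; only the latter compiles locally against your JAX.
# (Package layout is approximate -- e.g. a core wheel along the lines of
# transformer_engine_cu12 -- check PyPI for the exact names.)
pip install --verbose transformer_engine[jax]
```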
Would it be possible to provide pre-built PyTorch wheels? We're still seeing a ~10-15 minute build time when working with transformer-engine-torch (installing via uv), and pre-built wheels would be very helpful.
Otherwise, any pointers for building transformer-engine-torch more quickly with uv?
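For reference, a minimal sketch of the knobs that usually matter here, assuming the build honours NVTE_FRAMEWORK and standard CMake parallelism settings (behaviour may vary by Transformer Engine version, so verify against the release you install):

```bash
# Sketch of a faster local build with uv -- verify these knobs against the
# Transformer Engine version you are installing.

# Build only the PyTorch bindings instead of every framework integration.
export NVTE_FRAMEWORK=pytorch

# Let CMake use all cores (standard CMake environment variable).
export CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)"

# With isolation disabled, build deps (torch, CUDA toolchain) must already
# be present in the environment.
uv pip install --no-build-isolation transformer_engine[pytorch]
```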
@erictang000 10-15 minutes seems excessive - could you post a log of that installation?
It's more like ~8 minutes actually, which is still quite long - logs here: https://gist.github.com/erictang000/b03880e22ca2fe1d35db03fd6ef0ba93
This is with running the following commands (which we use in SkyRL to enable Megatron): https://github.com/NovaSky-AI/SkyRL/blob/bf5a655594aaa2e137d9401d9eab3e2e2a33228a/skyrl-train/pyproject.toml#L122
Another reason wheels would be helpful: when running multi-node, we would like to ship transformer-engine with uv, but when installing with build isolation disabled we have to make sure the relevant environment variables are set on each node, which is pretty costly to do manually. Since we prefer uv over pip, it's also trickier to bake these dependencies/env vars into a Docker image.
Something like what we can do for flash-attn and flashinfer would be super helpful for us (we just point to the pre-built wheel in our pyproject.toml here): https://github.com/NovaSky-AI/SkyRL/blob/bf5a655594aaa2e137d9401d9eab3e2e2a33228a/skyrl-train/pyproject.toml#L73
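A minimal sketch of that pattern, with a placeholder URL (the real pin lives in the pyproject.toml linked above):

```bash
# Point uv at a pre-built wheel by direct URL so nothing compiles at install time.
# The URL is a placeholder -- the actual wheel URL is pinned in our pyproject.toml.
uv add "flash-attn @ https://example.com/flash_attn-2.8.0-cp312-cp312-linux_x86_64.whl"
```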
Even if you want to keep building from source, there are QoL features that could improve the build experience, such as https://github.com/NVIDIA/TransformerEngine/issues/2331. Otherwise, it's possible to end up with some libraries linked against system libraries and others against the .so files shipped in PyPI packages, which can cause pretty awful conflicts, particularly if PyTorch and JAX differ in their dependencies or runtime-loading/ABI behaviour.