Pre-built wheels
Describe the bug
Hi,
Are there any plans to publish pre-built wheels? Right now, during pip install, the pybind modules are built via CMake in a brittle manner: the build either reaches into nvidia.__file__ or expects the CUDA toolkit to be installed on the system path (non-hermetic behaviour).
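To illustrate what I mean by non-hermetic, the build roughly falls back on one of two discovery paths (a sketch only; the exact lookups in the CMake/setup scripts may differ):

```bash
# Illustrative only -- the two discovery paths the build can fall back on.

# 1. Derive CUDA include/library paths from the location of the nvidia pip packages:
python -c "import nvidia; print(nvidia.__file__)"

# 2. Or assume a system-wide CUDA toolkit is discoverable:
echo "$CUDA_HOME"   # e.g. /usr/local/cuda
which nvcc
```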
Steps/Code to reproduce bug
pip install -vvv transformer_engine[jax]
This invokes the compiler to build the C++ files.
Expected behavior
No compilation during pip install. Wheels should be released for all supported platform/CUDA/Python version combinations.
Environment overview (please complete the following information)
- Environment location: Local Ubuntu machine
- Method of Transformer Engine install: [pip install]
Environment details
- Python version = 3.12
- Transformer Engine version = 2.8.0
This behavior is intentional, not a bug. We provide pre-built wheels only for the core library, since it accounts for the majority of the compilation time. The framework integrations (such as JAX) are distributed as source packages (sdists) and compiled locally during installation. This approach is necessary because JAX does not maintain a stable ABI, even across minor or patch releases. Consequently, pre-built wheels for the JAX bindings would only function correctly if the user’s installed JAX version matched almost exactly the one used to build the wheel. That requirement would make distribution overly restrictive and impractical, so shipping source code is the more reasonable approach.
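Concretely, the split at install time looks roughly like this (package names are approximate, e.g. the pre-built core wheel is published per CUDA version; verify the exact names on PyPI):

```bash
# Installing the JAX extra pulls the pre-built core wheel plus a small sdist
# for the JAX bindings; only the latter compiles locally against your JAX.
# (Package layout is approximate -- e.g. a core wheel along the lines of
# transformer_engine_cu12 -- check PyPI for the exact names.)
pip install --verbose transformer_engine[jax]
```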
Would it be possible to provide pre-built PyTorch wheels? We're still seeing a ~10-15 minute build time when working with transformer-engine-torch (installing via uv), and pre-built wheels would be very helpful.
Otherwise, any pointers for building transformer-engine-torch more quickly with uv?
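For reference, a minimal sketch of the knobs that usually matter here, assuming the build honours NVTE_FRAMEWORK and standard CMake parallelism settings (behaviour may vary by Transformer Engine version, so verify against the release you install):

```bash
# Sketch of a faster local build with uv -- verify these knobs against the
# Transformer Engine version you are installing.

# Build only the PyTorch bindings instead of every framework integration.
export NVTE_FRAMEWORK=pytorch

# Let CMake use all cores (standard CMake environment variable).
export CMAKE_BUILD_PARALLEL_LEVEL="$(nproc)"

# With isolation disabled, build deps (torch, CUDA toolchain) must already
# be present in the environment.
uv pip install --no-build-isolation transformer_engine[pytorch]
```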
@erictang000 10-15 minutes seems excessive - could you post a log of that installation?
It's more like ~8 minutes actually, which is still quite long - logs here: https://gist.github.com/erictang000/b03880e22ca2fe1d35db03fd6ef0ba93
This is with running the following commands (which we use in SkyRL to enable Megatron): https://github.com/NovaSky-AI/SkyRL/blob/bf5a655594aaa2e137d9401d9eab3e2e2a33228a/skyrl-train/pyproject.toml#L122
Another reason wheels would be helpful: when running multi-node, we would like to ship transformer-engine with uv, but when installing with build isolation disabled we have to make sure the relevant environment variables are set on each node, which is pretty costly to do manually. Since we prefer uv over pip, it's also trickier to bake these dependencies/env vars into a Docker image.
Something like what we can do for flash-attn and flashinfer would be super helpful for us (we just point to the pre-built wheel in our pyproject.toml here): https://github.com/NovaSky-AI/SkyRL/blob/bf5a655594aaa2e137d9401d9eab3e2e2a33228a/skyrl-train/pyproject.toml#L73
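A minimal sketch of that pattern, with a placeholder URL (the real pin lives in the pyproject.toml linked above):

```bash
# Point uv at a pre-built wheel by direct URL so nothing compiles at install time.
# The URL is a placeholder -- the actual wheel URL is pinned in our pyproject.toml.
uv add "flash-attn @ https://example.com/flash_attn-2.8.0-cp312-cp312-linux_x86_64.whl"
```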
Even if you want to keep building from source, there are QoL features that could improve the build experience, such as https://github.com/NVIDIA/TransformerEngine/issues/2331. Otherwise, it's possible to end up with some libraries linked against system libraries and others against the .so files shipped in PyPI packages, which can cause pretty awful conflicts, particularly if PyTorch and JAX differ in their dependencies or runtime-loading/ABI behaviour.