Error in NVRTC compilation
Hi! I'm getting the following error when trying to use Transformer Engine:

100 errors detected in the compilation of "transformer_engine/common/transpose/rtc/cast_transpose.cu". Compilation terminated.
RuntimeError: /home/user/pip-req-build-n0h8f0x1/transformer_engine/common/util/rtc.cpp:200 in function compile: NVRTC Error: NVRTC_ERROR_COMPILATION
The library was working before, but now I keep running into this. Any idea why this is happening? I did a fresh install of the library and am still running into it. It only occurs when I use the following FP8 recipe and enter the context manager:

```python
from transformer_engine.common import recipe

fp8_recipe = recipe.DelayedScaling(
    fp8_format=recipe.Format.HYBRID,
    amax_history_len=16,
    amax_compute_algo="max",
)
```
That's definitely a problem. Can you post the full error message? It should be dumped to the stderr output: https://github.com/NVIDIA/TransformerEngine/blob/8c0a0c93444eeb8b6a3702d0b0ef149d3889bc4f/transformer_engine/common/util/rtc.cpp#L177-L188
Hi Tim, thanks for reaching out. Here is the full error message:

NVRTC compilation log for transformer_engine/common/transpose/rtc/cast_transpose.cu:
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(383): error: identifier "NV_IS_DEVICE" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(383): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(383): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(383): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(383): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(390): warning #12-D: parsing restarts here after previous syntax error
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(396): error: identifier "NV_IS_DEVICE" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(396): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(396): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(396): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(396): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(403): warning #12-D: parsing restarts here after previous syntax error
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(424): error: identifier "NV_PROVIDES_SM_90" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(425): error: type name is not allowed
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(425): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(424): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(426): error: identifier "val" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(426): error: expression must be a modifiable lvalue
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(426): error: an asm operand must have scalar type
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(428): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(430): error: identifier "f" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(449): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(450): error: expected a ";"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(455): error: identifier "NV_PROVIDES_SM_80" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(456): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(456): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(455): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(457): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(461): error: identifier "r" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(466): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(472): error: identifier "NV_PROVIDES_SM_80" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(473): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(473): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(472): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(474): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(478): error: identifier "r" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(483): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(489): error: identifier "NV_PROVIDES_SM_80" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(490): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(490): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(489): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(491): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(495): error: identifier "r" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(497): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(502): error: identifier "NV_PROVIDES_SM_90" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(503): error: type name is not allowed
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(503): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(502): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(504): error: identifier "val" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(504): error: expression must be a modifiable lvalue
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(504): error: an asm operand must have scalar type
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(506): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(517): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(518): error: expected a ";"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(521): error: identifier "NV_PROVIDES_SM_90" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(522): error: type name is not allowed
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(522): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(521): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(523): error: identifier "val" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(523): error: expression must be a modifiable lvalue
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(523): error: an asm operand must have scalar type
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(525): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(536): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(537): error: expected a ";"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(541): error: identifier "NV_PROVIDES_SM_80" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(542): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(544): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(541): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(545): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(547): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(540): warning #177-D: variable "val" was declared but never referenced
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(553): error: identifier "NV_PROVIDES_SM_80" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(554): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(555): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(553): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(556): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(558): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(552): warning #177-D: variable "val" was declared but never referenced
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(566): error: identifier "NV_PROVIDES_SM_90" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(567): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(567): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(566): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(568): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(570): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(565): warning #177-D: variable "f" was declared but never referenced
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(578): error: identifier "NV_IS_DEVICE" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(579): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(578): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(580): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(582): error: identifier "u" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(583): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(627): error: identifier "NV_PROVIDES_SM_90" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(628): error: type name is not allowed
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(628): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(627): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(629): error: identifier "val" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(629): error: expression must be a modifiable lvalue
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(629): error: an asm operand must have scalar type
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(631): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(633): error: expected an expression
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(634): error: expected a ";"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(640): error: identifier "NV_IS_DEVICE" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(641): error: expected a ")"
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(640): error: identifier "NV_IF_ELSE_TARGET" is undefined
/scratch/midway3/tvallabh/tarun_dev/include/cuda_bf16.hpp(642): error: expected an expression
Error limit reached. 100 errors detected in the compilation of "transformer_engine/common/transpose/rtc/cast_transpose.cu". Compilation terminated.
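The log is long, but most of it appears to be knock-on errors from a handful of undefined identifiers. A small, hypothetical helper like the one below (plain Python, no TE dependency) tallies the `identifier "..." is undefined` lines so the root cause stands out. It assumes the log has been saved to a text file and that the interesting names use the `NV_` prefix of the `nv/target` macros.

```python
import re
from collections import Counter
from typing import List

# Matches lines such as: cuda_bf16.hpp(383): error: identifier "NV_IS_DEVICE" is undefined
UNDEFINED_RE = re.compile(r'error: identifier "([^"]+)" is undefined')


def undefined_identifiers(log: str) -> Counter:
    """Count how often each identifier is reported as undefined in a compiler log."""
    return Counter(m.group(1) for m in UNDEFINED_RE.finditer(log))


def root_cause_macros(counts: Counter, prefix: str = "NV_") -> List[str]:
    """Keep only identifiers that look like nv/target macros, sorted by name."""
    return sorted(name for name in counts if name.startswith(prefix))
```

For example, `undefined_identifiers(open("nvrtc.log").read()).most_common()` (file name hypothetical) would list the `NV_` macros next to local names such as `val` or `r`, which fail only because the surrounding macro expansion did.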
Interesting. NVRTC requires run-time access to the CUDA headers to perform JIT compilation, and TE is using the ones at /scratch/midway3/tvallabh/tarun_dev. I have a few thoughts:
- The quickest fix is to set `NVTE_DISABLE_NVRTC=1` in the environment. However, this isn't great, since TE will fall back to unoptimized kernels.
- Did you build TE with the CUDA install at /scratch/midway3/tvallabh/tarun_dev? These error messages are consistent with a mismatch between the build-time and run-time CUDA versions. To control this explicitly with environment variables, set `CUDA_PATH` (used in the build process) and `NVTE_CUDA_INCLUDE_DIR` (used for NVRTC).
- NVRTC should implicitly define the missing `NV_` macros. The macros are defined in the `nv/target` header, which is explicitly excluded during NVRTC compilation (see `cuda_bf16.h`). It's puzzling why this error is even happening.
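As a rough sketch of the second suggestion, the hypothetical helper below builds an environment that pins both variables to a single CUDA install, so a script can be launched in a fresh process with them already set. It assumes the headers live in `<cuda_home>/include`. Note that `CUDA_PATH` only matters when TE is (re)built, while `NVTE_CUDA_INCLUDE_DIR` is what NVRTC uses at run time.

```python
import os
from typing import Dict, Mapping, Optional


def nvrtc_env(
    cuda_home: str,
    disable_nvrtc: bool = False,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) pinned to one CUDA install.

    Hypothetical helper; assumes the CUDA headers are in <cuda_home>/include.
    """
    env = dict(os.environ if base is None else base)
    # Read when building TE; has no effect on an already-built installation.
    env["CUDA_PATH"] = cuda_home
    # Read at run time, when TE JIT-compiles kernels with NVRTC.
    env["NVTE_CUDA_INCLUDE_DIR"] = os.path.join(cuda_home, "include")
    if disable_nvrtc:
        # Quick workaround: skip NVRTC and use the unoptimized fallback kernels.
        env["NVTE_DISABLE_NVRTC"] = "1"
    return env


if __name__ == "__main__":
    env = nvrtc_env("/usr/local/cuda")
    print(env["NVTE_CUDA_INCLUDE_DIR"])
    # Hypothetical launch of a training script with the pinned environment:
    # import subprocess, sys
    # subprocess.run([sys.executable, "train.py"], env=env, check=True)
```

Setting the variables before the process starts avoids any question of whether TE has already read them by the time the FP8 context manager is entered.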
I've confirmed that these variables are set correctly in my script and are available at run time. However, the NVRTC compilation error persists. I've also verified that the CUDA installation at /scratch/midway3/tvallabh/tarun_dev is complete and includes the necessary NVRTC headers (nvrtc.h is present in the include directory). Despite setting these environment variables, I'm still seeing the same error messages. Are there any other checks I can perform to resolve this issue? Is it possible that Transformer Engine needs to be rebuilt with these paths set explicitly?
I think this is a version mismatch between the CUDA runtime (i.e. from libcudart.so) and the CUDA in /scratch/midway3/tvallabh/tarun_dev. The error message line numbers for cuda_bf16.hpp are consistent with CUDA 12.0 or CUDA 12.1. However, starting in CUDA 12.3 there is logic to exclude cuda_bf16.hpp from NVRTC compilation. Thus, it looks like you are linking TE to CUDA 12.3+ while /scratch/midway3/tvallabh/tarun_dev contains CUDA 12.0 or 12.1. To fix this error, make sure you are using a consistent CUDA version. I'd figure out what CUDA version your linker is loading (probably in /usr/local/cuda) and set NVTE_CUDA_INCLUDE_DIR accordingly. If you are set on using /scratch/midway3/tvallabh/tarun_dev, then you'll have to rebuild or do linker shenanigans (e.g. mess with LD_LIBRARY_PATH).
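One way to find which CUDA runtime the process actually loaded (an assumption on Linux, not something TE provides) is to look for `libcudart` in `/proc/self/maps` from inside the training process, after PyTorch/TE have been imported. The sketch below does that and guesses a matching header directory from the library path; the `<root>/lib*/` next to `<root>/include` layout is a heuristic, so check that `cuda_bf16.hpp` really exists there before setting `NVTE_CUDA_INCLUDE_DIR`.

```python
import os
from typing import Optional


def find_loaded_library(maps_text: str, prefix: str = "libcudart") -> Optional[str]:
    """Return the first file in a /proc/<pid>/maps dump whose name starts with `prefix`."""
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6:
            path = fields[5].strip()
            if os.path.basename(path).startswith(prefix):
                return path
    return None


def guess_include_dir(lib_path: str) -> str:
    """Heuristic: <root>/lib*/libcudart.so.* is assumed to sit next to <root>/include."""
    root = os.path.dirname(os.path.dirname(lib_path))
    return os.path.join(root, "include")


if __name__ == "__main__":
    # Linux only. In a real check, import PyTorch/TE first so libcudart is mapped;
    # this standalone run will usually report that nothing is loaded.
    if os.path.exists("/proc/self/maps"):
        with open("/proc/self/maps") as f:
            lib = find_loaded_library(f.read())
        if lib is None:
            print("libcudart is not loaded in this process")
        else:
            print(f"libcudart: {lib}")
            print(f"candidate NVTE_CUDA_INCLUDE_DIR: {guess_include_dir(lib)}")
```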
This is quite a subtle bug. I think TE would benefit from some version checking logic to catch these kinds of mismatches.
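A minimal sketch of such a check, under stated assumptions: it reads `CUDA_VERSION` from `cuda.h` in the header directory (assumed to be present there) and asks the runtime for its version through `cudaRuntimeGetVersion` via `ctypes`. Both use the encoding 1000 * major + 10 * minor, so 12010 is CUDA 12.1. `ctypes.util.find_library` may find a different `libcudart` than the one TE is linked against, so a result is a hint rather than proof; the function names here are hypothetical, not TE API.

```python
import ctypes
import ctypes.util
import os
import re
from typing import Optional, Tuple


def header_cuda_version(include_dir: str) -> Optional[int]:
    """Read CUDA_VERSION (e.g. 12010 for CUDA 12.1) from <include_dir>/cuda.h."""
    try:
        with open(os.path.join(include_dir, "cuda.h")) as f:
            text = f.read()
    except OSError:
        return None
    match = re.search(r"^\s*#define\s+CUDA_VERSION\s+(\d+)", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def runtime_cuda_version() -> Optional[int]:
    """Ask libcudart for its version via cudaRuntimeGetVersion, if it can be found."""
    name = ctypes.util.find_library("cudart")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    version = ctypes.c_int(0)
    if lib.cudaRuntimeGetVersion(ctypes.byref(version)) != 0:  # 0 is cudaSuccess
        return None
    return version.value


def to_major_minor(version: int) -> Tuple[int, int]:
    """Decode CUDA's 1000 * major + 10 * minor version encoding."""
    return version // 1000, (version % 1000) // 10


def check_versions(header: Optional[int], runtime: Optional[int]) -> None:
    """Raise if both versions are known and their major.minor differ."""
    if header is None or runtime is None:
        return
    h, r = to_major_minor(header), to_major_minor(runtime)
    if h != r:
        raise RuntimeError(
            f"CUDA headers are {h[0]}.{h[1]} but the CUDA runtime is {r[0]}.{r[1]}; "
            "point NVTE_CUDA_INCLUDE_DIR at headers that match the runtime"
        )


if __name__ == "__main__":
    include_dir = os.environ.get("NVTE_CUDA_INCLUDE_DIR", "/usr/local/cuda/include")
    try:
        check_versions(header_cuda_version(include_dir), runtime_cuda_version())
        print("no mismatch detected (or a version could not be determined)")
    except RuntimeError as err:
        print(err)
```

With the setup described in this thread (12.0/12.1 headers, 12.3+ runtime), a check like this would fail up front with a readable message instead of 100 NVRTC errors.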