TransformerEngine
Hadamard transform not working on SM120
On SM120, the code works fine when `disable_rht=True` is set in `transformer_engine.common.recipe.NVFP4BlockScaling`. However, when `disable_rht=False` is set, the following code results in a CUDA error:
```python
with te.fp8_autocast(enabled=True, fp8_recipe=fp4_recipe):
    out_fp4 = linear(x)
```
It fails with:

```
Running NVFP4 forward pass...
Error: Failed to set Shared Memory size.
torch.AcceleratorError: CUDA error: invalid argument
```
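For reference, a minimal self-contained reproduction would look roughly like the sketch below. The layer sizes, tensor shapes, dtype, `params_dtype`, and `bias` arguments are placeholders I chose for illustration; only the recipe, `disable_rht`, and the `fp8_autocast` usage are taken from the report above.

```python
# Sketch of a minimal reproduction (shapes/dtype are assumptions, not from the original report).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import NVFP4BlockScaling

# disable_rht=False enables the random Hadamard transform; disable_rht=True works on SM120.
fp4_recipe = NVFP4BlockScaling(disable_rht=False)

linear = te.Linear(1024, 1024, bias=False, params_dtype=torch.bfloat16).cuda()
x = torch.randn(128, 1024, device="cuda", dtype=torch.bfloat16)

print("Running NVFP4 forward pass...")
with te.fp8_autocast(enabled=True, fp8_recipe=fp4_recipe):
    out_fp4 = linear(x)
torch.cuda.synchronize()  # the "invalid argument" error surfaces around here on SM120
```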
Is the Hadamard transform only available on SM100?
#2255