TransformerEngine
Hadamard transform not working on SM120
On SM120, the code works fine when `disable_rht=True` is set in `transformer_engine.common.recipe.NVFP4BlockScaling`. However, when `disable_rht=False` is set, the following code results in a CUDA error:
```python
with te.fp8_autocast(enabled=True, fp8_recipe=fp4_recipe):
    out_fp4 = linear(x)
```
It fails with:

```
Running NVFP4 forward pass...
Error: Failed to set Shared Memory size.
torch.AcceleratorError: CUDA error: invalid argument
```
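For reference, a minimal self-contained reproduction would look roughly like the sketch below. The layer sizes, tensor shapes, dtype, `params_dtype`, and `bias` arguments are placeholders I chose for illustration; only the recipe, `disable_rht`, and the `fp8_autocast` usage are taken from the report above.

```python
# Sketch of a minimal reproduction (shapes/dtype are assumptions, not from the original report).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import NVFP4BlockScaling

# disable_rht=False enables the random Hadamard transform; disable_rht=True works on SM120.
fp4_recipe = NVFP4BlockScaling(disable_rht=False)

linear = te.Linear(1024, 1024, bias=False, params_dtype=torch.bfloat16).cuda()
x = torch.randn(128, 1024, device="cuda", dtype=torch.bfloat16)

print("Running NVFP4 forward pass...")
with te.fp8_autocast(enabled=True, fp8_recipe=fp4_recipe):
    out_fp4 = linear(x)
torch.cuda.synchronize()  # the "invalid argument" error surfaces around here on SM120
```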
Is the Hadamard transform only available on SM100?
#2255