TransformerEngine
fp8blockscaled training not converging on sm120
I have tried NVFP4 training, which converges on sm120, but the fp8blockscaled recipe will not converge with any of its available options. Could the power-of-2 scaling factor (which cannot be set to False on the sm120 architecture) be causing the problem?
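
For reference, this is roughly how the recipe is being applied (a minimal sketch, assuming the `Float8BlockScaling` recipe exposed under `transformer_engine.common.recipe`; the constructor defaults and exact layer setup here are illustrative, not the actual training config):

```python
# Minimal sketch of driving a TE layer under an FP8 block-scaled recipe.
# Assumptions: a recent transformer_engine with Float8BlockScaling;
# defaults (incl. the power-of-2 scale behavior) may differ by version.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import Float8BlockScaling

# Default block-scaled FP8 recipe; on sm120 the power-of-2 scale
# requirement reportedly cannot be turned off.
recipe = Float8BlockScaling()

layer = te.Linear(1024, 1024, bias=True).cuda()
opt = torch.optim.AdamW(layer.parameters(), lr=1e-4)

for step in range(100):
    x = torch.randn(32, 1024, device="cuda")
    # Run forward under the FP8 autocast context with the chosen recipe.
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        y = layer(x)
    loss = y.float().pow(2).mean()  # dummy loss for the repro
    loss.backward()
    opt.step()
    opt.zero_grad()
```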