yash3056

Results 12 comments of yash3056

According to this article, it is stable: https://developer.nvidia.com/blog/nvfp4-trains-with-precision-of-16-bit-and-speed-and-efficiency-of-4-bit/ Is it worth it now?

@kooshi To be exact, they use MXFP4, and according to NVIDIA, NVFP4 is more stable than MXFP4, so I think it is time to add it.

> did you export HF_TOKEN and HF_USERNAME?

Yeah, I did that. The platform where I am getting this error is Colab (I copied the notebook from the README) and...

> in that case the token you are using doesn't have write access to create repos under the username you chose.

I created the token with write permission.

I created the token with write permission, like this:

> @yash3056
>
> The DP should be not fully supported by XPU for now. May I know why the DP is needed in your case, instead of DDP?

I...

@gujinghui @alexsin368 This code runs fine with PyTorch 2.6 (mainline).

@ptrendx Won't disabling stochastic rounding and using round-to-nearest mode cause model performance to degrade?
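
To make the difference between the two rounding modes concrete, here is a minimal sketch in plain Python. It assumes a hypothetical uniform grid with spacing `step`; a real FP4 format has non-uniform spacing and block scaling, and the function names here are illustrative, not Transformer Engine APIs. It only shows why small values vanish under round-to-nearest but survive on average under stochastic rounding; it says nothing about how much a given model is actually affected.

```python
import math
import random


def round_to_nearest(x: float, step: float) -> float:
    """Round x to the nearest multiple of step (deterministic)."""
    return round(x / step) * step


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a neighboring multiple of step.

    The probability of rounding up equals the fractional distance from
    the lower neighbor, so the result equals x in expectation.
    """
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)  # fixed seed so the run is repeatable
step = 0.5              # hypothetical grid spacing (assumption)
update = 0.1            # a value smaller than half the grid spacing
n = 10_000

# Round-to-nearest maps every copy of the small value to zero.
rn_mean = sum(round_to_nearest(update, step) for _ in range(n)) / n

# Stochastic rounding returns 0 or step, but averages close to the input.
sr_mean = sum(stochastic_round(update, step, rng) for _ in range(n)) / n

print("round-to-nearest mean:", rn_mean)
print("stochastic rounding mean:", sr_mean)
```

In this toy setting, round-to-nearest introduces a systematic bias for values below half the grid spacing, while stochastic rounding trades that bias for extra variance.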

By the way, can you tell me how to compile for SM120a? I am trying to use this flag, ```export CMAKE_CUDA_ARCHITECTURES="120a"```, but it is not building for sm_120a.

I removed the condition from https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/common/util/nvfp4_transpose.cuh#L201 and compiled it using the 120a CMake flag, but I am getting this error: ``` Error: Failed to set Shared Memory size. ``` most probable cause...