yash3056 comments

Results: 12 comments by yash3056

FP4 Training

According to this article, it is stable: https://developer.nvidia.com/blog/nvfp4-trains-with-precision-of-16-bit-and-speed-and-efficiency-of-4-bit/ Is it worth it now?

FP4 Training

@kooshi To be exact, they use MXFP4, and according to NVIDIA, NVFP4 is more stable than MXFP4, so I think it is time for it to be added.

[BUG] This error occurs when push_to_hub is true

> Did you export HF_TOKEN and HF_USERNAME?

Yeah, I did that. The platform where I am getting this error is Colab (I copied the notebook in the README) and...

[BUG] This error occurs when push_to_hub is true

> In that case the token you are using doesn't have write access to create repos under the username you chose.

I created the token with write permission.

[BUG] This error occurs when push_to_hub is true

I created the token with write permission, like this:

DataParallel is Supported for XPU?

> @yash3056
>
> The DP should be not fully supported by XPU for now. May I know why the DP is needed in your case, instead of DDP?

I...

DataParallel is Supported for XPU?

@gujinghui @alexsin368 This code runs fine with PyTorch 2.6 (mainline).

is NVFP4 not supported for rtx 50 series?

@ptrendx Won't disabling stochastic rounding and using round-to-nearest mode cause model performance to degrade?
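
To make the concern concrete, here is a toy sketch of the two rounding modes. It rounds to an integer grid rather than to FP4, and it is not how TransformerEngine implements either mode. The function names, the update size of 0.3, and the step count are all assumptions chosen for illustration. The idea is that round-to-nearest silently drops every update smaller than half a grid step, while stochastic rounding keeps the sum correct on average.

```python
import math
import random


def round_to_nearest(x: float) -> int:
    """Deterministic rounding to the nearest integer grid point (ties go up)."""
    return math.floor(x + 0.5)


def stochastic_round(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part, otherwise down."""
    lower = math.floor(x)
    frac = x - lower
    return lower + (1 if rng.random() < frac else 0)


def accumulate(update: float, steps: int, rounder) -> int:
    """Add a small update to a low-precision accumulator, rounding after every step."""
    acc = 0
    for _ in range(steps):
        acc = rounder(acc + update)
    return acc


rng = random.Random(0)  # fixed seed so the run is repeatable
steps = 10_000
update = 0.3  # hypothetical update, smaller than half a grid step

rtn_total = accumulate(update, steps, round_to_nearest)
sr_total = accumulate(update, steps, lambda v: stochastic_round(v, rng))

print("exact sum:          ", update * steps)
print("round-to-nearest:   ", rtn_total)
print("stochastic rounding:", sr_total)
```

In this toy setting the round-to-nearest accumulator never leaves zero, while the stochastic one lands near the exact sum. Whether the same effect matters for a real NVFP4 training run depends on the recipe and on which tensors are rounded, so treat this as intuition only.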

is NVFP4 not supported for rtx 50 series?

By the way, can you tell me how to compile for SM120a? I am trying to use this flag ```export CMAKE_CUDA_ARCHITECTURES="120a"``` but it is not building for sm_120a.

is NVFP4 not supported for rtx 50 series?

I removed the condition at https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/common/util/nvfp4_transpose.cuh#L201 and compiled with the 120a CMake flag, but I am getting this error: ``` Error: Failed to set Shared Memory size. ``` Most probable cause...