yghong

3 comments by yghong

Hi, I am also thinking about the memory issue. How did you deal with it?

Thanks for the reply! I did not pass the `--no-gradient-accumulation-fusion` flag, and `get_args().gradient_accumulation_fusion` is `True` at runtime. However, W has such a short runtime, while B takes almost twice as...