Results: 3 comments of yghong
Hi, I am also looking into the memory issue. How did you deal with it?
Thanks for sharing!
Thanks for the reply! I did not add the `--no-gradient-accumulation-fusion` parameter, and `get_args().gradient_accumulation_fusion` is `True` at runtime. However, W has such a short runtime, while B takes almost twice as...
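The comments never define W and B. The sketch below assumes they follow the zero-bubble pipeline convention: B is the backward pass that computes the gradient with respect to the layer input, and W is the pass that computes the weight gradient. Under that assumption it isolates one linear layer. For that layer, B and W are each a single matrix multiply with the same FLOP count, so a 2x gap between them cannot come from the linear-layer arithmetic alone. Everything here is hypothetical and for illustration only: the function names, the shapes, and the NumPy `+=` into an fp32 `main_grad` buffer. The `+=` stands in for the fused accumulation that `gradient_accumulation_fusion` performs inside the GEMM on GPU. It is not Megatron-LM's implementation.

```python
import time
import numpy as np

def backward_b(dy, w):
    # Hypothetical "B" pass: gradient w.r.t. the layer input.
    # For y = x @ w.T, dx = dy @ w  -> shape (tokens, in_features)
    return dy @ w

def backward_w(dy, x, main_grad):
    # Hypothetical "W" pass: gradient w.r.t. the weight, accumulated into an
    # fp32 buffer. This emulates (in two steps) what a fused kernel does in one.
    main_grad += (dy.T @ x).astype(np.float32)
    return main_grad

def matmul_flops(m, k, n):
    # An (m x k) @ (k x n) product costs about 2*m*k*n floating-point ops
    return 2 * m * k * n

def best_time(fn, *args, repeats=3):
    # Best-of-N wall-clock time; a rough CPU-only measurement
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    tokens, d_in, d_out = 1024, 1024, 2048  # assumed example sizes
    rng = np.random.default_rng(0)
    x = rng.standard_normal((tokens, d_in), dtype=np.float32)
    w = rng.standard_normal((d_out, d_in), dtype=np.float32)
    dy = rng.standard_normal((tokens, d_out), dtype=np.float32)
    main_grad = np.zeros((d_out, d_in), dtype=np.float32)

    # B: (tokens x d_out) @ (d_out x d_in); W: (d_out x tokens) @ (tokens x d_in)
    print("B FLOPs:", matmul_flops(tokens, d_out, d_in))
    print("W FLOPs:", matmul_flops(d_out, tokens, d_in))
    print("B time: %.4f s" % best_time(backward_b, dy, w))
    print("W time: %.4f s" % best_time(backward_w, dy, x, main_grad))
```

In a full transformer layer, B also has to backpropagate through attention, activations, and normalization, and W does not. So the two passes are only expected to match when a single linear layer is considered on its own, as it is here.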