Jianwei Feng

9 comments of Jianwei Feng

Hi, it could be that the PyTorch checkpointing function does not support Apex. Did you try torch.cuda.amp?
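
In case it helps, here is a minimal sketch of combining native AMP with PyTorch's activation checkpointing; the model, optimizer, and data below are placeholders, not from the repo:

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler
from torch.utils.checkpoint import checkpoint

# Minimal sketch (placeholder model/optimizer): native AMP wraps the forward
# pass, and checkpoint() recomputes the segment's activations in backward.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

x = torch.randn(8, 1024, device="cuda", requires_grad=True)
target = torch.randint(0, 10, (8,), device="cuda")

optimizer.zero_grad()
with autocast():
    # Checkpoint the expensive segment; its activations are not stored.
    hidden = checkpoint(model[0], x)
    out = model[2](model[1](hidden))
    loss = nn.functional.cross_entropy(out, target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```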

Our implementation of automatic graph parsing depends on torch.jit and is quite sensitive to the PyTorch version. If you have a manual parse_graph function it can definitely work with 1.6. For auto...
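
For reference, here is a rough illustration of the kind of torch.jit dependence mentioned above; this is not the repository's actual parser, just a sketch of inspecting a traced graph, whose node format is what tends to change between PyTorch versions:

```python
import torch
import torchvision

# Rough sketch, not the repository's parse_graph: trace a model and walk the
# torch.jit graph nodes. Node kinds and graph structure vary across versions.
model = torchvision.models.resnet18().eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
for node in traced.graph.nodes():
    print(node.kind())  # e.g. "aten::conv2d", "aten::relu_"
```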

Hi, Autoparse should support all the networks in benchmark.py. Can you remove the "try" at https://github.com/lordfjw/OptimalGradCheckpointing/blob/main/benchmark.py#L167 and paste the error message? Can you also share your PyTorch version? It might also...

Same here. Trained with person_1's data and config. Training PSNR gradually reaches 32 while validation PSNR plateaus at 20. Would really appreciate it if the author could help here!

Hi, thanks for reporting the issue. The peak memory in PyTorch 1.10 seems quite different from 1.5, but the intermediate tensors' memory looks similar. I can see there is a...
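
For anyone comparing versions, here is a minimal sketch of how the two numbers could be measured (the model and input sizes are placeholders): peak memory over forward+backward from the CUDA allocator stats, and the intermediate activation memory as the allocated delta right after the forward pass:

```python
import torch

# Minimal sketch (placeholder model/input) for comparing memory across
# PyTorch versions: activation memory after forward, peak over fwd+bwd.
def measure(model, inp):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    base = torch.cuda.memory_allocated()
    out = model(inp)
    activ = torch.cuda.memory_allocated() - base  # intermediate tensors
    out.sum().backward()
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated()
    return activ / 2**20, peak / 2**20            # MiB

model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(4)]).cuda()
inp = torch.randn(64, 4096, device="cuda")
activ, peak = measure(model, inp)
print(f"torch {torch.__version__}: activations {activ:.1f} MiB, peak {peak:.1f} MiB")
```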

Hi IsabelFunke, Thank you for your comment! I will double check the logic in graph.py when I get some time. I will also restructure and modularize the code better. As...

Hi, sorry for the late reply. You can get the checkpointed tensors at https://github.com/jianweif/OptimalGradCheckpointing/blob/main/graph.py#L876. Instead of returning `tensor_dict[target]`, which is the output tensor, you can return the entire `tensor_dict` where...
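
To spell that out, here is a rough sketch of the change; the surrounding forward code is paraphrased, not the exact lines in graph.py:

```python
# Hypothetical sketch of the checkpointed forward in graph.py; names follow
# the comment above, not the verbatim repository code.
def forward(self, x):
    tensor_dict = {self.source: x}
    for node in self.execution_order:
        tensor_dict[node] = self.run_segment(node, tensor_dict)
    # Original behavior: return only the output tensor, tensor_dict[target].
    # To get all checkpointed intermediate tensors, return the whole dict:
    return tensor_dict
```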

Code reference for forward with no_sync in DDP: https://github.com/pytorch/PiPPy/blob/main/pippy/_PipelineStage.py#L425 and for backward with no_sync in DDP: https://github.com/pytorch/PiPPy/blob/main/pippy/_PipelineStage.py#L444 Since these are separate calls, the no_sync will not take effect and will still trigger...
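
For contrast, this is a minimal sketch of the usual no_sync pattern, where gradient synchronization is skipped because forward and backward both run inside the same context (ddp_model, batches, and loss_fn are placeholders, and the process group is assumed to be initialized); in the pipeline stage code linked above, forward and backward happen in separate calls, so this pattern does not apply:

```python
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal sketch of the documented no_sync usage for gradient accumulation.
def accumulate_without_sync(ddp_model: DDP, batches, loss_fn):
    with ddp_model.no_sync():
        # Grads accumulate locally; no all-reduce fires here because forward
        # and backward both run inside the context.
        for inp, target in batches[:-1]:
            loss_fn(ddp_model(inp), target).backward()
    # The last micro-batch runs outside the context and triggers one all-reduce.
    inp, target = batches[-1]
    loss_fn(ddp_model(inp), target).backward()
```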

Thanks for your quick response! I don't have the exact root cause, but I see other users have also reported that calling forward and backward separately in a no_sync context still triggers gradient sync: https://discuss.pytorch.org/t/whats-no-sync-exactly-do-in-ddp/170259. I...