Jianwei Feng

9 comments of Jianwei Feng

Hi, it could be that the PyTorch checkpointing function does not support Apex. Did you try torch.cuda.amp?
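
In case it helps, here is a minimal sketch of combining native AMP with PyTorch's activation checkpointing; the model, optimizer, and data below are placeholders, not from the repo:

```python
import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler
from torch.utils.checkpoint import checkpoint

# Minimal sketch (placeholder model/optimizer): native AMP wraps the forward
# pass, and checkpoint() recomputes the segment's activations in backward.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

x = torch.randn(8, 1024, device="cuda", requires_grad=True)
target = torch.randint(0, 10, (8,), device="cuda")

optimizer.zero_grad()
with autocast():
    # Checkpoint the expensive segment; its activations are not stored.
    hidden = checkpoint(model[0], x)
    out = model[2](model[1](hidden))
    loss = nn.functional.cross_entropy(out, target)
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```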

Our implementation of automatic graph parsing depends on torch.jit and is quite sensitive to the PyTorch version. If you have a manual parse_graph function it can definitely work with 1.6. For auto...
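
For reference, here is a rough illustration of the kind of torch.jit dependence mentioned above; this is not the repository's actual parser, just a sketch of inspecting a traced graph, whose node format is what tends to change between PyTorch versions:

```python
import torch
import torchvision

# Rough sketch, not the repository's parse_graph: trace a model and walk the
# torch.jit graph nodes. Node kinds and graph structure vary across versions.
model = torchvision.models.resnet18().eval()
traced = torch.jit.trace(model, torch.randn(1, 3, 224, 224))
for node in traced.graph.nodes():
    print(node.kind())  # e.g. "aten::conv2d", "aten::relu_"
```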

Hi, Autoparse should support all the networks in benchmark.py. Can you remove the "try" at https://github.com/lordfjw/OptimalGradCheckpointing/blob/main/benchmark.py#L167 and paste the error message? Can you also share your PyTorch version? It might also...

Same here. Trained with person_1's data and config. Training PSNR gradually reaches 32 while validation PSNR plateaus at 20. Would really appreciate it if the author could help here!

Hi, thanks for reporting the issue. The peak memory in PyTorch 1.10 seems quite different from 1.5, but the intermediate tensors' memory looks similar. I can see there is a...
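
For anyone comparing versions, here is a minimal sketch of how the two numbers could be measured (the model and input sizes are placeholders): peak memory over forward+backward from the CUDA allocator stats, and the intermediate activation memory as the allocated delta right after the forward pass:

```python
import torch

# Minimal sketch (placeholder model/input) for comparing memory across
# PyTorch versions: activation memory after forward, peak over fwd+bwd.
def measure(model, inp):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    base = torch.cuda.memory_allocated()
    out = model(inp)
    activ = torch.cuda.memory_allocated() - base  # intermediate tensors
    out.sum().backward()
    torch.cuda.synchronize()
    peak = torch.cuda.max_memory_allocated()
    return activ / 2**20, peak / 2**20            # MiB

model = torch.nn.Sequential(*[torch.nn.Linear(4096, 4096) for _ in range(4)]).cuda()
inp = torch.randn(64, 4096, device="cuda")
activ, peak = measure(model, inp)
print(f"torch {torch.__version__}: activations {activ:.1f} MiB, peak {peak:.1f} MiB")
```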

Hi IsabelFunke, Thank you for your comment! I will double check the logic in graph.py when I get some time. I will also restructure and modularize the code better. As...

Hi, sorry for the late reply. You can get the checkpointed tensors at https://github.com/jianweif/OptimalGradCheckpointing/blob/main/graph.py#L876. Instead of returning `tensor_dict[target]`, which is the output tensor, you can return the entire `tensor_dict` where...
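
To spell that out, here is a rough sketch of the change; the surrounding forward code is paraphrased, not the exact lines in graph.py:

```python
# Hypothetical sketch of the checkpointed forward in graph.py; names follow
# the comment above, not the verbatim repository code.
def forward(self, x):
    tensor_dict = {self.source: x}
    for node in self.execution_order:
        tensor_dict[node] = self.run_segment(node, tensor_dict)
    # Original behavior: return only the output tensor, tensor_dict[target].
    # To get all checkpointed intermediate tensors, return the whole dict:
    return tensor_dict
```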

Code reference for forward with no_sync in DDP: https://github.com/pytorch/PiPPy/blob/main/pippy/_PipelineStage.py#L425 and for backward with no_sync in DDP: https://github.com/pytorch/PiPPy/blob/main/pippy/_PipelineStage.py#L444 Since these are separate calls, the no_sync will not take effect and will still trigger...
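
For contrast, this is a minimal sketch of the usual no_sync pattern, where gradient synchronization is skipped because forward and backward both run inside the same context (ddp_model, batches, and loss_fn are placeholders, and the process group is assumed to be initialized); in the pipeline stage code linked above, forward and backward happen in separate calls, so this pattern does not apply:

```python
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal sketch of the documented no_sync usage for gradient accumulation.
def accumulate_without_sync(ddp_model: DDP, batches, loss_fn):
    with ddp_model.no_sync():
        # Grads accumulate locally; no all-reduce fires here because forward
        # and backward both run inside the context.
        for inp, target in batches[:-1]:
            loss_fn(ddp_model(inp), target).backward()
    # The last micro-batch runs outside the context and triggers one all-reduce.
    inp, target = batches[-1]
    loss_fn(ddp_model(inp), target).backward()
```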

Thanks for your quick response! I don't have the exact root cause, but I see other users have also reported that calling forward and backward separately in a no_sync context still triggers gradient sync: https://discuss.pytorch.org/t/whats-no-sync-exactly-do-in-ddp/170259. I...