
OptimalGradCheckpointing: 7 issues

As the title says. The program shows the peak memory usage and the cut-offs, but I still need a help/hint on the question in the title.

Hi, I am trying to reproduce the results. It works correctly with PyTorch 1.5, but with PyTorch 1.10 I get `Parsing Computation Graph with torch.jit failed` and the fallback to the manual parse_graph function...

```
...
  File "/home/Foo/miniforge3/envs/pytor/lib/python3.6/site-packages/torch/nn/functional.py", line 1923, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
```

Got this error when trying to run the following command from the README.md:

```
...
```

I use code like this:

```
run_segment = optimal_grad_checkpointing(model, inp)
run_segment, optimizer = apex.amp.initialize(run_segment, optimizer, opt_level="02", verbosity=0)
...
output = run_segment(images)
```

and get the error:

```
output = run_segment(images)
...
```
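One thing worth double-checking in a report like this (not confirmed to be the cause of the error above): apex opt_level strings start with the letter O, e.g. "O2", not the digit zero. Below is a minimal sketch of the same flow with that spelling, assuming the `optimal_grad_checkpointing(model, inp)` entry point quoted above; the import path, model, and shapes are illustrative assumptions.

```python
import torch
import torchvision
from apex import amp
from graph import optimal_grad_checkpointing  # import path is an assumption, adjust to the repo layout

model = torchvision.models.resnet18().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inp = torch.randn(1, 3, 224, 224, device="cuda")  # example input used for graph parsing

# Build the checkpointed run segment first, then let amp patch it together with the optimizer.
run_segment = optimal_grad_checkpointing(model, inp)
run_segment, optimizer = amp.initialize(
    run_segment, optimizer, opt_level="O2", verbosity=0  # letter "O", not the digit zero
)

images = torch.randn(8, 3, 224, 224, device="cuda")
output = run_segment(images)
loss = output.mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```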

I run benchmark.py and get the following warning:

```
python benchmark.py --arch resnet18 --device cuda:0
Parsing Computation Graph with torch.jit failed, revert to manual parse_graph function
```
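Independent of this project, a quick way to check whether torch.jit tracing itself succeeds for a given architecture on your PyTorch version is the standalone sketch below (not the repository's own parsing code):

```python
import torch
import torchvision

model = torchvision.models.resnet18().eval()
example = torch.randn(1, 3, 224, 224)

try:
    # torch.jit.trace records the operations executed for this example input.
    traced = torch.jit.trace(model, example)
    print(traced.graph)  # inspect the traced computation graph
except Exception as exc:
    print(f"torch.jit tracing failed: {exc}")
```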

Hi, thank you so much for providing this code! Using the automatic computation graph parser, I was able to use the optimal gradient checkpoints during model training without writing much...
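For anyone comparing memory, here is a small sketch of how the peak-memory difference could be measured with and without the checkpointed run segment. The `optimal_grad_checkpointing(model, inp)` call and the use of `run_segment` as a drop-in module are taken from the usage quoted above; the import path, model, and shapes are assumptions.

```python
import torch
import torchvision
from graph import optimal_grad_checkpointing  # import path is an assumption, adjust to the repo layout

def peak_mem_mb(module, batch):
    """One forward/backward pass, returning peak CUDA memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    module(batch).mean().backward()
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 1024 ** 2

model = torchvision.models.resnet18().cuda()
inp = torch.randn(1, 3, 224, 224, device="cuda")  # example input used for graph parsing
run_segment = optimal_grad_checkpointing(model, inp)

batch = torch.randn(32, 3, 224, 224, device="cuda")
print("baseline     :", peak_mem_mb(model, batch), "MB")
model.zero_grad(set_to_none=True)  # clear gradients before the second measurement
print("checkpointed :", peak_mem_mb(run_segment, batch), "MB")
```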

I know it's been a while since this repo was uploaded but was thinking of getting this to work with deepspeed. Did you by any chance try that? Internally deepspeed...