GPU memory leak in adahessian optimizer?
Hi
I am using your library and appreciate all the work you have put into this capability. I started using the adahessian optimizer and found that GPU memory usage would grow as the optimizer ran, until it exhausted all of my GPU memory and the run crashed. The leak seems to be within the get_trace routine, and I believe it can be fixed by changing
```python
hvs = torch.autograd.grad(
    grads, params, grad_outputs=v, only_inputs=True, retain_graph=True
)
```
to
```python
hvs = torch.autograd.grad(
    grads, params, grad_outputs=v, only_inputs=True, retain_graph=False
)
```
If you get a chance to check this out, please comment to let me know. Thanks!
@sjscotti Would you like to submit a PR with the proposed fix? It could count as part of Hacktoberfest.
Thanks for the suggestion, but I don't have experience with submitting pull requests. I gave it a try but got stuck at the first step (comparing branches).
BTW, you might also add a @torch.no_grad() decorator before each routine in adahessian. I saw that done for some other implementations of adahessian (and there may be other optimizers in your library that could also use this decorator).
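To illustrate what that decorator buys you, here is a toy optimizer sketch (the class `ToySGD` is hypothetical, not from the library): `@torch.no_grad()` on `step()` disables autograd inside the method, so the in-place parameter updates are not recorded in the computation graph, saving memory and avoiding in-place-update-on-leaf errors.

```python
import torch

# Hypothetical toy optimizer showing the @torch.no_grad() decorator pattern:
# updates inside step() run with autograd disabled.
class ToySGD(torch.optim.Optimizer):
    def __init__(self, params, lr=0.1):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    # In-place SGD update; safe here because no_grad is active.
                    p.add_(p.grad, alpha=-group["lr"])

x = torch.tensor([1.0], requires_grad=True)
opt = ToySGD([x], lr=0.5)
(x ** 2).sum().backward()   # grad = 2.0
opt.step()                  # x <- 1.0 - 0.5 * 2.0 = 0.0
print(x)                    # tensor([0.], requires_grad=True)
```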
Yep, I plan to add @torch.no_grad(); hopefully I will find time to do this soon.