Best Practices for Custom Gradients
I've been trying to define a custom `egrad` and `ehess`, but I can't seem to find the right way to do it. Here is what I tried:
1. Followed https://github.com/pymanopt/pymanopt/issues/38#issuecomment-319102003. It didn't work; I got an error here: https://github.com/pymanopt/pymanopt/blob/df94ab9e03b5fa3041668defe995d93b8715a6d7/pymanopt/core/problem.py#L105-L109
2. Upon decorating the function with `pymanopt.function.PyTorch`, I got an error here: https://github.com/pymanopt/pymanopt/blob/df94ab9e03b5fa3041668defe995d93b8715a6d7/pymanopt/core/problem.py#L138-L140 saying the return type is "not expected", as it only accepts a `list` or a `tuple`.
3. Hence, I tried returning a list of PyTorch tensors instead, which raised an error here: https://github.com/pymanopt/pymanopt/blob/df94ab9e03b5fa3041668defe995d93b8715a6d7/pymanopt/autodiff/backends/_pytorch.py#L47 saying it cannot find a property `.numpy()` on a list, as one would expect.
4. Next, I made a `pymanopt.autodiff.Function` with `_CallableBackend` as the backend for both the `custom_egrad` and `custom_ehess` functions, returning a list of numpy arrays from each. The code finally ran; however, neither the cost nor `|grad|` changed with the `TrustRegions` solver at any step. I ran multiple different initializations, wondering whether it was due to bad initializations, but it stayed this way in all of these experiments.
5. Finally, I monkey-patched the `_egrad` and `_ehess` attributes of `pymanopt.Problem` as follows:

   ```python
   problem = pymanopt.Problem(manifold=..., cost=...)
   problem._egrad = custom_egrad
   problem._ehess = custom_ehess
   ```

   This time, I returned a list of numpy arrays from `custom_egrad` and `custom_ehess`. Result: as in 4., the cost and `|grad|` stayed exactly the same at every step in all 10+ experiments with different initializations.
I reconfirmed that the iterate is not being updated by comparing the Frobenius norm of the difference between the initialization and the result: the change is negligible (~1e-5).
The problem is defined correctly, since it converges to the desired results if I don't use the custom `egrad` and `ehess`. However, I need the customization for parallelization purposes. As an extra step, I removed all the parallelization code to see whether the values change with the custom gradient and Hessian. They don't.
So, clearly, I'm doing something wrong here. What's the correct way to define custom gradients and Hessian-vector products?
Thanks.
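One thing worth ruling out before suspecting the solver: if the custom `egrad`/`ehess` pair is inconsistent with the cost, `TrustRegions` will (rightly) reject every step, and the cost and `|grad|` will look frozen, which matches the symptom above. Below is a pymanopt-free sanity check against finite differences; the `cost`, `custom_egrad`, and `custom_ehess` here are illustrative stand-ins, not the actual functions from this issue.

```python
import numpy as np

# Illustrative stand-ins (assumed, not the actual functions from this issue):
# f(X) = trace(X^T A X), so egrad(X) = (A + A^T) X and ehess(X)[U] = (A + A^T) U.
rng = np.random.default_rng(0)
n, k = 5, 2
A = rng.standard_normal((n, n))

def cost(X):
    return np.trace(X.T @ A @ X)

def custom_egrad(X):
    return (A + A.T) @ X

def custom_ehess(X, U):
    return (A + A.T) @ U

# Central finite differences along a random direction U. If these checks fail,
# the solver will reject every step and cost/|grad| will appear not to update.
X = rng.standard_normal((n, k))
U = rng.standard_normal((n, k))
eps = 1e-6

fd_grad = (cost(X + eps * U) - cost(X - eps * U)) / (2 * eps)
assert abs(fd_grad - np.sum(custom_egrad(X) * U)) < 1e-6

fd_hess = (custom_egrad(X + eps * U) - custom_egrad(X - eps * U)) / (2 * eps)
assert np.allclose(fd_hess, custom_ehess(X, U), atol=1e-6)
```

If both checks pass but the solver still rejects every step, the issue is more likely in how the callables are wired into `Problem` than in the math.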
UPDATE:
6. Tried using the `Callable` backend as shown here: https://github.com/pymanopt/pymanopt/blob/df94ab9e03b5fa3041668defe995d93b8715a6d7/examples/advanced/check_gradient.py#L26-L32 The code runs, but, much like in 4. and 5., the cost value and `|grad|` do not update between steps.
CC: @j-towns @NicolasBoumal
Thanks for the detailed description of your attempts so far. The following example shows one way of defining `egrad` directly: https://github.com/pymanopt/pymanopt/blob/master/examples/dominant_invariant_subspace.py If that doesn't help, could you post minimal working code where this issue comes up; something we could run on our end?
Hi @NicolasBoumal, after some more debugging, I found that setting `use_rand=False` for `TrustRegions` works with the custom differentials. Apparently, it was still not updating because the solver was rejecting all the updates. It is, however, very confusing why `use_rand=True` works with the default differentials but not with the custom ones.
Indeed, that is something to look into. Thanks for reporting it!
Sorry for getting back to you on this just now. Overwriting internal properties should really not be the way to provide custom gradient/Hessian maps. Decorating a callable with the `pymanopt.function.numpy` decorator is all that should be necessary. If you could provide a minimal working example, we could look into this a little further :pray: