Gradients for Acquisition are wrong when `normalizer=True`
Hi,
It appears that the gradients of the acquisition functions are wrong when `normalizer=True` is used in the model definition. This is because `model.predictive_gradients` in GPy (which is called by `model.get_prediction_gradients` in Emukit) does not account for normalization. I raised this issue here and made a pull request to fix it.
I don't think Emukit enforces or recommends `normalizer=False` anywhere. This is problematic because it is up to the user to define their own model "upstream" of the optimization loop. I suspect that many people are tempted to use `normalizer=True` without knowing that the gradients of their acquisition function will be wrong.
If the pull request I made is accepted, then there is nothing to do except tell people to use the latest (devel?) version of GPy.
If I am missing anything, please let me know.
Thanks, Antoine
```python
import numpy as np
import GPy
from GPy.models import GradientChecker
from emukit.model_wrappers.gpy_model_wrappers import GPyModelWrapper
from emukit.bayesian_optimization.acquisitions import ProbabilityOfImprovement

M, Q = 15, 3
X = np.random.rand(M, Q)
Y = np.random.rand(M, 1)
x = np.random.rand(1, Q)

# Build a GPy model with normalization enabled, then wrap it for Emukit
model = GPy.models.GPRegression(X=X, Y=Y, normalizer=True)
emukit_model = GPyModelWrapper(model)
acq = ProbabilityOfImprovement(emukit_model)

# Check the analytical acquisition gradients against finite differences
g = GradientChecker(lambda x: acq.evaluate_with_gradients(x)[0],
                    lambda x: acq.evaluate_with_gradients(x)[1],
                    x, 'x')
assert g.checkgrad()  # fails when normalizer=True
```
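For context, here is a minimal sketch of the chain rule the gradients need to respect, assuming GPy's default standardization `Y_norm = (Y - m) / s`. Standardizing by hand with `normalizer=False` is a workaround, not GPy's implementation; `m` and `s` are local names:

```python
import numpy as np
import GPy

M, Q = 15, 3
X = np.random.rand(M, Q)
Y = np.random.rand(M, 1)
x = np.random.rand(1, Q)

# Workaround sketch: standardize Y by hand and keep normalizer=False, so
# predictive_gradients is consistent with the (normalized) predictions.
m, s = Y.mean(), Y.std()
model = GPy.models.GPRegression(X, (Y - m) / s, normalizer=False)
dmu_norm_dx, dvar_norm_dx = model.predictive_gradients(x)

# Map the gradients back to the original Y scale via the chain rule:
#   mu  = s * mu_norm + m   =>  dmu/dx  = s * dmu_norm/dx
#   var = s**2 * var_norm   =>  dvar/dx = s**2 * dvar_norm/dx
dmu_dx = s * dmu_norm_dx
dvar_dx = s**2 * dvar_norm_dx
```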
Thanks Antoine, that's a good thing to be aware of. Let's keep this issue open so that people can find it. Once the GPy folks merge in your work, we can bump the GPy version in our requirements.
Thanks very much, Andrei. The same problem arises in GPyOpt, so I'll raise an issue there as well so that people know.
As a side note, I think it might be useful to investigate whether optimization should be performed in the normalized space or in the original space. I would think that the former is preferable for numerical stability, and also because it becomes easier for the user to tune the parameters appearing in the acquisition function, if any (e.g., the jitter in EI/PI or the beta in UCB); a toy illustration follows below. If that's the path you want to take, I'll be happy to make a pull request.
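To make the tuning point concrete, a toy illustration (the numbers are made up, and this is not Emukit code) of how the meaning of a fixed jitter changes with the scale of Y:

```python
import numpy as np

jitter = 0.01
Y_raw = np.array([1200.0, 1450.0, 1700.0])    # objective on its natural scale
Y_std = (Y_raw - Y_raw.mean()) / Y_raw.std()  # same data, standardized

# EI/PI compare against y_best - jitter, so what matters is the size of
# the jitter relative to the spread of the observed values:
print(jitter / np.ptp(Y_raw))  # ~2e-5 of the range: effectively no jitter
print(jitter / np.ptp(Y_std))  # ~4e-3 of the range: a meaningful amount
```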
Hi,
After reading this issue, I have a few questions.
If I am not wrong, the normalizer mentioned above is set in Emukit with the parameter `normalize_Y`, which is True by default for `GPBayesianOptimization` (in examples/gp_bayesian_optimization/single_objective_optimization.py).
However, this parameter does not seem to be used anywhere afterwards in Emukit. Shouldn't it be passed along in the method `_model_choser(self)` when the `GPRegression` model is created? Whether `normalize_Y` is set to True or False, nothing changes for `GPBayesianOptimization`, and the normalizer falls back to GPy's default, which is None, right?
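For concreteness, a hypothetical sketch of what forwarding the parameter might look like (the method body and attribute names below are guesses for illustration, not Emukit's actual code):

```python
import GPy
from emukit.model_wrappers.gpy_model_wrappers import GPyModelWrapper

def _model_choser(self):
    # Hypothetical: forward the user's normalize_Y choice to GPy instead
    # of silently falling back to GPy's default (no normalization).
    gpy_model = GPy.models.GPRegression(self.X_init, self.Y_init,
                                        normalizer=self.normalize_Y)
    self.model = GPyModelWrapper(gpy_model)
```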
Regards, Anthony Larroque
Thanks, Anthony. Here are my two cents on the issue:
- I think you're right that `GPBayesianOptimization` essentially ignores `normalize_Y` and uses the default value set in GPy. I believe that `GPBayesianOptimization` was intended as a high-level wrapper that requires minimal inputs from the user. I agree that `normalize_Y` should be included in `_model_choser(self)`. (Or it should be removed from the kwargs in `GPBayesianOptimization`.)
- I tend to prefer `BayesianOptimization` because it offers more flexibility in terms of model definition. `BayesianOptimization` uses whatever model is passed to it, and does not override the value of `normalize_Y` (I think).
- The normalization problem also appears in `ExperimentalDesignLoop`, which also requires the user to provide their own `model`. I believe that the Quadrature loops operate the same way. I haven't checked the Multi-fidelity or Sensitivity analysis models.
- Many of the examples in the Tutorials use a step-by-step approach to show the basic concepts of each method. This includes custom ("upstream") model definition and acquisition plots. I suspect that anybody who wants to do a bit of experimentation will stumble on the gradient issue.
- I mentioned in an earlier post that GPyOpt suffers from the same problem. After some more digging, it appears that this is not the case. From what I can tell, the models in GPyOpt only "see" normalized data (for instance, see `_update_model`), so the snippet in the original post won't raise any error, regardless of the value of `normalizer` (see the sketch after this list).
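A minimal sketch of that GPyOpt-style pattern, assuming default standardization (`update_model` here is a stand-in for GPyOpt's `_update_model`, not its actual code):

```python
import numpy as np
import GPy

def update_model(model, X, Y):
    # Standardize upstream of the model so the GP (and hence
    # predictive_gradients) only ever sees normalized data.
    Y_norm = (Y - Y.mean()) / Y.std()
    model.set_XY(X, Y_norm)
    model.optimize()
```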
Please feel free to correct me if I missed anything. Thanks.
Hi Antoine,
Thank you very much for your answers!
To answer some of your comments:
- Thanks for the advice, you are right. I am currently working with GPs, so I was using `GPBayesianOptimization` and modified this class a little bit to include more features for the Gaussian process and the acquisition functions. But if I plan to use more complex models, I will probably use `BayesianOptimization`. I might also make a pull request for `GPBayesianOptimization` if the Emukit developers are interested. These new features include the possibility to change the jitter, beta, batch_size, or the kernel of the GP, for example (see the sketch after this list).
- For sensitivity and multi-fidelity, I am not sure. The user is also required to provide the model, but I do not know how `predictive_gradients` is actually used in those cases.
- I did not see the option `normalizer=True` set in any tutorial, so as long as people follow the tutorials and do not touch the normalizer, I think they should be fine, since the normalizer defaults to None in their GPy versions. But yes, it may be a problem if they change it.
- I agree with you. The models in GPyOpt seem to normalize Y directly when `normalize_Y` is set to True, without setting the normalizer on the GPy model.
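For what it's worth, a sketch of how those choices (kernel, jitter, batch size, normalizer) can already be made explicitly by building the model upstream and using the lower-level loop. This assumes `BayesianOptimizationLoop` and `ExpectedImprovement` with the constructor arguments shown, so treat the exact signatures as assumptions:

```python
import numpy as np
import GPy
from emukit.core import ContinuousParameter, ParameterSpace
from emukit.model_wrappers.gpy_model_wrappers import GPyModelWrapper
from emukit.bayesian_optimization.acquisitions import ExpectedImprovement
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop

X = np.random.rand(15, 3)
Y = np.random.rand(15, 1)

# Custom kernel and an explicit normalizer choice, made upstream by the user
kernel = GPy.kern.Matern52(input_dim=3)
gpy_model = GPy.models.GPRegression(X, Y, kernel=kernel, normalizer=False)
model = GPyModelWrapper(gpy_model)

space = ParameterSpace([ContinuousParameter(f"x{i}", 0, 1) for i in range(3)])
acquisition = ExpectedImprovement(model, jitter=0.01)  # explicit jitter
loop = BayesianOptimizationLoop(space, model,
                                acquisition=acquisition, batch_size=1)
```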
Regards, Anthony Larroque
Thanks, Anthony. That makes sense. I'm glad we're on the same page 👍
FYI: #806 was merged into GPy's devel.
Hi Antoine,
Thank you for letting us know, and for the fix!
Now that the original PR in GPy has been merged, we can safely close this issue. Thanks @ablancha for opening it!