Nesting configuration variable causes exception for TPE implementation
My apologies if the behavior described below was not intended to be supported. The Bergstra2013 paper, in the section "Sharing a configuration variable across choice branches", seems to indicate that such nesting is possible; if it was not intended to be supported, please close this ticket. Variable argument nesting seems to work correctly for random search but causes an exception for TPE (after the first 20 bootstrapping evaluations).
I am trying to describe a parameter space in which the values of parameters further down the tree (b) depend on values higher in the tree (a).
Constraints: 0 <= a <= 50, 0 <= b <= 50, b >= a
A minimal working example showing this problem is below. The code works correctly with algo=rand.suggest but fails with algo=tpe.suggest.
Question: Is this not intended to be supported, but happens to work for random search?
Also, if this is not intended to be supported, what is the best way to implement such a parameter search? One approach would be to implement the constraint checking in the loss function: for configurations which do not satisfy the constraints, it would simply report status STATUS_FAIL. Of course, if large portions of the parameter space return STATUS_FAIL, the number of evaluations would need to be much greater to account for the failures.
-Will
MWE:
import hyperopt
from hyperopt import hp
mya = hp.quniform('a', 0, 50, 1.0)
space = {'a': mya}
space.update({'b': hp.quniform('b', mya, 50, 1.0)})

def loss(d):
    val = 0
    val += abs(d.get('a') - 25)
    val += abs(d.get('b') - 30)
    # print "computed loss: ", val, "config:", d
    return {'loss': val, 'status': hyperopt.STATUS_OK, 'input': d}

trials = hyperopt.Trials()
print hyperopt.__version__

me = 10
r = hyperopt.fmin(loss, space=space, algo=hyperopt.tpe.suggest, max_evals=me, trials=trials)
print "after ", me, ":", trials.average_best_error(), r

me = 100
r = hyperopt.fmin(loss, space=space, algo=hyperopt.tpe.suggest, max_evals=me, trials=trials)
print "after ", me, ":", trials.average_best_error(), r
Program output:
##output from using rand.suggest
#0.0.3.dev
#after 10 : 4.0 {'1': 25.0, '2': 26.0}
#after 100 : 2.0 {'1': 26.0, '2': 29.0}
##output from tpe.suggest
#0.0.3.dev
#after 10 : 4.0 {'1': 25.0, '2': 26.0}
Traceback (most recent call last):
  File "/home/groves/workspace/PythonUtils/HyperoptExperiments/DependentGraphExample/dependent_tree_sample_bug_mwe.py", line 31, in <module>
    r = hyperopt.fmin(loss,space=space,algo=hyperopt.tpe.suggest,max_evals=me,trials=trials)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/fmin.py", line 340, in fmin
    verbose=verbose)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/base.py", line 588, in fmin
    pass_expr_memo_ctrl=pass_expr_memo_ctrl)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/fmin.py", line 351, in fmin
    rval.exhaust()
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/fmin.py", line 302, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.async)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/fmin.py", line 257, in run
    new_trials = algo(new_ids, self.domain, trials)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/tpe.py", line 902, in suggest
    print_node_on_error=False)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/pyll/base.py", line 860, in rec_eval
    rval = scope._impls[node.name](*args, **kwargs)
  File "/home/groves/progs/hyperopt/hyperopt/hyperopt/tpe.py", line 432, in adaptive_parzen_normal
    srtd_mus[:prior_pos] = mus[order[:prior_pos]]
TypeError: only integer arrays with one element can be converted to an index
Thanks as always @willgroves for the report. This one touches on some subtleties.
Just to start from the beginning: you're right that from the perspective of random search there is nothing theoretically problematic about introducing range dependencies between hyperparameters, and nothing problematic in practice either. hp.uniform('a', 0, hp.uniform('b', 0, 1)) will work as expected.
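For instance, here is a minimal sketch (my own toy example, not from the original report) where one variable's node is used as the other's upper bound and random search runs without error:

from hyperopt import hp, fmin, rand

# 'b' is sampled on [0, 1] and its node also serves as the upper bound of 'a'.
b = hp.uniform('b', 0, 1)
space = {'a': hp.uniform('a', 0, b), 'b': b}

best = fmin(fn=lambda d: abs(d['a'] - 0.25),  # toy objective
            space=space,
            algo=rand.suggest,
            max_evals=20)
print(best)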
The TPE algorithm involves a graph-analysis step that determines which hyperparameters depend on which others, in order to map the originally constructed search space onto a different data structure. TPE could in principle be written to handle the sort of conditional distribution P(a|b) suggested here, but it currently has no support for it. Nesting of choice nodes is handled, but this case is not. Designing a strategy to support it would be a research project.
Simpler short-term remedies could be:
- improve the error message (even I don't see immediately how/why this type error / indexing problem arises)
- suggest re-parameterizing your search space.
Often constraints can be rephrased, e.g.:
a = hp.uniform('a', 0, 1)
b = hp.uniform('b', 0, 1)
return {'a': a * b, 'b': b}
In other cases where re-parameterization is awkward, you can also (as you say) make the evaluation function check and reject illegal configurations... and the drawback (as you say) is that search efficiency can be compromised.
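For concreteness, a minimal sketch of that check-and-reject approach, using the constraints from the original post (the objective itself is only illustrative):

from hyperopt import hp, fmin, tpe, Trials, STATUS_OK, STATUS_FAIL

# Sample both variables over their full ranges; reject b < a in the objective.
space = {'a': hp.quniform('a', 0, 50, 1.0),
         'b': hp.quniform('b', 0, 50, 1.0)}

def loss(d):
    if d['b'] < d['a']:
        # Constraint violated: report a failed trial so it carries no loss.
        return {'status': STATUS_FAIL}
    return {'loss': abs(d['a'] - 25) + abs(d['b'] - 30), 'status': STATUS_OK}

trials = Trials()
best = fmin(loss, space=space, algo=tpe.suggest, max_evals=100, trials=trials)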
So... yeah. How much does that clear things up?
Often constraints can be rephrased, e.g.:
a = hp.uniform('a', 0, 1)
b = hp.uniform('b', 0, 1)
return {'a': a * b, 'b': b}
This kind of parameter rephrasing is, I believe, directly applicable to my problem. Essentially, my task is to sample only the upper (or lower) triangle of a 2-D matrix of possible configurations. After the rephrasing, a selects x from its full range of valid values and b selects y from the range of values that is valid for that x.
A rephrasing:
a = hp.uniform('a', 0, 1)
b = hp.uniform('b', 0, 1)
return {'x': a, 'y': b*(1-a)+a}
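Spelled out for the original 0-to-50 constraint (my own restatement; the mapping is done inside the objective, so the search space itself stays two independent unit-interval variables):

from hyperopt import hp, fmin, tpe, Trials

space = {'u': hp.uniform('u', 0, 1), 'v': hp.uniform('v', 0, 1)}

def loss(d):
    x = 50.0 * d['u']             # x covers [0, 50]
    y = x + d['v'] * (50.0 - x)   # y covers [x, 50], so y >= x by construction
    return abs(x - 25) + abs(y - 30)

trials = Trials()
best = fmin(loss, space=space, algo=tpe.suggest, max_evals=100, trials=trials)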
In other cases where re-parameterization is awkward, you can also (as you say) make the evaluation function check and reject illegal configurations... and the drawback (as you say) is that search efficiency can be compromised.
I have tried this as well on some toy experiments. It seems that for TPE, the gamma value must be adjusted downward, perhaps so that the gamma percentile stays within the range of "interesting", non-failure values. The failure rate was about 75% for my space when the illegal-configuration rejection was done in the evaluation function.
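For reference, gamma can be lowered by wrapping tpe.suggest with functools.partial, since suggest exposes gamma as a keyword argument (a sketch; 0.10 is just an example value, and loss, space and trials stand for the rejection-style objective and independent search space sketched above):

from functools import partial
from hyperopt import fmin, tpe

# Use a smaller "good" quantile so it is computed from fewer, non-failing trials.
algo = partial(tpe.suggest, gamma=0.10)
best = fmin(loss, space=space, algo=algo, max_evals=200, trials=trials)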
So... yeah. How much does that clear things up?
Yes, this was very helpful. The reparameterization approach looks promising for my current problem.
Is it possible to add a filter / checker / transformer layer between sampling from the space and calling the objective function?
That way parameters could be checked / transformed before being passed to the objective function (or dropped entirely if the parameter combination is not valid).
Without knowing much about your particulars, I would advise including these sorts of checks in the objective function itself, if possible.
Filtering out nonsense proposals, for example, can be done by returning a made-up but undesirable objective function value. There are other techniques as well, but it is sometimes an ugly detail that needs some attention.
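As a rough illustration of that idea (a toy constraint and toy loss, just to make the pattern concrete):

from hyperopt import hp, fmin, tpe, STATUS_OK

PENALTY = 1e6  # deliberately worse than any achievable real loss

def objective(params):
    if params['x'] + params['y'] > 1.0:   # stand-in validity check
        return {'loss': PENALTY, 'status': STATUS_OK}
    return {'loss': (params['x'] - 0.3) ** 2 + (params['y'] - 0.4) ** 2,
            'status': STATUS_OK}

space = {'x': hp.uniform('x', 0, 1), 'y': hp.uniform('y', 0, 1)}
best = fmin(objective, space=space, algo=tpe.suggest, max_evals=50)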
The problem with including the filtering/validation in the objective function is that max_evals still counts the invalid parameter combinations.
Maybe we could provide a flag to fmin() so that invalid runs are not counted towards max_evals (the objective function could return {'loss': None} in that case, for example).
Try using an explicit trials object. If you call fmin with a new trials object, it will run until max_evals have been tried. If you call fmin again with the same trials object and a slightly higher max_evals, it will pick up where it left off and evaluate a few more points.
If you repeat this sort of thing until the trials object has the right number of non-filtered evaluations, it might get the effect you're hoping for without having to modify hyperopt.
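Concretely, that loop might look something like this (a sketch; objective and space stand for your own filtering objective and search space):

from hyperopt import fmin, tpe, Trials, STATUS_OK

trials = Trials()
target = 100      # number of non-filtered evaluations actually wanted
max_evals = 0

while True:
    ok = sum(1 for r in trials.results if r.get('status') == STATUS_OK)
    if ok >= target:
        break
    # Ask for a few more evaluations; fmin resumes from the existing trials.
    max_evals += target - ok
    best = fmin(objective, space=space, algo=tpe.suggest,
                max_evals=max_evals, trials=trials)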
Thanks, this trick helps.
Are there any updates on this issue?
I have a very similar issue using tpe.suggest with a conditional space. In my case, 'b' always needs to be at least 'a' + 1.
I would like something like:
a = hp.uniform('a', 1, 5, 1)
b = hp.uniform('b', a+1, 7, 1)
How could I re-parameterize this to work with tpe?
@Leonolovich This is discussed above. There are a few different approaches. You would change the range of b to span 2 to 7, then check the condition in your objective function and either return a failure status or an undesirable objective value.
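A sketch of that suggestion for this specific case (using hp.quniform, since the snippet above passes a step of 1; the loss shown is only a stand-in):

from hyperopt import hp, fmin, tpe, STATUS_OK, STATUS_FAIL

space = {'a': hp.quniform('a', 1, 5, 1),
         'b': hp.quniform('b', 2, 7, 1)}   # widened so b can always reach a + 1

def objective(params):
    if params['b'] < params['a'] + 1:
        # Invalid combination: report a failed trial so it is ignored.
        return {'status': STATUS_FAIL}
    return {'loss': abs(params['a'] - 3) + abs(params['b'] - 5),
            'status': STATUS_OK}

best = fmin(objective, space=space, algo=tpe.suggest, max_evals=100)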
Hello all, I am new to Hyperopt, so please bear with me if this has been answered before. I am minimizing a loss function by adjusting weights for variables in the input data. Call the weights w1, w2 and w3; I want Hyperopt to look for a solution where 0 <= w1 <= 100, 0 <= w2 <= 100, 0 <= w3 <= 100 and w1 + w2 + w3 == 100. Is there an easy way to implement this with nested Hyperopt? Thanks