
hyperopt-mongo-worker should reset the current trial again when killed

Open ExpectationMax opened this issue 9 years ago • 0 comments

Currently, when a worker is killed (which sometimes happens on a cluster), the state of the trial remains at 1, indicating that the job is still running.

This prevents the experiment from continuing, because no worker picks up a trial that has state=1 and a corresponding owner.

Either the worker should set the state of the trial to 0 (new) or 3 (failed) when it receives an external signal (I would prefer the former, as the trial did not fail for intrinsic reasons), or there should be a script that resets trials with state 1. It might also help to show an error message if fmin is executed on a database that already has running jobs.
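
To make the first option concrete, here is a rough sketch of how a worker could hand a trial back when it receives a termination signal. This is not hyperopt's actual worker code: `run_trial_with_reset`, `WorkerInterrupted`, `run_trial` and `release_trial` are hypothetical names, and `release_trial` stands in for whatever would set the trial back to state 0 and clear its owner. It only covers catchable signals such as SIGTERM and SIGINT; a SIGKILL cannot be handled this way, so a reset script would still be needed for that case.

```python
import signal


class WorkerInterrupted(Exception):
    """Raised in the main thread when a termination signal arrives."""


def _raise_interrupted(signum, frame):
    # Turn the signal into an exception so normal cleanup code can run.
    raise WorkerInterrupted(f"received signal {signum}")


def run_trial_with_reset(run_trial, release_trial,
                         signals=(signal.SIGTERM, signal.SIGINT)):
    """Run run_trial(); on a termination signal call release_trial() and re-raise.

    Both callables are hypothetical placeholders: run_trial evaluates the
    trial, release_trial would put it back to state 0 (new) and drop the owner.
    Must be called from the main thread, since signal handlers can only be
    installed there.
    """
    # Install temporary handlers and remember the previous ones.
    previous = {s: signal.signal(s, _raise_interrupted) for s in signals}
    try:
        return run_trial()
    except WorkerInterrupted:
        release_trial()
        raise
    finally:
        # Restore whatever handlers were active before.
        for s, handler in previous.items():
            if handler is not None:
                signal.signal(s, handler)
```

With something like this, a job scheduler that sends SIGTERM before killing the process would give the worker a chance to release its trial instead of leaving it at state 1.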

For now, a simple workaround is to run the following commands in the mongo shell, where dbname is the selected database (default: hyperopt):

use dbname
db.jobs.update({ state: 1}, {$unset: {owner: ""}, $set: {state: 0}}, {multi: true})
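
The same reset could also live in a small Python script, which might be one form of the reset script suggested above. The following is only a sketch using pymongo's `update_many`; the function names `reset_running_trials` and `connect_and_reset` are made up for illustration, and the host, port and database defaults are assumptions (only the `jobs` collection and the default database name `hyperopt` come from the commands above).

```python
RUNNING = 1  # state of a trial that a worker has claimed
NEW = 0      # state of a trial that is waiting for a worker


def reset_running_trials(jobs):
    """Set every trial with state 1 back to state 0 and remove its owner.

    `jobs` is expected to be a pymongo Collection (or anything offering a
    compatible update_many method). Returns the number of modified documents.
    """
    result = jobs.update_many(
        {"state": RUNNING},
        {"$set": {"state": NEW}, "$unset": {"owner": ""}},
    )
    return result.modified_count


def connect_and_reset(host="localhost", port=27017, dbname="hyperopt"):
    """Hypothetical convenience wrapper; connection defaults are assumptions."""
    from pymongo import MongoClient  # imported lazily, only needed here

    client = MongoClient(host, port)
    return reset_running_trials(client[dbname]["jobs"])
```

Like the shell commands, this resets all trials in state 1, including ones that a healthy worker is still evaluating, so it should only be run while no workers are active.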

What would be a more convenient way to handle this?

ExpectationMax, Sep 27 '16 12:09