Scheduler silently fails on malformed ZK URLs

Open PerilousApricot opened this issue 7 years ago • 0 comments

Describe the bug
In a couple of places [1] [2], the user is instructed to suffix the ZK connection string with a path (a ZK node?), /cook. If the user does this, the scheduler for some reason never connects to the Mesos master.

[1] https://github.com/twosigma/Cook/blob/master/scheduler/docs/configuration.adoc
[2] https://github.com/twosigma/Cook/blob/master/scheduler/example-prod-config.edn#L15

To Reproduce
Download the latest Cook, build it, and manually set the :zookeeper {:connection} config option so that it has a trailing /cook. The scheduler begins some preparatory work, then seemingly hangs, only periodically writing heartbeat messages to the log. I can turn this failure mode on and off by adding or removing that suffix.

Expected behavior
I'd expect an explicit crash in this case. I presume that the scheduler attempts to perform master election and fails because of the invalid ZK hostname. Since I never saw an error, and one of the final lines in the log is from Cook trying to find the Mesos scheduler, I tried debugging that interaction, when the true failure was elsewhere.
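
To illustrate the kind of fail-fast check I have in mind, here is a minimal Python sketch. It is not Cook's code and not how Cook parses its config. It only assumes the usual ZooKeeper connection-string shape, a comma-separated host[:port] list with an optional /path suffix, and the function name and the zk1/zk2 hostnames are made up for the example. The point is that the host list and the path suffix get separated up front, and anything malformed raises immediately instead of being handed on as a hostname.

```python
DEFAULT_ZK_PORT = 2181  # ZooKeeper's default client port


def split_zk_connection(conn):
    """Split 'host1:2181,host2:2181/path' into ([(host, port), ...], path).

    Hypothetical helper for illustration only. It raises ValueError on
    anything it cannot parse. IPv6 literals are not handled.
    """
    if not conn or conn != conn.strip():
        raise ValueError(f"empty or whitespace-padded connection string: {conn!r}")

    # Everything from the first '/' onward is the path suffix, not a hostname.
    hosts_part, slash, rest = conn.partition("/")
    path = slash + rest  # "" when there is no suffix
    if path and path != "/" and (path.endswith("/") or "//" in path):
        raise ValueError(f"malformed path suffix: {path!r}")

    servers = []
    for entry in hosts_part.split(","):
        host, colon, port_text = entry.partition(":")
        if not host:
            raise ValueError(f"missing host in entry: {entry!r}")
        port = DEFAULT_ZK_PORT
        if colon:
            # A ':' must be followed by a decimal port in the valid range.
            if not (port_text.isascii() and port_text.isdigit()):
                raise ValueError(f"bad port in entry: {entry!r}")
            port = int(port_text)
            if not 1 <= port <= 65535:
                raise ValueError(f"port out of range in entry: {entry!r}")
        servers.append((host, port))
    return servers, path


if __name__ == "__main__":
    print(split_zk_connection("zk1:2181,zk2:2181/cook"))
```

With something like this at startup, a connection string the scheduler cannot use would stop the process with a clear message, rather than leaving it writing heartbeats.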

PerilousApricot, Sep 04 '18 16:09