Prevent state from writing to ZK too often
- [ ] Only write to ZK if it needs to change.
- [ ] Reduce the amount of traffic with ZK.
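
A minimal sketch of the first item, assuming the state can be serialized to bytes and compared with the last value written. `InMemoryStore` and `ChangeOnlyWriter` are hypothetical names used only for illustration; the store is a stand-in for whatever ZK client the project uses, not its actual API.

```python
class InMemoryStore:
    """Hypothetical stand-in for a ZK client; counts how many writes occur."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def set(self, path, value):
        self.data[path] = value
        self.writes += 1


class ChangeOnlyWriter:
    """Forwards a write to the store only when the value for a path differs
    from the last value this writer sent for that path."""

    def __init__(self, store):
        self._store = store
        self._last_written = {}

    def write(self, path, value):
        # Skip the round trip when nothing has changed since the last write.
        if self._last_written.get(path) == value:
            return False
        self._store.set(path, value)
        self._last_written[path] = value
        return True
```

Note that this only saves writes if the serialized state is identical between updates. A field that changes on every update, such as a timestamp, would have to be excluded from the comparison or stored separately.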
Unless something is failing because of too many writes, I would consider this an improvement?
Yes, fair point. It was originally thought that this was causing crashes, but the cause turned out to be an out-of-disk error.
It's very important to limit the frequency of writes to ZK.
The current healthcheck mechanism uses the Mesos timestamp on each status update, which is written to ZooKeeper. If you don't write to ZooKeeper on every status update, then the stored status of the executor will be out of date. When you say "limit the frequency", what sort of numbers are you talking about? The healthchecks update the status at a predefined rate (configurable via settings), which defaults to 30 seconds. So if you have 10 executors, you will get 10 writes every 30 seconds. Is this too high? If so, why? And if the reason is that ZooKeeper can't handle it, maybe we should look at another method.
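
To put a number on that, here is a rough sketch of the arithmetic, assuming exactly one ZK write per executor per healthcheck interval and nothing else writing. `zk_write_rate` is a hypothetical helper, not something in the codebase.

```python
def zk_write_rate(executors, interval_seconds):
    """Average ZK writes per second, assuming one write per executor
    per healthcheck interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return executors / interval_seconds


# 10 executors at the default 30 second interval
per_second = zk_write_rate(10, 30)
per_minute = per_second * 60
print(f"{per_second:.2f} writes/s, {per_minute:.0f} writes/min")
```

With the defaults this works out to about one write every three seconds, or 20 per minute. The rate grows linearly with the number of executors and shrinks as the interval is lengthened.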