workerpoolmanager
Starting multiple tasks where one fails to launch results in tasks running with no manager
Starting multiple tasks where one fails to launch results in zombies. Suppose there are three tasks, A, B and C, and task C has incorrect permissions. When the wp manager starts, tasks A and B start correctly but task C fails to start. The wp manager then exits straight away and leaves all instances of A and B running.
As a consequence, when the wp manager is restarted, there are messages about receiving keep-alives for workers that are not registered.
So I think the options are these:
- Workers die when they don't have a manager (not sure this is feasible)
- Don't start workers until we are more sure all tasks will start
- On wp manager restart, kill any old tasks that are still running
- Adopt old tasks and allow the manager to re-balance to get the correct cardinality
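
To make the last option more concrete, here is a minimal sketch of how adoption and re-balancing could be planned on restart. It is not taken from the wp manager code: `reconcile`, `Plan`, and the idea that the manager can list the task name of each surviving worker (for example from the keep-alives it receives) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class Plan(NamedTuple):
    """Hypothetical result type: what the manager would do after adopting workers."""
    to_start: Dict[str, int]  # task name -> number of new workers to launch
    to_stop: Dict[str, int]   # task name -> number of surplus workers to stop


def reconcile(desired: Dict[str, int], running: Iterable[str]) -> Plan:
    """Compare the configured cardinality with the workers found alive on restart.

    `desired` maps task name to the configured number of workers.
    `running` holds one task name per worker that survived the old manager.
    """
    alive = Counter(running)  # adopted workers, counted per task name
    to_start: Dict[str, int] = {}
    to_stop: Dict[str, int] = {}
    for task in sorted(set(desired) | set(alive)):
        diff = desired.get(task, 0) - alive.get(task, 0)
        if diff > 0:
            to_start[task] = diff    # too few workers: launch the difference
        elif diff < 0:
            to_stop[task] = -diff    # too many (or unknown task): stop the surplus
    return Plan(to_start, to_stop)


if __name__ == "__main__":
    # Scenario from this issue: A and B survived, C never started.
    print(reconcile({"A": 2, "B": 2, "C": 1}, ["A", "A", "B", "B"]))
```

Workers for a task that is no longer configured end up in `to_stop`, so the same plan also covers the "kill old tasks" option for anything the manager does not want to adopt.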