Make MaxReplacement configurable
How to categorize this issue?
/area quality /kind enhancement /priority 3
What would you like to be added:
The machine controller currently replaces one machine per machineDeployment when healthTimeout expires, as per this PR. This value should be configurable by the user in the shoot YAML, where the user can provide maxReplacementsPerZone in the worker pool settings.
Why is this needed: For better user control
@himanshu-kun Label area/productivity does not exist.
cc @unmarshall
A PR that tried to introduce throttling of Failed machine deletion is https://github.com/gardener/machine-controller-manager/pull/482
It has some ideas about making maxReplacement configurable.
Grooming
- We currently transition unhealthy machines from the Unknown phase to the Failed phase, limited by MaxReplacement=1, in every reconcile cycle.
- We propose making MaxReplacements a configurable parameter for the machine controller.
- This can be a static or percentage-based value. (semantics to be designed later)
- We should revisit and refactor the current PermitGiver logic and canMarkMachineFailed logic, which do excessive work in the reconcile health check cycle and hurt code clarity.