machine-controller-manager

Make MaxReplacement configurable

Open himanshu-kun opened this issue 3 years ago • 4 comments

How to categorize this issue?

/area quality /kind enhancement /priority 3

What would you like to be added: Currently, as per this PR, the machine controller replaces only 1 machine per MachineDeployment at a time when machines exceed the healthTimeout. This value should be configurable by the user in the shoot YAML, where a maxReplacementsPerZone field could be provided in the worker pool settings.
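A minimal sketch of how such a per-pool setting could be modelled, written in Go because machine-controller-manager is a Go project. `WorkerPool`, `MaxReplacementsPerZone` and `EffectiveMaxReplacements` are hypothetical names used for illustration only, not the actual Gardener shoot API. The sketch assumes an unset value falls back to today's behaviour of one replacement at a time, and that values below 1 are rejected.

```go
package main

import "fmt"

// WorkerPool is a hypothetical, trimmed-down view of a worker pool in the shoot spec.
type WorkerPool struct {
	Name  string
	Zones []string
	// MaxReplacementsPerZone is the proposed (hypothetical) setting: how many
	// unhealthy machines per zone may be replaced at once. nil means "use the default".
	MaxReplacementsPerZone *int
}

// defaultMaxReplacementsPerZone mirrors the current behaviour of replacing one machine at a time.
const defaultMaxReplacementsPerZone = 1

// EffectiveMaxReplacements returns the configured value, or the default when unset.
// Values below 1 are rejected (an assumption of this sketch).
func (w WorkerPool) EffectiveMaxReplacements() (int, error) {
	if w.MaxReplacementsPerZone == nil {
		return defaultMaxReplacementsPerZone, nil
	}
	if v := *w.MaxReplacementsPerZone; v < 1 {
		return 0, fmt.Errorf("maxReplacementsPerZone must be at least 1, got %d", v)
	}
	return *w.MaxReplacementsPerZone, nil
}
```

Using a pointer distinguishes "not set" from an explicit value, so existing shoots that never set the field keep the current one-at-a-time behaviour.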

Why is this needed: To give users better control over machine replacement.

himanshu-kun, Mar 08 '22 04:03

@himanshu-kun Label area/productivity does not exist.

gardener-robot, Mar 08 '22 04:03

cc @unmarshall

himanshu-kun, Mar 08 '22 04:03

A PR that tried to introduce throttling of Failed machine deletion is https://github.com/gardener/machine-controller-manager/pull/482. It has some ideas about making maxReplacement configurable.

rishabh-11, Sep 20 '22 09:09

Grooming

  • We currently transition unhealthy machines from the Unknown phase to the Failed phase at a rate of MaxReplacement=1 per reconcile cycle.
  • We propose making MaxReplacements a configurable parameter of the machine controller.
    • This can be a static or a percentage-based value (semantics to be designed later).
  • We should revisit and refactor the current PermitGiver and canMarkMachineFailed logic, which does excessive work in the reconcile health-check cycle and makes the code hard to follow.
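Since the semantics are still to be designed, the following is only one possible sketch. It assumes MaxReplacements is given either as an absolute number ("2") or as a percentage of the MachineDeployment's desired replicas ("25%"), and that only machines whose health timeout has already expired are candidates. `ResolveMaxReplacements` and `MachinesToMarkFailed` are hypothetical helpers; they do not reflect the existing PermitGiver or canMarkMachineFailed code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ResolveMaxReplacements turns a hypothetical MaxReplacements setting into an absolute
// number of machines for one MachineDeployment. The value is either a plain integer
// ("2") or a percentage of the desired replicas ("25%"). Percentages are rounded up
// and never resolve to less than 1, so replacement always makes progress (assumed semantics).
func ResolveMaxReplacements(value string, replicas int) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil || pct < 1 || pct > 100 {
			return 0, fmt.Errorf("invalid percentage %q", value)
		}
		// Integer ceiling of replicas*pct/100, avoiding floating-point rounding.
		n := (replicas*pct + 99) / 100
		if n < 1 {
			n = 1
		}
		return n, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 1 {
		return 0, fmt.Errorf("invalid static value %q", value)
	}
	return n, nil
}

// MachinesToMarkFailed returns at most limit machines out of those in the Unknown phase
// whose health timeout has expired; the rest wait for a later reconcile cycle.
func MachinesToMarkFailed(timedOut []string, limit int) []string {
	if limit <= 0 {
		return nil
	}
	if len(timedOut) <= limit {
		return timedOut
	}
	return timedOut[:limit]
}
```

With 10 replicas, "25%" resolves to 3, so at most three timed-out machines would move from Unknown to Failed in one reconcile cycle. Rounding up keeps small pools from resolving to zero replacements; Kubernetes Deployments make the same choice for maxSurge.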

elankath, Feb 23 '23 10:02