machine-controller-manager

Recursive function call hits the API server an unbounded number of times without any delay or backoff

Open unmarshall opened this issue 3 years ago • 0 comments

How to categorize this issue?
/area robustness control-plane ops-productivity
/kind bug
/priority 3

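What happened: A code review found that updateMachineDeploymentFinalizers calls itself recursively whenever the update of a MachineDeployment resource fails. Every retry issues a fresh Get and Update against the API server, with no delay between attempts and no limit on the number of attempts.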

func (dc *controller) updateMachineDeploymentFinalizers(ctx context.Context, machineDeployment *v1alpha1.MachineDeployment, finalizers []string) {
	// Get the latest version of the machineDeployment so that we can avoid conflicts
	machineDeployment, err := dc.controlMachineClient.MachineDeployments(machineDeployment.Namespace).Get(ctx, machineDeployment.Name, metav1.GetOptions{})
	if err != nil {
		return
	}

	clone := machineDeployment.DeepCopy()
	clone.Finalizers = finalizers
	_, err = dc.controlMachineClient.MachineDeployments(machineDeployment.Namespace).Update(ctx, clone, metav1.UpdateOptions{})
	if err != nil {
		// Keep retrying until update goes through
		klog.Warning("Updated failed, retrying")
		dc.updateMachineDeploymentFinalizers(ctx, machineDeployment, finalizers)
	}
}

This can put massive load on the API server for as long as the update keeps failing. Even with kube-apiserver's API Priority and Fairness in place, it could cause delays or denial of service for other controllers at the same or a lower priority level.

What you expected to happen: Retries should use exponential backoff. The MachineDeployment should be re-queued and retried later, because updating the finalizer is important and must complete before the machine is actually provisioned.

How to reproduce it (as minimally and precisely as possible):

Environment: MCM v0.43.0 master

unmarshall, Mar 07 '22 04:03