Graceful shutdown not handled correctly for long-running reconciliations
There seems to be a problem with graceful termination when the helm-controller workers are busy reconciling a release that takes longer than 30s (readinessProbe.failureThreshold: 3 and readinessProbe.periodSeconds: 10, as configured by default). The readiness probe fails immediately after SIGTERM (I can see this in the pod events), and then the container receives a second SIGTERM, which makes the signal handler exit immediately with code 1.
However, this doesn't look like something we can fix in helm-controller itself; it rather looks like an issue in the controller-runtime shutdown logic:
- The internalProceduresStop channel is closed here, before the runnables are stopped: https://github.com/kubernetes-sigs/controller-runtime/blob/v0.13.1/pkg/manager/internal.go#L539
- The probes server is designed to shut down when the internalProceduresStop channel is closed: https://github.com/kubernetes-sigs/controller-runtime/blob/v0.13.1/pkg/manager/internal.go#L384
Removing the readiness probe would solve only part of the problem, because we would also need to override the controller manager's default gracefulShutdownTimeout (30s).