helm-operator fails to annotate some resources
Bug Report
Helm-operator fails to annotate some resources, which means that chart upgrades will fail.
Description
I've created a Helm-based operator to deploy the NGINX ingress controller. The first version of this operator was created using operator-sdk version 1.24.
The command to create the operator was as follows:
operator-sdk init \
--plugins helm \
--helm-chart ingress-nginx \
--helm-chart-repo https://kubernetes.github.io/ingress-nginx \
--helm-chart-version 4.0.3 \
--domain helm.k8s.io \
--group charts \
--version v1 \
--kind NginxIngressController
Now I've updated operator-sdk to version 1.29 and the ingress-nginx chart to version 4.6.1:
operator-sdk init \
--plugins helm \
--helm-chart ingress-nginx \
--helm-chart-repo https://kubernetes.github.io/ingress-nginx \
--helm-chart-version 4.6.1 \
--domain helm.k8s.io \
--group charts \
--version v1 \
--kind NginxIngressController
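For reference, the chart is deployed through the operator's Custom Resource, whose spec is passed to Helm as chart values. A minimal sketch of such a CR (the name and namespace are taken from the release in the error below; the autoscaling fields are assumptions mirroring the ingress-nginx chart's controller.autoscaling values):
apiVersion: charts.helm.k8s.io/v1
kind: NginxIngressController
metadata:
  name: nina-annotation
  namespace: ingress-controller-operator
spec:
  controller:
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 3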
When I try to upgrade from the first version of the operator to the second one, everything seems to work except that the ingress controller never gets updated; the operator reports the following error while reconciling:
failed to get candidate release: rendered manifests contain a resource that already exists. Unable to continue with update: HorizontalPodAutoscaler "nina-annotation-controller" in namespace "ingress-controller-operator" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "nina-annotation"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ingress-controller-operator"
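This error means Helm refuses to adopt a live object that lacks its ownership metadata. One way to unblock the upgrade by hand is to add that metadata to the existing object so Helm can adopt it; a hedged sketch using the names from the error above (Helm also expects the app.kubernetes.io/managed-by=Helm label when adopting a resource):
kubectl annotate hpa nina-annotation-controller -n ingress-controller-operator \
  meta.helm.sh/release-name=nina-annotation \
  meta.helm.sh/release-namespace=ingress-controller-operator
kubectl label hpa nina-annotation-controller -n ingress-controller-operator \
  app.kubernetes.io/managed-by=Helm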
After investigating the issue, it seems that the operator doesn't annotate the HorizontalPodAutoscaler resource with
metadata:
  annotations:
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator
while the Deployment resource, for example, does get annotated:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator
  creationTimestamp: "2023-06-20T09:33:30Z"
I've discovered this issue of missing annotations on the HorizontalPodAutoscaler, but it might be happening with other resources as well.
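To confirm which objects are missing the ownership metadata, the annotations of the suspect resources can be dumped directly; a sketch (the Deployment name is an assumption based on the HPA name from the error):
kubectl get hpa nina-annotation-controller -n ingress-controller-operator \
  -o jsonpath='{.metadata.annotations}{"\n"}'
kubectl get deployment nina-annotation-controller -n ingress-controller-operator \
  -o jsonpath='{.metadata.annotations}{"\n"}'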
Workaround to minimize bug impact
What was happening blocked the operator upgrade. The only way to bypass the issue and upgrade both the operator and the ingress controllers correctly was to disable autoscaling in my Custom Resource before updating the controller; only after everything had been updated as expected did I enable autoscaling again.
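Concretely, the toggling can be done through the CR with a merge patch; a sketch (the CR name and the controller.autoscaling.enabled path are assumptions based on the release name above and the ingress-nginx chart values):
kubectl patch nginxingresscontroller nina-annotation -n ingress-controller-operator \
  --type=merge -p '{"spec":{"controller":{"autoscaling":{"enabled":false}}}}'
# ...upgrade the operator, wait for the release to reconcile cleanly, then re-enable:
kubectl patch nginxingresscontroller nina-annotation -n ingress-controller-operator \
  --type=merge -p '{"spec":{"controller":{"autoscaling":{"enabled":true}}}}'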
Environment
minikube
minikube version: v1.30.1
commit: 08896fd1dc362c097c925146c4a0d0dac715ace0
minikube was set up with:
minikube start --cpus 4 --driver=docker --addons ingress --addons ingress-dns --addons metrics-server --kubernetes-version=1.24.8
operator-sdk
operator-sdk version: "v1.29.0", commit: "78c564319585c0c348d1d7d9bbfeed1098fab006", kubernetes version: "1.26.0", go version: "go1.19.9", GOOS: "darwin", GOARCH: "arm64"
We also observe the same problem when upgrading operator-sdk from v1.22 to v1.28. The issue only happens sometimes, and we believe it could be introduced by the newer version of operator-sdk.
I am wondering whether reverting the operator-sdk version could be a fix for this problem.
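If reverting does help, note that the reconciliation behaviour comes from the helm-operator base image referenced in the project's scaffolded Dockerfile rather than from the operator-sdk CLI itself, so pinning that base image back may be enough; a sketch (the exact tag to pin is an assumption):
# In the scaffolded Dockerfile, the base image pins the helm-operator runtime;
# point it back at the previously working release instead of the newly scaffolded one.
FROM quay.io/operator-framework/helm-operator:v1.24.0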
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.