helm-operator fails to annotate some resources
Bug Report
Helm-operator fails to annotate some resources, which means that chart upgrades will fail.
Description
I've created a Helm-based operator to deploy the NGINX ingress controller. The first version of this operator was created using operator-sdk version 1.24.
The command to create the operator was as follows:
operator-sdk init \
--plugins helm \
--helm-chart ingress-nginx \
--helm-chart-repo https://kubernetes.github.io/ingress-nginx \
--helm-chart-version 4.0.3 \
--domain helm.k8s.io \
--group charts \
--version v1 \
--kind NginxIngressController
Now I've updated operator-sdk to version 1.29 and the ingress-nginx chart to version 4.6.1:
operator-sdk init \
--plugins helm \
--helm-chart ingress-nginx \
--helm-chart-repo https://kubernetes.github.io/ingress-nginx \
--helm-chart-version 4.6.1 \
--domain helm.k8s.io \
--group charts \
--version v1 \
--kind NginxIngressController
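For reference, the chart is deployed through the operator's Custom Resource, whose spec is passed to Helm as chart values. A minimal sketch of such a CR (the name and namespace are taken from the release in the error below; the autoscaling fields are assumptions mirroring the ingress-nginx chart's controller.autoscaling values):
apiVersion: charts.helm.k8s.io/v1
kind: NginxIngressController
metadata:
  name: nina-annotation
  namespace: ingress-controller-operator
spec:
  controller:
    autoscaling:
      enabled: true
      minReplicas: 1
      maxReplicas: 3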
When I try to upgrade from the first version of the operator to the second one, everything seems to work except that the ingress controller never gets updated; the operator reports the following error while reconciling:
failed to get candidate release: rendered manifests contain a resource that already exists. Unable to continue with update: HorizontalPodAutoscaler "nina-annotation-controller" in namespace "ingress-controller-operator" exists and cannot be imported into the current release: invalid ownership metadata; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "nina-annotation"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ingress-controller-operator"
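This error means Helm refuses to adopt a live object that lacks its ownership metadata. One way to unblock the upgrade by hand is to add that metadata to the existing object so Helm can adopt it; a hedged sketch using the names from the error above (Helm also expects the app.kubernetes.io/managed-by=Helm label when adopting a resource):
kubectl annotate hpa nina-annotation-controller -n ingress-controller-operator \
  meta.helm.sh/release-name=nina-annotation \
  meta.helm.sh/release-namespace=ingress-controller-operator
kubectl label hpa nina-annotation-controller -n ingress-controller-operator \
  app.kubernetes.io/managed-by=Helm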
After investigating the issue, it seems that the operator doesn't annotate the HorizontalPodAutoscaler resource with
metadata:
  annotations:
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator
while the Deployment resource, for example, does get annotated:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
    meta.helm.sh/release-name: nina-annotation
    meta.helm.sh/release-namespace: ingress-controller-operator
  creationTimestamp: "2023-06-20T09:33:30Z"
I've discovered this issue of missing annotations on the HorizontalPodAutoscaler, but it might be happening with other resources as well.
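To confirm which objects are missing the ownership metadata, the annotations of the suspect resources can be dumped directly; a sketch (the Deployment name is an assumption based on the HPA name from the error):
kubectl get hpa nina-annotation-controller -n ingress-controller-operator \
  -o jsonpath='{.metadata.annotations}{"\n"}'
kubectl get deployment nina-annotation-controller -n ingress-controller-operator \
  -o jsonpath='{.metadata.annotations}{"\n"}'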
Workaround to minimize bug impact
What was happening blocked the operator upgrade. The only way to bypass the issue and upgrade both the operator and the ingress controllers correctly was to disable autoscaling in my Custom Resource before updating the controller; only after everything had been updated as expected did I enable autoscaling again.
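Concretely, the toggling can be done through the CR with a merge patch; a sketch (the CR name and the controller.autoscaling.enabled path are assumptions based on the release name above and the ingress-nginx chart values):
kubectl patch nginxingresscontroller nina-annotation -n ingress-controller-operator \
  --type=merge -p '{"spec":{"controller":{"autoscaling":{"enabled":false}}}}'
# ...upgrade the operator, wait for the release to reconcile cleanly, then re-enable:
kubectl patch nginxingresscontroller nina-annotation -n ingress-controller-operator \
  --type=merge -p '{"spec":{"controller":{"autoscaling":{"enabled":true}}}}'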
Environment
minikube
minikube version: v1.30.1
commit: 08896fd1dc362c097c925146c4a0d0dac715ace0
minikube was set up with:
minikube start --cpus 4 --driver=docker --addons ingress --addons ingress-dns --addons metrics-server --kubernetes-version=1.24.8
operator-sdk
operator-sdk version: "v1.29.0", commit: "78c564319585c0c348d1d7d9bbfeed1098fab006", kubernetes version: "1.26.0", go version: "go1.19.9", GOOS: "darwin", GOARCH: "arm64"
We also observe the same problem when upgrading operator-sdk from v1.22 to v1.28. The issue only happens sometimes, and we believe it could be introduced by the newer version of operator-sdk.
I am wondering whether reverting the operator-sdk version could be a fix for this problem.
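If reverting does help, note that the reconciliation behaviour comes from the helm-operator base image referenced in the project's scaffolded Dockerfile rather than from the operator-sdk CLI itself, so pinning that base image back may be enough; a sketch (the exact tag to pin is an assumption):
# In the scaffolded Dockerfile, the base image pins the helm-operator runtime;
# point it back at the previously working release instead of the newly scaffolded one.
FROM quay.io/operator-framework/helm-operator:v1.24.0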
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.