CSV stuck in install loop with auth-delegator already exists error
Bug Report
What did you do? Upgraded OLM.
What did you expect to see? OLM picks up the already-installed operators and keeps managing them.
What did you see instead? Under which circumstances?
CSV stuck in a retry loop, repeatedly failing with: clusterrolebindings.rbac.authorization.k8s.io "<some-operator>:auth-delegator" already exists
Example log:
{"level":"error","ts":"2024-03-11T09:23:18Z","logger":"controllers.operator","msg":"Could not update Operator status","request":{"name":"cert-manager.cert-manager"},"error":"Operation cannot be fulfilled on operators.operators.coreos.com \"cert-manager.cert-manager\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators.(*OperatorReconciler).Reconcile\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/pkg/controller/operators/operator_controller.go:157\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
time="2024-03-11T09:23:19Z" level=warning msg="needs reinstall: webhooks not installed" csv=cert-manager.v1.14.2 id=22GeL namespace=cert-manager phase=Failed strategy=deployment
I0311 09:23:19.150921 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"cert-manager", Name:"cert-manager.v1.14.2", UID:"2ff6cc43-eacb-42cd-9607-3f58f7a8a00a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"905345183", FieldPath:""}): type: 'Normal' reason: 'NeedsReinstall' webhooks not installed
time="2024-03-11T09:23:19Z" level=info msg="scheduling ClusterServiceVersion for install" csv=cert-manager.v1.14.2 id=a47Rd namespace=cert-manager phase=Pending
I0311 09:23:19.958772 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"cert-manager", Name:"cert-manager.v1.14.2", UID:"2ff6cc43-eacb-42cd-9607-3f58f7a8a00a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"905345206", FieldPath:""}): type: 'Normal' reason: 'AllRequirementsMet' all requirements found, attempting install
time="2024-03-11T09:23:20Z" level=info msg="No api or webhook descs to add CA to"
time="2024-03-11T09:23:20Z" level=info msg="No api or webhook descs to add CA to"
time="2024-03-11T09:23:20Z" level=warning msg="reusing existing cert cert-manager-webhook-service-cert"
time="2024-03-11T09:23:20Z" level=warning msg="could not create auth delegator clusterrolebinding cert-manager-webhook-service-system:auth-delegator"
I0311 09:23:20.385386 1 event.go:298] Event(v1.ObjectReference{Kind:"ClusterServiceVersion", Namespace:"cert-manager", Name:"cert-manager.v1.14.2", UID:"2ff6cc43-eacb-42cd-9607-3f58f7a8a00a", APIVersion:"operators.coreos.com/v1alpha1", ResourceVersion:"905345224", FieldPath:""}): type: 'Warning' reason: 'InstallComponentFailed' install strategy failed: clusterrolebindings.rbac.authorization.k8s.io "cert-manager-webhook-service-system:auth-delegator" already exists
{"level":"error","ts":"2024-03-11T09:23:20Z","logger":"controllers.operator","msg":"Could not update Operator status","request":{"name":"cert-manager.cert-manager"},"error":"Operation cannot be fulfilled on operators.operators.coreos.com \"cert-manager.cert-manager\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/operator-framework/operator-lifecycle-manager/pkg/controller/operators.(*OperatorReconciler).Reconcile\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/pkg/controller/operators/operator_controller.go:157\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
E0311 09:23:20.578578 1 queueinformer_operator.go:319] sync {"update" "cert-manager/cert-manager.v1.14.2"} failed: clusterrolebindings.rbac.authorization.k8s.io "cert-manager-webhook-service-system:auth-delegator" already exists
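To confirm the loop, the CSV phase can be watched and the pre-existing clusterrolebinding inspected for the olm.managed label that the workaround below adds (a quick check, assuming that label is what OLM uses to recognize resources it manages; resource names are taken from the log above):
$ kubectl get csv -n cert-manager cert-manager.v1.14.2 -o jsonpath='{.status.phase}{"\n"}'
$ kubectl get clusterrolebinding cert-manager-webhook-service-system:auth-delegator --show-labels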
Environment
- operator-lifecycle-manager version:
v0.27
- Kubernetes version information:
PS> kubectl version
Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.7-gke.1026000
The same error occurs on:
$ oc version
Client Version: 4.14.0-202401111553.p0.g286cfa5.assembly.stream-286cfa5
Kustomize Version: v5.0.1
Server Version: 4.15.0-0.okd-2024-01-27-070424
Kubernetes Version: v1.28.2-3568+0fb47263bee9d4-dirty
- Kubernetes cluster kind: GKE/OKD
Possible Solution
Revert to v0.25.0 (kubectl apply -f https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.25.0/olm.yaml --server-side --force-conflicts). It seems v0.26 and v0.27 are both affected.
OR
Add the olm.managed=true label to the existing resource (kubectl get rolebinding -n kube-system -o name | grep auth-reader | xargs -I {} kubectl label -n kube-system {} olm.managed=true), as sketched below for the cluster-scoped case.
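In this report the conflicting object is a clusterrolebinding rather than a rolebinding, so the same idea can be applied to it as well (a sketch, under the assumption that the missing olm.managed label is what makes OLM refuse to adopt the existing resource):
$ kubectl get clusterrolebinding -o name | grep auth-delegator | xargs -I {} kubectl label {} olm.managed=true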
Additional context
The catalog source has high CPU usage and timeouts, which I suspect is caused by the constant retries.
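The retry churn and the resulting load can be observed with something like the following (a sketch; the olm namespace and an installed metrics-server for kubectl top are assumptions about the environment):
$ kubectl top pod -n olm
$ kubectl get events -n cert-manager --field-selector involvedObject.name=cert-manager.v1.14.2 --sort-by=.lastTimestamp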