
Kubectl rollout restart of a deployment managed by OLM doesn't work as expected

itroyano opened this issue 1 year ago

Bug Report

What did you do?

Given a controller manager Deployment managed by a CSV, attempting a kubectl rollout restart on the controller manager produces an unexpected result.

What did you expect to see?

A new ReplicaSet with a new set of pods comes up, replacing the existing one.

What did you see instead? Under which circumstances?

A new ReplicaSet is created, but it is immediately scaled down and the old ReplicaSet takes over instead. For example:

➜ oc describe deploy
Name:                   quay-operator-tng
Namespace:              default
CreationTimestamp:      Wed, 28 Aug 2024 09:16:06 +0200
Labels:                 olm.deployment-spec-hash=5UgbKi05MO7Ei5ZQssoGmupLx4sNY2p8bWGNDS
                        olm.managed=true
                        olm.owner=quay-operator.v3.8.13
                        olm.owner.kind=ClusterServiceVersion
                        olm.owner.namespace=default
                        operators.coreos.com/project-quay.default=
Annotations:            deployment.kubernetes.io/revision: 8
Selector:               name=quay-operator-alm-owned
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
.......
.......
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  quay-operator-tng-684667795f (0/0 replicas created)
NewReplicaSet:   quay-operator-tng-78f9489957 (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  32h   deployment-controller  Scaled up replica set quay-operator-tng-5ff7b59799 to 1
  Normal  ScalingReplicaSet  32h   deployment-controller  Scaled up replica set quay-operator-tng-78f9489957 to 1
  Normal  ScalingReplicaSet  32h   deployment-controller  Scaled down replica set quay-operator-tng-5ff7b59799 to 0 from 1
  Normal  ScalingReplicaSet  32h   deployment-controller  Scaled up replica set quay-operator-tng-6b9c5fc95b to 1
  Normal  ScalingReplicaSet  32h   deployment-controller  Scaled down replica set quay-operator-tng-6b9c5fc95b to 0 from 1
  Normal  ScalingReplicaSet  31h   deployment-controller  Scaled up replica set quay-operator-tng-f8bc859f5 to 1
  Normal  ScalingReplicaSet  31h   deployment-controller  Scaled down replica set quay-operator-tng-f8bc859f5 to 0 from 1
  Normal  ScalingReplicaSet  31h   deployment-controller  Scaled up replica set quay-operator-tng-684667795f to 1
  Normal  ScalingReplicaSet  31h   deployment-controller  Scaled down replica set quay-operator-tng-684667795f to 0 from 1

The reason for this is that OLM reverts the kubectl.kubernetes.io/restartedAt annotation that kubectl rollout restart places on the Deployment's pod template.

Possible Solution

A 3-way merge patch here might avoid overriding fields OLM doesn't care about: https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/api/wrappers/deployment_install_client.go#L124
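To illustrate the idea: a 3-way merge compares the *original* spec (what OLM last applied), the *modified* spec (what the CSV wants now), and the *live* object on the cluster, and only patches keys OLM itself manages. Keys that exist only on the live object, such as kubectl.kubernetes.io/restartedAt, are left untouched. A minimal sketch for a flat string map (e.g. pod-template annotations) follows; the function name `threeWayMergePatch` is hypothetical, and a real fix would use `strategicpatch.CreateThreeWayMergePatch` from k8s.io/apimachinery rather than hand-rolling this:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// threeWayMergePatch is an illustrative sketch, not OLM's actual code.
// For a flat string map it computes the JSON merge patch that moves
// "live" toward "modified", while preserving keys that neither
// "original" nor "modified" know about (e.g. restartedAt).
func threeWayMergePatch(original, modified, live map[string]string) map[string]interface{} {
	patch := map[string]interface{}{}
	// Keys added or changed in the desired spec must be set.
	for k, v := range modified {
		if lv, ok := live[k]; !ok || lv != v {
			patch[k] = v
		}
	}
	// Keys the operator previously set but has since dropped must be
	// deleted (nil in a JSON merge patch means "remove this key").
	for k := range original {
		if _, inModified := modified[k]; !inModified {
			if _, inLive := live[k]; inLive {
				patch[k] = nil
			}
		}
	}
	// Keys present only in "live" (added by kubectl, a user, or another
	// controller) are deliberately left alone.
	return patch
}

func main() {
	original := map[string]string{"olm.owner": "quay-operator.v3.8.13"}
	modified := map[string]string{"olm.owner": "quay-operator.v3.8.13"}
	live := map[string]string{
		"olm.owner":                         "quay-operator.v3.8.13",
		"kubectl.kubernetes.io/restartedAt": "2024-08-28T09:16:06Z",
	}
	b, _ := json.Marshal(threeWayMergePatch(original, modified, live))
	fmt.Println(string(b)) // empty patch: restartedAt survives the sync
}
```

With the current two-way logic, by contrast, the live annotation set is replaced wholesale with the CSV's version, which is why the restart annotation disappears and the old ReplicaSet is restored.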

itroyano avatar Sep 09 '24 06:09 itroyano

Issues go stale after 90 days of inactivity. If there is no further activity, the issue will be closed in another 30 days.

github-actions[bot] avatar May 28 '25 01:05 github-actions[bot]

https://github.com/argoproj/argo-cd/discussions/12410