crate-operator

Pods are not restarted on update - label issue

Open paepke opened this issue 11 months ago • 3 comments

The crate operator does not find the pods to restart because the app.kubernetes.io/managed-by label copied over from the cratedb resource is not overwritten. We use Helm to deploy a cratedb definition to the cluster, and since Helm 3.2 (quite old) it automatically adds the app.kubernetes.io/managed-by label with the value Helm. This label ends up in the StatefulSet that is initially deployed. When the version attribute of the cratedb resource is updated, the StatefulSet gets updated, but the operator cannot find the pods to restart because the value of the managed-by label is not crate-operator. In the following snippet, it looks like the operator's label gets overwritten by the one from the cratedb resource: https://github.com/crate/crate-operator/blob/ae24778f403618d7dfce30fa7874d4b43d548fe1/crate/operator/handlers/handle_create_cratedb.py#L61-L68

The pod lookup defined in the following snippet also supports my assumption: https://github.com/crate/crate-operator/blob/ae24778f403618d7dfce30fa7874d4b43d548fe1/crate/operator/operations.py#L139-L157

I'm guessing the merge order should be reversed so that the managed-by label on the StatefulSet always has the correct value, crate-operator.
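To illustrate why the merge order matters, here is a minimal sketch in plain Python. It is not the operator's actual code: the function names and the `OPERATOR_NAME` constant are hypothetical, and only the label key and the two values (Helm, crate-operator) come from this issue.

```python
from typing import Dict

MANAGED_BY = "app.kubernetes.io/managed-by"
OPERATOR_NAME = "crate-operator"  # value the pod lookup expects, per this issue


def merge_labels_user_last(user_labels: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical merge where labels from the cratedb resource win."""
    labels = {MANAGED_BY: OPERATOR_NAME}
    labels.update(user_labels)  # a user-supplied managed-by replaces ours
    return labels


def merge_labels_operator_last(user_labels: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical merge where the operator's managed-by value always wins."""
    labels = dict(user_labels)  # copy so the caller's dict is not mutated
    labels[MANAGED_BY] = OPERATOR_NAME
    return labels


helm_labels = {MANAGED_BY: "Helm", "team": "data"}  # "team" is a made-up label
print(merge_labels_user_last(helm_labels)[MANAGED_BY])
print(merge_labels_operator_last(helm_labels)[MANAGED_BY])
```

With the first ordering the StatefulSet keeps the value set by Helm; with the second, other user labels are preserved while managed-by is forced to crate-operator.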

Steps to reproduce

  1. Create a cratedb resource with the label app.kubernetes.io/managed-by: foobar
  2. The operator creates a StatefulSet with that label
  3. Update the version of the cratedb resource created in step 1
  4. The StatefulSet gets updated, but the pods are not restarted
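The effect of these steps on the pod lookup can be simulated without a cluster. The sketch below assumes the lookup uses an equality-based label selector on managed-by, as described in this issue; the `find_pods` helper and the pod names are hypothetical and do not come from the operator's source.

```python
from typing import Dict, List

MANAGED_BY = "app.kubernetes.io/managed-by"


def find_pods(selector: Dict[str, str], pods: List[dict]) -> List[str]:
    """Return names of pods whose labels contain every selector key/value pair."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


# Pods inherit the label from the StatefulSet created in step 2.
pods = [
    {"name": "crate-data-0", "labels": {MANAGED_BY: "foobar"}},  # hypothetical names
    {"name": "crate-data-1", "labels": {MANAGED_BY: "foobar"}},
]

# The operator looks for pods it manages and finds nothing to restart.
print(find_pods({MANAGED_BY: "crate-operator"}, pods))
```

Because equality-based selectors require an exact value match, a single overwritten label is enough to hide every pod from the restart logic.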

paepke avatar Mar 13 '25 16:03 paepke

Thanks for reporting this issue and for the detailed analysis!

You are right, the app.kubernetes.io/managed-by label should always be set to crate-operator in the STS, regardless of what is specified in the CrateDB CRD. We will discuss this internally and likely adjust the label merging logic to enforce this behavior.

In general, we do not recommend managing the CrateDB CRD with Helm. The operator maintains some internal state (e.g. in the .status field) which may be overwritten or reset by Helm during upgrades or diffs. This can cause unexpected discrepancies between the intended and actual state of the cluster.

tomach avatar Jun 24 '25 08:06 tomach

Thank you very much for the feedback. We are not setting or recreating the custom resource via Helm, so I think we are safe, and we have not seen any problems with it yet. We will take it into account anyway and investigate whether it is feasible to change the deployment. That said, the current deployment is not easy to change, so we are looking forward to the change you are considering.

paepke avatar Jun 26 '25 18:06 paepke

Hi again!

The fix for this issue has been released as part of crate-operator version 2.49.0. Please upgrade to this version, and let us know if you encounter any further issues.

Cheers!

tomach avatar Jul 16 '25 08:07 tomach