
Fix PVC retention

dmotte opened this issue 1 year ago • 4 comments

As it is now, the persistent_volume_claim_retention_policy feature doesn't work properly: the PVCs are always deleted (not by Kubernetes, but by the operator itself).

Evidence:

time="2024-03-07T12:46:32Z" level=debug msg="pods have been deleted" cluster-name=default/mypgcluster pkg=cluster worker=0
time="2024-03-07T12:46:32Z" level=debug msg="deleting PVCs" cluster-name=default/mypgcluster pkg=cluster worker=0
time="2024-03-07T12:46:32Z" level=debug msg="deleting PVC \"default/pgdata-mypgcluster-0\"" cluster-name=default/mypgcluster pkg=cluster worker=0
time="2024-03-07T12:46:32Z" level=debug msg="deleting PVC \"default/pgdata-mypgcluster-1\"" cluster-name=default/mypgcluster pkg=cluster worker=0
time="2024-03-07T12:46:32Z" level=debug msg="deleting PVC \"default/pgdata-mypgcluster-2\"" cluster-name=default/mypgcluster pkg=cluster worker=0
time="2024-03-07T12:46:32Z" level=debug msg="PVCs have been deleted" cluster-name=default/mypgcluster pkg=cluster worker=0

And this was my config:

$ helm get values zalando-postgres-operator
USER-SUPPLIED VALUES:
...
configKubernetes:
  persistent_volume_claim_retention_policy:
    when_deleted: retain
    when_scaled: retain
...

This PR aims to fix the issue by introducing a check before the call to the deletePersistentVolumeClaims function.
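
For illustration, here is a minimal, self-contained sketch of the kind of guard the PR describes. All type, field, and method names below are hypothetical stand-ins modeled on the log messages above, not the actual diff:

package main

import "fmt"

// Minimal stand-ins for the operator's types; all names are hypothetical.
type retentionPolicy struct {
	WhenDeleted string // "retain" or "delete"
	WhenScaled  string
}

type cluster struct {
	policy retentionPolicy
	pvcs   []string
}

// deletePersistentVolumeClaims stands in for the operator's real deletion logic.
func (c *cluster) deletePersistentVolumeClaims() error {
	for _, pvc := range c.pvcs {
		fmt.Printf("deleting PVC %q\n", pvc)
	}
	return nil
}

// cleanupPVCs shows the proposed fix: consult the configured policy
// before calling deletePersistentVolumeClaims at all.
func (c *cluster) cleanupPVCs() error {
	if c.policy.WhenDeleted == "retain" {
		fmt.Println("retention policy is retain, keeping PVCs")
		return nil
	}
	return c.deletePersistentVolumeClaims()
}

func main() {
	c := &cluster{
		policy: retentionPolicy{WhenDeleted: "retain", WhenScaled: "retain"},
		pvcs:   []string{"default/pgdata-mypgcluster-0"},
	}
	if err := c.cleanupPVCs(); err != nil {
		fmt.Println("error:", err)
	}
}

With the when_deleted: retain config from above, cleanupPVCs returns before any deletion happens, which is exactly the behavior the log output shows is currently missing.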

dmotte commented Mar 07 '24 14:03

Isn't it all about the volumes? When you have the setting retain, one can delete the PVC, but the volume goes into the Released state.

FxKu commented Mar 07 '24 16:03

@FxKu yep, but only if the ReclaimPolicy of the PV is set to Retain.

In any case, the feature I'm referring to is named persistent_volume_claim_retention_policy, so it should be related to PVCs, not PVs.

In short, in its current state, persistent_volume_claim_retention_policy has literally no effect, because the PVCs are always deleted by the operator, even when Kubernetes itself would preserve them. This PR is about "making the operator retain them when they are supposed to be retained" :wink:

dmotte commented Mar 07 '24 18:03

@dmotte after reading up on this a little more, I better understand the difference between the persistentVolumeClaimRetentionPolicy of a StatefulSet and the persistentVolumeReclaimPolicy of PVs. I would think the retention policy is designed for cases where somebody accidentally deletes the StatefulSet, so that the volumes are not affected.
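
For context, the two policies live on different objects. A minimal sketch using the upstream Kubernetes API types (illustrative only, not code from the operator):

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

func main() {
	// StatefulSet-level: what happens to the PVCs created from
	// volumeClaimTemplates when the StatefulSet is deleted or scaled down.
	sts := appsv1.StatefulSet{}
	sts.Spec.PersistentVolumeClaimRetentionPolicy = &appsv1.StatefulSetPersistentVolumeClaimRetentionPolicy{
		WhenDeleted: appsv1.RetainPersistentVolumeClaimRetentionPolicyType,
		WhenScaled:  appsv1.RetainPersistentVolumeClaimRetentionPolicyType,
	}

	// PV-level: what happens to the volume itself once its claim is gone.
	pv := corev1.PersistentVolume{}
	pv.Spec.PersistentVolumeReclaimPolicy = corev1.PersistentVolumeReclaimRetain

	fmt.Println(sts.Spec.PersistentVolumeClaimRetentionPolicy.WhenDeleted,
		pv.Spec.PersistentVolumeReclaimPolicy)
}

Even with both set to Retain, Kubernetes only promises not to delete the PVCs and PVs itself; it cannot stop a controller that deletes them explicitly, which is what the operator currently does.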

However, when somebody removes the Postgres cluster, we want the operator to clean up all child resources. If you want to prevent this, we'd better introduce another config option to toggle this behavior. I found out that we once had a PR for this, see #1074. But it had quite a few issues. Maybe you can give it a try? Or I will see if I find time for this soon.

Edit: Ok, let me quickly create the PR. Shouldn't take me long.

FxKu commented Mar 12 '24 08:03

Great! Thank you @FxKu

I tried to take a look at the PR you mentioned, but unfortunately it's not quite clear to me what the problems actually are. Let me know if you need some help and what I can do.

dmotte commented Mar 12 '24 17:03

Just for info: I've also been able to solve my problem in a different way. I'm posting the solution here because it may be helpful to someone:

If you accidentally deleted your Postgres cluster but you still have your PersistentVolumes around (for example because their reclaimPolicy was set to Retain), then you can still restore your Postgres cluster by re-attaching the PVs to a new cluster.

  1. First of all, check that your PVs are still there with the kubectl get pv command.
  2. You will see that, after the cluster deletion, their STATUS is now Released. We need to make them Available again, so that the operator will be able to bind them to the new PersistentVolumeClaims it will create.
  3. To do that, manually patch the PVs by running this command for each of them (replace the PV name accordingly; a programmatic equivalent is sketched after this list):
     kubectl patch pv/my-pv-name -p '{"spec":{"claimRef":{"resourceVersion":null,"uid":null}}}'
  4. Then, using the kubectl get pv command again, make sure the STATUS of the PVs is now Available.
  5. Finally, you can re-deploy the same cluster manifest (the postgresql Kubernetes resource) that you accidentally deleted before. The Zalando Postgres Operator will create new PVCs and Kubernetes will bind them to the already-existing PVs.
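
For completeness, the patch in step 3 can also be applied programmatically, e.g. when many PVs are involved. A minimal client-go sketch, assuming a local kubeconfig; my-pv-name is a placeholder:

package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (assumed setup for illustration).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Same payload as the kubectl patch in step 3: clearing the stale
	// claimRef bookkeeping lets the PV become Available again.
	patch := []byte(`{"spec":{"claimRef":{"resourceVersion":null,"uid":null}}}`)
	_, err = clientset.CoreV1().PersistentVolumes().Patch(
		context.TODO(), "my-pv-name", // placeholder PV name
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		log.Fatal(err)
	}
}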

dmotte commented Mar 14 '24 17:03