clickhouse-operator

PVCs not deleted with "reclaimPolicy" set to "Delete"

Open · dkmorgan opened this issue 9 months ago · 6 comments

I currently have a ClickHouse cluster running on AWS. The cluster configuration was based on the following examples:

The cluster runs fine and I can create a database, etc. However, when I delete the "ClickHouseInstallation" and "ClickHouseKeeperInstallation" from the cluster, I very often end up with leftover PVCs, along with their PVs and consequently the underlying EBS volumes, despite setting reclaimPolicy to "Delete".

The volumeClaimTemplates section is as follows (identical for server and keeper apart from the name):

    volumeClaimTemplates:
      - name: ch-server-vct
        reclaimPolicy: Delete
        spec:
          storageClassName: ebs-gp3-delete
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi

The StorageClass definition for "ebs-gp3-delete" is as follows:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3-delete
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
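
As far as I understand, the PV-level reclaimPolicy only kicks in once the bound PVC is deleted, so the PVs and EBS volumes surviving is just a consequence of the PVCs being left behind. For reference, this is how I check what is left over after deleting both installations (plain kubectl, nothing operator-specific):

# PVCs that survived the deletion
kubectl get pvc

# Their bound PVs and the reclaim policy actually stamped onto each PV
kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,STATUS:.status.phase,CLAIM:.spec.claimRef.name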

I haven't been able to figure out why I'm getting these seemingly random leftover PVCs. Has anyone run into a similar situation? I'd greatly appreciate any pointers as to what may be causing this.

dkmorgan · Apr 28 '25 10:04

I've been comparing the PVCs created by the clickhouse-operator against a PVC created when deploying a RabbitmqCluster, and I may have found what is leading to the issue: the PVCs created by the clickhouse-operator do not have any ownerReferences set:

kubectl get pvc -o json | jq '.items[] | {name: .metadata.name, ownerRefs: .metadata.ownerReferences}'
{
  "name": "ch-keeper-vct-chk-ch-keeper-0-0-0",
  "ownerRefs": null
}
{
  "name": "ch-keeper-vct-chk-ch-keeper-0-1-0",
  "ownerRefs": null
}
{
  "name": "ch-keeper-vct-chk-ch-keeper-0-2-0",
  "ownerRefs": null
}
{
  "name": "ch-server-vct-chi-ch-server-0-0-0",
  "ownerRefs": null
}
{
  "name": "ch-server-vct-chi-ch-server-0-1-0",
  "ownerRefs": null
}
{
  "name": "persistence-rabbitmq-cluster-server-0",
  "ownerRefs": [
    {
      "apiVersion": "rabbitmq.com/v1beta1",
      "blockOwnerDeletion": false,
      "controller": true,
      "kind": "RabbitmqCluster",
      "name": "rabbitmq-cluster",
      "uid": "***"
    }
  ]
}
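
For comparison, this is roughly the kind of ownerReference I would expect on the ClickHouse PVCs if they were meant to be garbage-collected together with the installation. This is purely illustrative, modelled on the RabbitMQ entry above; the operator does not actually write anything like it today:

# Hypothetical ownerReference that would let Kubernetes GC delete the PVC
# when its ClickHouseInstallation is deleted (not set by the operator currently)
metadata:
  name: ch-server-vct-chi-ch-server-0-0-0
  ownerReferences:
    - apiVersion: clickhouse.altinity.com/v1
      kind: ClickHouseInstallation
      name: ch-server
      uid: "<uid of the ClickHouseInstallation>"
      controller: true
      blockOwnerDeletion: false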

Is this behavior expected? Or have I missed something that is preventing the ownerRefs from being set?

dkmorgan · May 28 '25 09:05

@dkmorgan do you by any chance use Operator as the provisioner:

  defaults:
    storageManagement:
      provisioner: Operator

(I'm not suggesting this is the cause, just curious whether there's any correlation.)
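
For context, and as I understand it, that setting sits under spec.defaults in a ClickHouseInstallation. A minimal sketch (the other accepted value is StatefulSet, which I believe is the default):

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: ch-server
spec:
  defaults:
    storageManagement:
      # with "Operator" the operator itself (rather than the StatefulSet)
      # creates and manages the PVCs
      provisioner: Operator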

janeklb · May 29 '25 11:05

@janeklb

No, I haven't specified the Operator as the provisioner. Here's a copy of my entire ClickHouseKeeperInstallation manifest, generated with helm template (I'm using a range loop to generate the replicas and podTemplates):

apiVersion: clickhouse-keeper.altinity.com/v1
kind: ClickHouseKeeperInstallation
metadata:
  name: ch-keeper
spec:
  configuration:
    clusters:
      - name: dev-cluster
        templates:
          dataVolumeClaimTemplate: ch-keeper-vct
        layout:
          replicas:
            - templates:
                podTemplate: ch-keeper-1a
            - templates:
                podTemplate: ch-keeper-1c
            - templates:
                podTemplate: ch-keeper-1d
    settings:
      logger/level: "information"
      logger/console: "true"
  templates:
    podTemplates:
      - name: ch-keeper-1a
        zone:
          values:
            - ap-northeast-1a
        spec:
          containers:
            - name: clickhouse-keeper
              image: "altinity/clickhouse-keeper:24.3.12.76.altinitystable"
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 1
                  memory: 256M
              volumeMounts:
                - name: ch-keeper-vct
                  mountPath: /var/lib/clickhouse-keeper
          nodeSelector:
            node.kubernetes.io/instance-type: t3.medium
      - name: ch-keeper-1c
        zone:
          values:
            - ap-northeast-1c
        spec:
          containers:
            - name: clickhouse-keeper
              image: "altinity/clickhouse-keeper:24.3.12.76.altinitystable"
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 1
                  memory: 256M
              volumeMounts:
                - name: ch-keeper-vct
                  mountPath: /var/lib/clickhouse-keeper
          nodeSelector:
            node.kubernetes.io/instance-type: t3.medium
      - name: ch-keeper-1d
        zone:
          values:
            - ap-northeast-1d
        spec:
          containers:
            - name: clickhouse-keeper
              image: "altinity/clickhouse-keeper:24.3.12.76.altinitystable"
              imagePullPolicy: IfNotPresent
              resources:
                limits:
                  cpu: 2
                  memory: 4Gi
                requests:
                  cpu: 1
                  memory: 256M
              volumeMounts:
                - name: ch-keeper-vct
                  mountPath: /var/lib/clickhouse-keeper
          nodeSelector:
            node.kubernetes.io/instance-type: t3.medium
    volumeClaimTemplates:
      - name: ch-keeper-vct
        reclaimPolicy: Delete
        spec:
          storageClassName: ebs-gp3-delete
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 30Gi

dkmorgan · May 30 '25 11:05

I think it is a bug in the operator; the handling of reclaimPolicy Retain vs. Delete seems to be jumbled.

manishrawat1992 · Jun 05 '25 13:06

We're also seeing this issue. We're using quite large EBS volumes, which we'd prefer the operator to clean up automatically (as advertised), but setting reclaimPolicy to Delete seems to have no effect.
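
For now the fallback is deleting the leftover claims by hand; deleting the PVC is what actually releases the PV and lets the StorageClass reclaimPolicy remove the EBS volume. A rough example, using the claim names from the kubectl output earlier in this thread:

# Manual cleanup after the installations have been deleted;
# once the PVCs go, the Delete reclaim policy on the PVs removes the EBS volumes
kubectl delete pvc \
  ch-server-vct-chi-ch-server-0-0-0 \
  ch-server-vct-chi-ch-server-0-1-0 \
  ch-keeper-vct-chk-ch-keeper-0-0-0 \
  ch-keeper-vct-chk-ch-keeper-0-1-0 \
  ch-keeper-vct-chk-ch-keeper-0-2-0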

zbialik · Aug 12 '25 18:08

It appears that the StatefulSets provisioned by the ClickHouseInstallation are created with:

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain

This is why the PVCs live on after the ClickHouseInstallation is deleted. I think the clickhouse-operator should make use of the newer StatefulSet.spec.persistentVolumeClaimRetentionPolicy feature (documented here) so that the handling of the PVCs can be customized.
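
A quick way to confirm what the operator-created StatefulSets were given (plain kubectl; prints <none> if the field is unset):

kubectl get sts -o custom-columns=NAME:.metadata.name,WHEN_DELETED:.spec.persistentVolumeClaimRetentionPolicy.whenDeleted,WHEN_SCALED:.spec.persistentVolumeClaimRetentionPolicy.whenScaled

And a sketch of what the StatefulSet spec would need to carry for the PVCs to be removed when the installation (and therefore the StatefulSet) is deleted:

spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete the PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs belonging to scaled-down replicas

Note that this field is gated behind the StatefulSetAutoDeletePVC feature gate, which I believe has been enabled by default since Kubernetes 1.27.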

zbialik · Aug 12 '25 19:08