
CHK StatefulSets cannot change storage classes due to missing volume claims

Open · wilkermichael opened this issue 4 months ago · 2 comments

CHK creates StatefulSets differently from CHI, which makes changing storage classes very difficult. With CHI (ClickHouseInstallation) you can intervene manually: delete the PVC and the pod, let the operator reconcile, and the pod comes back with a new PVC. CHK (ClickHouseKeeperInstallation) StatefulSets, however, store only the PVC name rather than a full volume claim template, which appears to leave the keeper pods stuck in Pending when they restart.
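One way to see the difference described above is to compare how each StatefulSet references its storage. The sketch below is a hedged illustration, not operator tooling: describe_sts_storage is a hypothetical helper, and the namespace and StatefulSet name you pass in are placeholders. In Kubernetes, a StatefulSet that declares spec.volumeClaimTemplates lets the StatefulSet controller create a missing PVC when it recreates the pod, whereas a pod template that only names an existing claim under volumes[].persistentVolumeClaim.claimName cannot be scheduled until a PVC with that exact name exists again.

```shell
# Hypothetical helper: print how a StatefulSet references persistent storage.
# Usage: describe_sts_storage <namespace> <statefulset-name>
describe_sts_storage() {
  local ns="$1" sts="$2"

  # Claim templates: the StatefulSet controller creates a PVC from each of
  # these for a replica if that PVC is missing when the pod is created.
  echo "volumeClaimTemplates:"
  kubectl --namespace "$ns" get statefulset "$sts" \
    -o jsonpath='{.spec.volumeClaimTemplates[*].metadata.name}'
  echo

  # Direct claim references: the pod only starts once a PVC with this exact
  # name already exists; nothing in the StatefulSet recreates it.
  echo "pod template claimName volumes:"
  kubectl --namespace "$ns" get statefulset "$sts" \
    -o jsonpath='{.spec.template.spec.volumes[*].persistentVolumeClaim.claimName}'
  echo
}
```

Running it against a CHI StatefulSet and a CHK StatefulSet side by side (for example describe_sts_storage clickhouse <sts-name>) should show which of the two mechanisms each one uses.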

CHK Reproduction

  1. Create CHK with storageClassName: ssd
  2. Update CHK to storageClassName: ssd2
  3. Delete PVC and pod to force recreation:
  4. Result: Pod stuck in Pending. PVC never recreated.
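Step 3 can be sketched as follows. This is a hedged sketch, not the reporter's exact commands: recycle_replica is a hypothetical helper, and the namespace, PVC name, and pod name are placeholders. Because Kubernetes protects in-use claims with the kubernetes.io/pvc-protection finalizer, a PVC that is still mounted is only removed after the pod using it is gone, so the sketch requests the PVC deletion without waiting and then deletes the pod.

```shell
# Hypothetical helper for step 3: delete a replica's PVC and pod to force recreation.
# Usage: recycle_replica <namespace> <pvc-name> <pod-name>
recycle_replica() {
  local ns="$1" pvc="$2" pod="$3"

  # Request PVC deletion without blocking: the pvc-protection finalizer keeps
  # the claim in Terminating until no pod mounts it.
  kubectl --namespace "$ns" delete pvc "$pvc" --wait=false

  # Deleting the pod releases the claim; by default kubectl waits until the
  # old pod is gone, after which the StatefulSet controller recreates it.
  kubectl --namespace "$ns" delete pod "$pod"

  # Inspect the result. Either object may report NotFound: the pod if it has
  # not been recreated yet, the PVC if nothing recreated it.
  kubectl --namespace "$ns" get "pod/$pod" "pvc/$pvc"
}
```

Per the report, running this against a CHK replica leaves the pod Pending with no PVC, while the same steps on a CHI replica bring the pod back with a new PVC.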

CHI Reproduction

  1. Create CHI with storageClassName: ssd
  2. Update CHI to storageClassName: ssd2
  3. Delete PVC and pod to force recreation:
  4. Result: Pod starts up and new PVC is created

Current CHK workaround

  1. Delete the clickhouse-operator deployment.
  2. Deploy the storage class update (the CHK change to the new storageClassName).
  3. Locally, write new YAML manifests for all the PVCs you will be replacing.
  4. Delete the PVC of the pod you want to move to the new storage class.
  5. Once the PVC is deleted (after the StatefulSet rolls the pod), create the new PVC from the local YAML file (kubectl apply -f $FILE).
  6. Repeat steps 4-5 for all other pods.
  7. Redeploy the operator, which should then make no further changes.
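Step 3 of the workaround (writing the replacement PVC manifests) can be scripted. The sketch below rests on assumptions: render_pvc is a hypothetical helper, and the ReadWriteOnce access mode and the requested size are placeholders that must match what the keeper pods expect. The PVC name in particular must be exactly the claim name the StatefulSet's pods reference, and if your setup relies on labels or annotations on the original PVC, copy those over as well.

```shell
# Hypothetical helper for workaround step 3: print a replacement PVC manifest.
# Usage: render_pvc <namespace> <pvc-name> <storage-class> <size>
# Assumption: ReadWriteOnce and the given size match the original claim.
render_pvc() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $2
  namespace: $1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: $3
  resources:
    requests:
      storage: $4
EOF
}
```

For example, render_pvc clickhouse <pvc-name> ssd2 10Gi > pvc-0.yaml produces the file that step 5 then applies with kubectl apply -f pvc-0.yaml (the size here is illustrative).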

Ideally we'd like some way to configure the operator to recreate the PVC when a storage class is updated, but at the very least it would be good to have behaviour similar to CHI, which allows for easier manual intervention.

wilkermichael, Sep 09 '25 19:09

@wilkermichael, we've run into the same issue where the operator doesn't recreate the PVCs (our ArgoCD likes to prune them because it sees them as extra, unmanaged resources), but we've found a reliable workaround: patching the .spec.taskID field of the CHK custom resource, like so:

kubectl --namespace clickhouse patch chk clickhouse --type='merge' \
    -p='{"spec":{"taskID":"force-reconcile-'$(date +%s)'"}}'
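The same patch can be wrapped in a small function so that the namespace and CHK name are parameters and the JSON payload is built in one place with printf rather than by splicing quotes around $(date +%s). This is a hedged sketch: force_reconcile is a hypothetical name, and the patch it sends is the same merge patch as the command above.

```shell
# Hypothetical wrapper around the taskID patch above.
# Usage: force_reconcile <namespace> <chk-name>
force_reconcile() {
  local payload
  # A new taskID value on each call (epoch seconds), as in the original command.
  payload="$(printf '{"spec":{"taskID":"force-reconcile-%s"}}' "$(date +%s)")"
  kubectl --namespace "$1" patch chk "$2" --type=merge -p "$payload"
}
```

For the command above, the equivalent call would be force_reconcile clickhouse clickhouse.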

We've never tried this workaround in conjunction with updating the CHK's storage class, but you can see if the command works for your scenario.

fkywong, Sep 16 '25 22:09

Thanks @fkywong. We have actually been using the taskID workaround, but unfortunately the CHK problem I described still occurs.

wilkermichael, Sep 17 '25 14:09