
Operator Does Not Render the Correct Keeper StatefulSet Volumes for Secrets

Open abe-mused opened this issue 9 months ago • 2 comments

I have this ClickHouseKeeperInstallation [1], and I'm seeing an issue where the Keeper StatefulSets fail to create pods because the pod template contains secret volumes with no secretName and empty item keys/paths (and those volumes are marked optional: false). It surfaces as this event [2] on the StatefulSets, with no pods created. I'm adding those secrets in the .spec.configuration.files of my ClickHouseKeeperInstallation, just like this example, but when I look at the -o yaml of the Keeper StatefulSets generated by the operator, I see this [3]. Am I doing something wrong?

Notes:

  • I'm using operator version 0.24.5
  • This is hosted on EKS with K8s version 1.31
  • The secrets exist and are being used by the ClickHouse pods in this ClickHouseInstallation [4] with no issues.

[1] ClickHouseKeeperInstallation manifest:

apiVersion: 'clickhouse-keeper.altinity.com/v1'
kind: 'ClickHouseKeeperInstallation'
metadata:
  name: 'keeper'
  namespace: 'clickhouse'
spec:
  configuration:
    settings:
      prometheus/endpoint: '/metrics'
      prometheus/port: '9363'
      prometheus/metrics: 'true'
      prometheus/events: 'true'
      prometheus/asynchronous_metrics: 'true'
      prometheus/errors: 'true'
      prometheus/status_info: 'true'
    files:
      keeper_config.xml: |
        <clickhouse>
            <keeper_server>
              <tcp_port_secure>9281</tcp_port_secure>
              <raft_configuration>
                <secure>true</secure>
              </raft_configuration>
            </keeper_server>
        </clickhouse>
      openssl_server.xml: |
        <clickhouse>
          <openSSL>
            <server>
              <certificateFile>/etc/clickhouse-server/secrets.d/tls.crt/clickhouse-certs/tls.crt</certificateFile>
              <privateKeyFile>/etc/clickhouse-server/secrets.d/tls.key/clickhouse-certs/tls.key</privateKeyFile>
              <caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
              <verificationMode>relaxed</verificationMode>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
              <dhParamsFile remove="remove">/etc/clickhouse-keeper/dhparam.pem</dhParamsFile>
            </server>
          </openSSL>
        </clickhouse>
      openssl_client.xml: |
        <clickhouse>
          <openSSL>
            <client>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
              <invalidCertificateHandler>
                  <name>RejectCertificateHandler</name>
              </invalidCertificateHandler>
            </client>
          </openSSL>
        </clickhouse>
      tls.crt:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: tls.crt
      tls.key:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: tls.key
      ca.crt:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: ca.crt
    clusters:
      - name: 'main-ensemble'
        templates:
          dataVolumeClaimTemplate: data-volume-template
          logVolumeClaimTemplate: log-volume-template
        layout:
          replicas:
            - templates:
                podTemplate: clickhouse-keeper-pod-template
            - templates:
                podTemplate: clickhouse-keeper-pod-template
            - templates:
                podTemplate: clickhouse-keeper-pod-template
  templates:
    podTemplates:
      - name: clickhouse-keeper-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
        spec:
          containers:
            - name: clickhouse-keeper
              image: clickhouse/clickhouse-keeper:23.12.5.81
              ports:
                - name: metrics
                  containerPort: 9363
    volumeClaimTemplates:
      - name: data-volume-template
        spec:
          storageClassName: keeper-data
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 25Gi
      - name: log-volume-template
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi

[2] Event from one of the Keeper StatefulSets that failed to create a pod:

Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  88s (x19 over 23m)  statefulset-controller  create Pod chk-keeper-main-ensemble-0-0-0 in StatefulSet chk-keeper-main-ensemble-0-0 failed error: Pod "chk-keeper-main-ensemble-0-0-0" is invalid: [spec.volumes[5].secret.secretName: Required value, spec.volumes[5].secret.items[0].key: Required value, spec.volumes[5].secret.items[0].path: Required value, spec.volumes[6].secret.secretName: Required value, spec.volumes[6].secret.items[0].key: Required value, spec.volumes[6].secret.items[0].path: Required value, spec.volumes[7].secret.secretName: Required value, spec.volumes[7].secret.items[0].key: Required value, spec.volumes[7].secret.items[0].path: Required value, spec.containers[0].volumeMounts[3].name: Not found: "tlscrt", spec.containers[0].volumeMounts[4].name: Not found: "tlskey", spec.containers[0].volumeMounts[5].name: Not found: "cacrt"]

[3] The volumes section of the rendered StatefulSets has no secretName and empty key/path entries, even though they were specified in the ClickHouseKeeperInstallation:

spec.template.spec.volumes:
      ...
      - name: tlskey
        secret:
          defaultMode: 420
          items:
          - key: ""
            path: ""
      - name: cacrt
        secret:
          defaultMode: 420
          items:
          - key: ""
            path: ""
      - name: tlscrt
        secret:
          defaultMode: 420
          items:
          - key: ""
            path: ""

[4] The ClickHouseInstallation manifest that is successfully using the same secrets:

apiVersion: 'clickhouse.altinity.com/v1'
kind: 'ClickHouseInstallation'
metadata:
  name: 'ch'
  namespace: clickhouse
spec:
  configuration:
    settings:
      prometheus/endpoint: '/metrics'
      prometheus/port: '9363'
      prometheus/metrics: 'true'
      prometheus/events: 'true'
      prometheus/asynchronous_metrics: 'true'
      prometheus/errors: 'true'
      prometheus/status_info: 'true'
      ...

    files:
      openssl_server.xml: |
        <clickhouse>
          <openSSL>
            <server>
              <certificateFile>/etc/clickhouse-server/secrets.d/tls.crt/clickhouse-certs/tls.crt</certificateFile>
              <privateKeyFile>/etc/clickhouse-server/secrets.d/tls.key/clickhouse-certs/tls.key</privateKeyFile>
              <caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
              <verificationMode>relaxed</verificationMode>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
            </server>
          </openSSL>
        </clickhouse>
      openssl_client.xml: |
        <clickhouse>
          <openSSL>
            <client>
              <loadDefaultCAFile>true</loadDefaultCAFile>
              <caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
              <cacheSessions>true</cacheSessions>
              <disableProtocols>sslv2,sslv3</disableProtocols>
              <preferServerCiphers>true</preferServerCiphers>
              <invalidCertificateHandler>
                  <name>RejectCertificateHandler</name>
              </invalidCertificateHandler>
            </client>
          </openSSL>
        </clickhouse>
      tls.crt:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: tls.crt
      tls.key:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: tls.key
      ca.crt:
        valueFrom:
          secretKeyRef:
            name: clickhouse-certs
            key: ca.crt
    zookeeper:
      nodes:
        - host: chk-keeper-main-ensemble-0-0
          port: 9281
          secure: 'yes'
        - host: chk-keeper-main-ensemble-0-1
          port: 9281
          secure: 'yes'
        - host: chk-keeper-main-ensemble-0-2
          port: 9281
          secure: 'yes'
    clusters:
      - name: 'main-cluster'
        secure: 'yes'
        insecure: 'no'
        secret:
          auto: 'yes'
        templates:
          dataVolumeClaimTemplate: data-volume-template
          logVolumeClaimTemplate: log-volume-template
          podTemplate: clickhouse-pod-template
        layout:
          shardsCount: 3
          replicas: 2
  templates:
    podTemplates:
      - name: clickhouse-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:23.12.5.81
              ports:
                - name: metrics
                  containerPort: 9363
            - name: clickhouse-backup
              image: altinity/clickhouse-backup:stable
              imagePullPolicy: IfNotPresent
              ...
    volumeClaimTemplates:
      - name: data-volume-template
        spec:
          storageClassName: ch-sc
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 25Gi
      - name: log-volume-template
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi

abe-mused avatar Apr 28 '25 01:04 abe-mused

FYI: we worked around this by bypassing the operator and manually adding the volumes and volumeMounts for the secrets in the pod template (rough sketch below), but it seems like a bug within the operator itself.
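
In case it helps, the workaround looked roughly like this (a sketch only; it assumes the mount paths must match the ones configured in openssl_server.xml, and the secret and template names should be adjusted to your setup):

  templates:
    podTemplates:
      - name: clickhouse-keeper-pod-template
        spec:
          volumes:
            - name: clickhouse-certs
              secret:
                secretName: clickhouse-certs
          containers:
            - name: clickhouse-keeper
              image: clickhouse/clickhouse-keeper:23.12.5.81
              volumeMounts:
                - name: clickhouse-certs
                  subPath: tls.crt
                  mountPath: /etc/clickhouse-server/secrets.d/tls.crt/clickhouse-certs/tls.crt
                - name: clickhouse-certs
                  subPath: tls.key
                  mountPath: /etc/clickhouse-server/secrets.d/tls.key/clickhouse-certs/tls.key
                - name: clickhouse-certs
                  subPath: ca.crt
                  mountPath: /etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt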

Here is another occurrence of the issue, with someone else asking for help on Slack :)

abe-mused avatar Apr 29 '25 01:04 abe-mused

As discussed in the Slack workspace, I'm adding all the necessary information regarding my case:

I'm facing the same error when trying to deploy version 0.25.0. I took the operator Helm chart and adjusted it by replacing the operator password and the namespace to monitor. In addition to the ClickHouseInstallation custom resource, I added a ClickHouseKeeperInstallation resource. The latter deploys without any issues, but when I try to set up the ClickHouse cluster, I get an error about an "invalid memory address". I've double-checked the CH cluster configuration and didn't find any typos or mistakes; probably I'm missing something simple. Our Kubernetes cluster version is v1.31.4. The ClickHouseInstallation configuration is the following:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
 
metadata:
  name: "clickhouse-cluster"
  namespace: clickhouse-cluster
 
spec:
  reconciling:
    policy: "nowait"
    configMapPropagationTimeout: 90
    cleanup:
      unknownObjects:
        statefulSet: Delete
        pvc: Delete
        configMap: Delete
        service: Delete
      reconcileFailedObjects:
        statefulSet: Retain
        pvc: Retain
        configMap: Retain
        service: Retain
  defaults:
    distributedDDL:
      profile: default
    templates:
      dataVolumeClaimTemplate: default
      serviceTemplate: svc-template
      podTemplate: clickhouse-node1
  configuration:
    settings:
      openSSL/client/loadDefaultCAFile: true
      openSSL/client/caConfig: /etc/clickhouse-server/config.d/ca.crt
      openSSL/client/cacheSessions: true
      openSSL/client/disableProtocols: sslv2,sslv3
      openSSL/client/preferServerCiphers: true
      openSSL/client/invalidCertificateHandler/name: RejectCertificateHandler
      max_server_memory_usage: 0
    zookeeper:
      nodes:
      - host: keeper-clickhouse-keeper.clickhouse-keeper
    users:
      default/access_management: 1
      default/host_regexp: ".*"
      default/networks/ip:
        - "127.0.0.1"
        - "::/0"
    files:
      users.d/remove_database_ordinary.xml: |
        <yandex>
          <profiles>
             <default>
                <default_database_engine remove="1"/>
             </default>
          </profiles>
        </yandex>
      ca.crt: |
        <SOME_CERT_AND_KEY>
    clusters:
      - name: geomotive
        layout:
          shards:
            - name: shard-1
              templates:
                podTemplate: clickhouse-node1
              replicas:
                - name: s-1
                  templates:
                    podTemplate: clickhouse-node1
            - name: shard-2
              templates:
                podTemplate: clickhouse-node2
              replicas:
                - name: s-2
                  templates:
                    podTemplate: clickhouse-node2
            - name: shard-3
              templates:
                podTemplate: clickhouse-node3
              replicas:
                - name: s-3
                  templates:
                    podTemplate: clickhouse-node3
            - name: shard-4
              templates:
                podTemplate: clickhouse-node4
              replicas:
                - name: s-4
                  templates:
                    podTemplate: clickhouse-node4
            - name: shard-5
              templates:
                podTemplate: clickhouse-node5
              replicas:
                - name: s-5
                  templates:
                    podTemplate: clickhouse-node5
 
  templates:
    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          storageClassName: local-path
          resources:
            requests:
              storage: 600Gi
    serviceTemplates:
      - name: svc-template
        generateName: clickhouse
        metadata:
          labels:
            custom.label: "v2"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
    podTemplates:
      - name: clickhouse-node1
        spec:
          nodeSelector:
            kubernetes.io/hostname: "node1"
          containers:
            - name: clickhouse-pod
              image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
              resources:
                requests:
                  cpu: "1000m"
                  memory: "32000Mi"
                limits:
                  cpu: "6000m"
                  memory: "102400Mi"
      - name: clickhouse-node2
        spec:
          nodeSelector:
            kubernetes.io/hostname: "node2"
          containers:
            - name: clickhouse-pod
              image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
              resources:
                requests:
                  cpu: "1000m"
                  memory: "32000Mi"
                limits:
                  cpu: "6000m"
                  memory: "102400Mi"
      - name: clickhouse-node3
        spec:
          nodeSelector:
            kubernetes.io/hostname: "node3"
          containers:
            - name: clickhouse-pod
              image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
              resources:
                requests:
                  cpu: "1000m"
                  memory: "32000Mi"
                limits:
                  cpu: "6000m"
                  memory: "102400Mi"
      - name: clickhouse-node4
        spec:
          nodeSelector:
            kubernetes.io/hostname: "node4"
          containers:
            - name: clickhouse-pod
              image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
              resources:
                requests:
                  cpu: "1000m"
                  memory: "32000Mi"
                limits:
                  cpu: "6000m"
                  memory: "102400Mi"
      - name: clickhouse-node5
        spec:
          nodeSelector:
            kubernetes.io/hostname: "node5"
          containers:
            - name: clickhouse-pod
              image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
              resources:
                requests:
                  cpu: "1000m"
                  memory: "32000Mi"
                limits:
                  cpu: "6000m"
                  memory: "102400Mi"

The error I faced:

/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x1bdc500?, 0x31272b0?})
	/usr/local/go/src/runtime/panic.go:791 +0x132
github.com/altinity/clickhouse-operator/pkg/model.GetConfigMatchSpecs(0xc00128a820)
	/clickhouse-operator/pkg/model/chop_config.go:116 +0xe3
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).buildTemplates(...)
	/clickhouse-operator/pkg/controller/chi/worker-reconciler-chi.go:189
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).build(0xc00006db00, {0x21add58, 0xc00033e370}, 0xc000f9c000?, 0xc000f9c000)
	/clickhouse-operator/pkg/controller/chi/worker-reconciler-chi.go:142 +0x445
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).reconcileCR(0xc00006db00, {0x21add58, 0xc00033e370}, 0x0, 0xc000f9c000)
	/clickhouse-operator/pkg/controller/chi/worker-reconciler-chi.go:65 +0x80f
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).updateCHI(0xc00006db00, {0x21add58, 0xc00033e370}, 0x0, 0xc0009d41a0)
	/clickhouse-operator/pkg/controller/chi/worker.go:347 +0xe85
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).processReconcileCHI(0x100000003?, {0x21add58?, 0xc00033e370?}, 0x0?)
	/clickhouse-operator/pkg/controller/chi/worker-boilerplate.go:72 +0x47
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).processItem(0xc00006db00, {0x21add58, 0xc00033e370}, {0x1bff580, 0xc0007aa1e0})
	/clickhouse-operator/pkg/controller/chi/worker-boilerplate.go:164 +0x47b
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).run(0xc00006db00)
	/clickhouse-operator/pkg/controller/chi/worker-boilerplate.go:53 +0x38f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004bd7c0, {0x218ec60, 0xc000d3bd70}, 0x1, 0xc0000f8f50)
	/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0004bd7c0, 0x3b9aca00, 0x0, 0x1, 0xc0000f8f50)
	/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
created by github.com/altinity/clickhouse-operator/pkg/controller/chi.(*Controller).Run in goroutine 94
	/clickhouse-operator/pkg/controller/chi/controller.go:526 +0x5a5
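
For completeness, I pulled this trace from the operator pod's logs; something like the following should show the full panic (assuming the Helm chart's default deployment and container names, which may differ in your setup):

kubectl -n <operator-namespace> logs deploy/clickhouse-operator -c clickhouse-operator --tail=500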

he0s avatar Jun 14 '25 16:06 he0s