Operator Does Not Render the Correct Keeper StatefulSet Volumes for Secrets
I have this ClickHouse Keeper installation [1]. The Keeper StatefulSets fail to create pods because the pod template contains volumes of type secret with no name (and those volumes are marked optional: false). This surfaces as the event in [2] on the StatefulSets, with no pods created.
I'm adding those secrets in .spec.configuration.files of my ClickHouseKeeperInstallation, as in the example below, but when I look at the kubectl get -o yaml output of the StatefulSets generated by the operator, I see the volumes shown in [3].
Am I doing something wrong?
Notes:
- I'm using operator version 0.24.5
- This is hosted on EKS with K8s version 1.31
- The secrets exist and are being used by the ClickHouse pods in the ClickHouseInstallation in [4] with no issues.
[1] ClickHouseKeeperInstallation manifest:
apiVersion: 'clickhouse-keeper.altinity.com/v1'
kind: 'ClickHouseKeeperInstallation'
metadata:
name: 'keeper'
namespace: 'clickhouse'
spec:
configuration:
settings:
prometheus/endpoint: '/metrics'
prometheus/port: '9363'
prometheus/metrics: 'true'
prometheus/events: 'true'
prometheus/asynchronous_metrics: 'true'
prometheus/errors: 'true'
prometheus/status_info: 'true'
files:
keeper_config.xml: |
<clickhouse>
<keeper_server>
<tcp_port_secure>9281</tcp_port_secure>
<raft_configuration>
<secure>true</secure>
</raft_configuration>
</keeper_server>
</clickhouse>
openssl_server.xml: |
<clickhouse>
<openSSL>
<server>
<certificateFile>/etc/clickhouse-server/secrets.d/tls.crt/clickhouse-certs/tls.crt</certificateFile>
<privateKeyFile>/etc/clickhouse-server/secrets.d/tls.key/clickhouse-certs/tls.key</privateKeyFile>
<caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
<verificationMode>relaxed</verificationMode>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<dhParamsFile remove="remove">/etc/clickhouse-keeper/dhparam.pem</dhParamsFile>
</server>
</openSSL>
</clickhouse>
openssl_client.xml: |
<clickhouse>
<openSSL>
<client>
<loadDefaultCAFile>true</loadDefaultCAFile>
<caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>
</clickhouse>
tls.crt:
valueFrom:
secretKeyRef:
name: clickhouse-certs
key: tls.crt
tls.key:
valueFrom:
secretKeyRef:
name: clickhouse-certs
key: tls.key
ca.crt:
valueFrom:
secretKeyRef:
name: clickhouse-certs
key: ca.crt
clusters:
- name: 'main-ensemble'
templates:
dataVolumeClaimTemplate: data-volume-template
logVolumeClaimTemplate: log-volume-template
layout:
replicas:
- templates:
podTemplate: clickhouse-keeper-pod-template
- templates:
podTemplate: clickhouse-keeper-pod-template
- templates:
podTemplate: clickhouse-keeper-pod-template
templates:
podTemplates:
- name: clickhouse-keeper-pod-template
metadata:
annotations:
prometheus.io/scrape: 'true'
spec:
containers:
- name: clickhouse-keeper
image: clickhouse/clickhouse-keeper:23.12.5.81
ports:
- name: metrics
containerPort: 9363
volumeClaimTemplates:
- name: data-volume-template
spec:
storageClassName: keeper-data
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 25Gi
- name: log-volume-template
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
[2] Event on one of the Keeper StatefulSets that failed to create a pod:
Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  88s (x19 over 23m)  statefulset-controller  create Pod chk-keeper-main-ensemble-0-0-0 in StatefulSet chk-keeper-main-ensemble-0-0 failed error: Pod "chk-keeper-main-ensemble-0-0-0" is invalid: [spec.volumes[5].secret.secretName: Required value, spec.volumes[5].secret.items[0].key: Required value, spec.volumes[5].secret.items[0].path: Required value, spec.volumes[6].secret.secretName: Required value, spec.volumes[6].secret.items[0].key: Required value, spec.volumes[6].secret.items[0].path: Required value, spec.volumes[7].secret.secretName: Required value, spec.volumes[7].secret.items[0].key: Required value, spec.volumes[7].secret.items[0].path: Required value, spec.containers[0].volumeMounts[3].name: Not found: "tlscrt", spec.containers[0].volumeMounts[4].name: Not found: "tlskey", spec.containers[0].volumeMounts[5].name: Not found: "cacrt"]
[3] The volumes section of the rendered StatefulSets, with empty key and path even though they were specified in the ClickHouseKeeperInstallation:
spec.template.spec.volumes:
...
- name: tlskey
secret:
defaultMode: 420
items:
- key: ""
path: ""
- name: cacrt
secret:
defaultMode: 420
items:
- key: ""
path: ""
- name: tlscrt
secret:
defaultMode: 420
items:
- key: ""
path: ""
[4] The ClickHouseInstallation manifest that is successfully using the same secrets:
apiVersion: 'clickhouse.altinity.com/v1'
kind: 'ClickHouseInstallation'
metadata:
name: 'ch'
namespace: clickhouse
spec:
configuration:
settings:
prometheus/endpoint: '/metrics'
prometheus/port: '9363'
prometheus/metrics: 'true'
prometheus/events: 'true'
prometheus/asynchronous_metrics: 'true'
prometheus/errors: 'true'
prometheus/status_info: 'true'
...
files:
openssl_server.xml: |
<clickhouse>
<openSSL>
<server>
<certificateFile>/etc/clickhouse-server/secrets.d/tls.crt/clickhouse-certs/tls.crt</certificateFile>
<privateKeyFile>/etc/clickhouse-server/secrets.d/tls.key/clickhouse-certs/tls.key</privateKeyFile>
<caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
<verificationMode>relaxed</verificationMode>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
</server>
</openSSL>
</clickhouse>
openssl_client.xml: |
<clickhouse>
<openSSL>
<client>
<loadDefaultCAFile>true</loadDefaultCAFile>
<caConfig>/etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs/ca.crt</caConfig>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>
</clickhouse>
tls.crt:
valueFrom:
secretKeyRef:
name: clickhouse-certs
key: tls.crt
tls.key:
valueFrom:
secretKeyRef:
name: clickhouse-certs
key: tls.key
ca.crt:
valueFrom:
secretKeyRef:
name: clickhouse-certs
key: ca.crt
zookeeper:
nodes:
- host: chk-keeper-main-ensemble-0-0
port: 9281
secure: 'yes'
- host: chk-keeper-main-ensemble-0-1
port: 9281
secure: 'yes'
- host: chk-keeper-main-ensemble-0-2
port: 9281
secure: 'yes'
clusters:
- name: 'main-cluster'
secure: 'yes'
insecure: 'no'
secret:
auto: 'yes'
templates:
dataVolumeClaimTemplate: data-volume-template
logVolumeClaimTemplate: log-volume-template
podTemplate: clickhouse-pod-template
layout:
shardsCount: 3
replicas: 2
templates:
podTemplates:
- name: clickhouse-pod-template
metadata:
annotations:
prometheus.io/scrape: 'true'
spec:
containers:
- name: clickhouse
image: clickhouse/clickhouse-server:23.12.5.81
ports:
- name: metrics
containerPort: 9363
- name: clickhouse-backup
image: altinity/clickhouse-backup:stable
imagePullPolicy: IfNotPresent
...
volumeClaimTemplates:
- name: data-volume-template
spec:
storageClassName: ch-sc
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 25Gi
- name: log-volume-template
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
FYI: we worked around this by bypassing the operator and manually adding the volumes and volumeMounts for the secrets in the pod template (a sketch follows), but it seems like a bug in the operator itself.
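A minimal sketch of that workaround, assuming the same clickhouse-certs secret and the mount paths referenced in openssl_server.xml (adjust names and paths to your setup):
templates:
  podTemplates:
  - name: clickhouse-keeper-pod-template
    spec:
      # workaround: declare the secret volume and mounts ourselves instead of
      # relying on the operator to render them from .spec.configuration.files
      volumes:
      - name: clickhouse-certs
        secret:
          secretName: clickhouse-certs
      containers:
      - name: clickhouse-keeper
        image: clickhouse/clickhouse-keeper:23.12.5.81
        volumeMounts:
        # one mount per path used in the openSSL config; each mount exposes
        # all keys of the secret, and the config picks the right file
        - name: clickhouse-certs
          mountPath: /etc/clickhouse-server/secrets.d/tls.crt/clickhouse-certs
        - name: clickhouse-certs
          mountPath: /etc/clickhouse-server/secrets.d/tls.key/clickhouse-certs
        - name: clickhouse-certs
          mountPath: /etc/clickhouse-server/secrets.d/ca.crt/clickhouse-certs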
Here is another occurrence of the issue, with someone else asking for help on Slack :)
As we discussed in the Slack workspace, I'm adding all the necessary information regarding my case:
I'm facing the same error when trying to deploy version 0.25.0. I took the operator Helm chart and adjusted it, replacing only the operator password and the namespace to monitor. In addition to the ClickHouseInstallation custom resource, I added a ClickHouseKeeperInstallation resource. The latter deploys without any issues, but when I try to set up a ClickHouse cluster, I get an error about an "invalid memory address". I've double-checked the CH cluster setup configuration and didn't find any typos or mistakes; probably I'm missing something simple. Our Kubernetes cluster version is v1.31.4. The ClickHouseInstallation configuration is the following:
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
name: "clickhouse-cluster"
namespace: clickhouse-cluster
spec:
reconciling:
policy: "nowait"
configMapPropagationTimeout: 90
cleanup:
unknownObjects:
statefulSet: Delete
pvc: Delete
configMap: Delete
service: Delete
reconcileFailedObjects:
statefulSet: Retain
pvc: Retain
configMap: Retain
service: Retain
defaults:
distributedDDL:
profile: default
templates:
dataVolumeClaimTemplate: default
serviceTemplate: svc-template
podTemplate: clickhouse-node1
configuration:
settings:
openSSL/client/loadDefaultCAFile: true
openSSL/client/caConfig: /etc/clickhouse-server/config.d/ca.crt
openSSL/client/cacheSessions: true
openSSL/client/disableProtocols: sslv2,sslv3
openSSL/client/preferServerCiphers: true
openSSL/client/invalidCertificateHandler/name: RejectCertificateHandler
max_server_memory_usage: 0
zookeeper:
nodes:
- host: keeper-clickhouse-keeper.clickhouse-keeper
users:
default/access_management: 1
default/host_regexp: ".*"
default/networks/ip:
- "127.0.0.1"
- "::/0"
files:
users.d/remove_database_ordinary.xml: |
<yandex>
<profiles>
<default>
<default_database_engine remove="1"/>
</default>
</profiles>
</yandex>
ca.crt: |
<SOME_CERT_AND_KEY>
clusters:
- name: geomotive
layout:
shards:
- name: shard-1
templates:
podTemplate: clickhouse-node1
replicas:
- name: s-1
templates:
podTemplate: clickhouse-node1
- name: shard-2
templates:
podTemplate: clickhouse-node2
replicas:
- name: s-2
templates:
podTemplate: clickhouse-node2
- name: shard-3
templates:
podTemplate: clickhouse-node3
replicas:
- name: s-3
templates:
podTemplate: clickhouse-node3
- name: shard-4
templates:
podTemplate: clickhouse-node4
replicas:
- name: s-4
templates:
podTemplate: clickhouse-node4
- name: shard-5
templates:
podTemplate: clickhouse-node5
replicas:
- name: s-5
templates:
podTemplate: clickhouse-node5
templates:
volumeClaimTemplates:
- name: default
spec:
accessModes:
- ReadWriteOnce
storageClassName: local-path
resources:
requests:
storage: 600Gi
serviceTemplates:
- name: svc-template
generateName: clickhouse
metadata:
labels:
custom.label: "v2"
spec:
ports:
- name: http
port: 8123
- name: tcp
port: 9000
podTemplates:
- name: clickhouse-node1
spec:
nodeSelector:
kubernetes.io/hostname: "node1"
containers:
- name: clickhouse-pod
image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
resources:
requests:
cpu: "1000m"
memory: "32000Mi"
limits:
cpu: "6000m"
memory: "102400Mi"
- name: clickhouse-node2
spec:
nodeSelector:
kubernetes.io/hostname: "node2"
containers:
- name: clickhouse-pod
image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
resources:
requests:
cpu: "1000m"
memory: "32000Mi"
limits:
cpu: "6000m"
memory: "102400Mi"
- name: clickhouse-node3
spec:
nodeSelector:
kubernetes.io/hostname: "node3"
containers:
- name: clickhouse-pod
image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
resources:
requests:
cpu: "1000m"
memory: "32000Mi"
limits:
cpu: "6000m"
memory: "102400Mi"
- name: clickhouse-node4
spec:
nodeSelector:
kubernetes.io/hostname: "node4"
containers:
- name: clickhouse-pod
image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
resources:
requests:
cpu: "1000m"
memory: "32000Mi"
limits:
cpu: "6000m"
memory: "102400Mi"
- name: clickhouse-node5
spec:
nodeSelector:
kubernetes.io/hostname: "node5"
containers:
- name: clickhouse-pod
image: "clickhouse/clickhouse-server:25.5.2.47-alpine"
resources:
requests:
cpu: "1000m"
memory: "32000Mi"
limits:
cpu: "6000m"
memory: "102400Mi"
The error I faced:
/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x1bdc500?, 0x31272b0?})
/usr/local/go/src/runtime/panic.go:791 +0x132
github.com/altinity/clickhouse-operator/pkg/model.GetConfigMatchSpecs(0xc00128a820)
/clickhouse-operator/pkg/model/chop_config.go:116 +0xe3
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).buildTemplates(...)
/clickhouse-operator/pkg/controller/chi/worker-reconciler-chi.go:189
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).build(0xc00006db00, {0x21add58, 0xc00033e370}, 0xc000f9c000?, 0xc000f9c000)
/clickhouse-operator/pkg/controller/chi/worker-reconciler-chi.go:142 +0x445
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).reconcileCR(0xc00006db00, {0x21add58, 0xc00033e370}, 0x0, 0xc000f9c000)
/clickhouse-operator/pkg/controller/chi/worker-reconciler-chi.go:65 +0x80f
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).updateCHI(0xc00006db00, {0x21add58, 0xc00033e370}, 0x0, 0xc0009d41a0)
/clickhouse-operator/pkg/controller/chi/worker.go:347 +0xe85
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).processReconcileCHI(0x100000003?, {0x21add58?, 0xc00033e370?}, 0x0?)
/clickhouse-operator/pkg/controller/chi/worker-boilerplate.go:72 +0x47
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).processItem(0xc00006db00, {0x21add58, 0xc00033e370}, {0x1bff580, 0xc0007aa1e0})
/clickhouse-operator/pkg/controller/chi/worker-boilerplate.go:164 +0x47b
github.com/altinity/clickhouse-operator/pkg/controller/chi.(*worker).run(0xc00006db00)
/clickhouse-operator/pkg/controller/chi/worker-boilerplate.go:53 +0x38f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0004bd7c0, {0x218ec60, 0xc000d3bd70}, 0x1, 0xc0000f8f50)
/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0004bd7c0, 0x3b9aca00, 0x0, 0x1, 0xc0000f8f50)
/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
/clickhouse-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
created by github.com/altinity/clickhouse-operator/pkg/controller/chi.(*Controller).Run in goroutine 94
/clickhouse-operator/pkg/controller/chi/controller.go:526 +0x5a5