
Operator fails to update status due to size

tanner-bruce opened this issue 1 year ago

clickhouse-operator E0619 08:13:59.022232       1 controller.go:748] updateCHIObjectStatus():clickhouse/clickhouse-production/55d39e29-fdf9-43b3-9d82-51d7aeefb7dc:got error, all retries are exhausted. err: "rpc error: code = ResourceExhausted desc = trying to send message larger than max (3055012 vs. 2097152)"

We have a fairly large single CHI and are now getting this error from the operator.

A large chunk of the status section is the storage.xml from normalizedCompleted, around 420,000 bytes of the 2,097,152-byte max.

tanner-bruce avatar Jun 19 '24 20:06 tanner-bruce

This also seems to affect sections of the code like https://github.com/Altinity/clickhouse-operator/blob/0.24.0/pkg/model/chop_config.go#L151 where the status is used to compare old and new, causing unnecessary restarts.

One potential solution could be to make the state storage configurable, allowing object storage to be used instead of the Kubernetes status subresource.

tanner-bruce avatar Jun 19 '24 21:06 tanner-bruce

One workaround can be to bake some of your configuration into the ClickHouse image itself, reducing the status size below the maximum object size allowed by the API server.
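
For illustration, a minimal sketch of that workaround, assuming a custom image with the storage configuration baked in (for example copied to /etc/clickhouse-server/config.d/ at build time); the image name below is hypothetical, not something from this thread. The large entry is then dropped from spec.configuration.files so it no longer ends up in the status:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse"
spec:
  configuration:
    files: {}   # storage.xml removed from the CHI, so it is not copied into the status
  templates:
    podTemplates:
      - name: ingest-pod-template
        spec:
          containers:
            - name: clickhouse
              image: registry.example.com/clickhouse-server-with-storage:24.5.1.1763   # hypothetical image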

ondrej-smola avatar Jun 20 '24 06:06 ondrej-smola

@tanner-bruce, is it possible to share the full CHI? I can see you are using configuration at the shard level -- what is the reason for that? Maybe the CHI can be made more compact.

alex-zaitsev avatar Jun 20 '24 10:06 alex-zaitsev

@alex-zaitsev I'm not sure what shard-level configuration you mean. We have some different node types, and we have different disk sizes for some clusters.

@ondrej-smola That is a good idea; we could certainly do that for the storage XML, but I think that is it.

Currently we are looking at splitting the different clusters into their own CHIs and then using cluster discovery to link them to our query pods, but we are not sure how to migrate to that.

Here is our full CHI, redacted mildly.

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse"
spec:
  configuration:
    profiles:
      clickhouse_operator/skip_unavailable_shards: 1
      materialize_ttl_after_modify: 0
      default/skip_unavailable_shards: 1
      readonly/readonly: 1
    settings:
      async_insert_threads: 30
      background_common_pool_size: 24
      background_distributed_schedule_pool_size: 24
      background_move_pool_size: 12
      background_pool_size: 36
      background_schedule_pool_size: 24
      logger/level: debug
      max_table_size_to_drop: 0
      prometheus/asynchronous_metrics: true
      prometheus/endpoint: /metrics
      prometheus/events: true
      prometheus/metrics: true
      prometheus/port: "8888"
      prometheus/status_info: true
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-7
      - name: "6"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-6
      - name: "5"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-5
          podTemplate: ingest-2-pod-template
      - name: "4"
        layout:
          shardsCount: 8
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-4
      - name: "3"
        layout:
          shardsCount: 8
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-3
      - name: "2"
        layout:
          shardsCount: 2
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-2
      - name: "1"
        layout:
          shardsCount: 1
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-1
      - name: "query"
        templates:
          clusterServiceTemplate: query-service
          dataVolumeClaimTemplate: query-data
          podTemplate: query-pod-template
        layout:
          shardsCount: 4
          replicasCount: 1
    files:
      conf.d/storage.xml: "<clickhouse> <storage_configuration> <disks> <gcs> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT\" /> <metadata_path>/var/lib/clickhouse/disks/gcs/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs> <gcs_6m> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT_6M_RETENTION\" /> <metadata_path>/var/lib/clickhouse/disks/gcs_6m/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs_6m> <gcs_1y> <type>s3</type> <access_key_id from_env=\"GCS_ACCESS_KEY\" /> <secret_access_key from_env=\"GCS_SECRET_KEY\" /> <endpoint from_env=\"GCS_ENDPOINT_1Y_RETENTION\" /> <metadata_path>/var/lib/clickhouse/disks/gcs_1y/</metadata_path> <support_batch_delete>false</support_batch_delete> </gcs_1y> <gcs_cache> <type>cache</type> <disk>gcs</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>10Gi</max_size> </gcs_cache> <gcs_6m_cache> <type>cache</type> <disk>gcs_6m</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache_6m/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>10Gi</max_size> </gcs_6m_cache> <gcs_1y_cache> <type>cache</type> <disk>gcs_1y</disk> <cache_enabled>true</cache_enabled> <data_cache_enabled>true</data_cache_enabled> <enable_filesystem_cache>true</enable_filesystem_cache> <path>/var/lib/clickhouse/disks/gcscache_1y/</path> <enable_filesystem_cache_log>true</enable_filesystem_cache_log> <max_size>95Gi</max_size> </gcs_1y_cache> <ssd> <type>local</type> <path>/var/lib/clickhouse/disks/localssd/</path> </ssd> </disks> <policies> <storage_main> <volumes> <ssd> <disk>ssd</disk> </ssd> <gcs> <disk>gcs_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs> <gcs_6m> <disk>gcs_6m_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs_6m> <gcs_1y> <disk>gcs_1y_cache</disk> <perform_ttl_move_on_insert>0</perform_ttl_move_on_insert> <prefer_not_to_merge>true</prefer_not_to_merge> </gcs_1y> </volumes> <move_factor>0.1</move_factor> </storage_main> </policies> </storage_configuration> </clickhouse>"
    zookeeper:
      nodes:
        - host: clickhouse-keeper
          port: 2181
      session_timeout_ms: 30000
      operation_timeout_ms: 10000
      root: /root
      identity: user:password
    users:
      migrations/access_management: 1
      migrations/k8s_secret_password: clickhouse/clickhouse
      migrations/networks/ip: "::/0"
      exporter/k8s_secret_password: clickhouse/clickhouse
      exporter/networks/ip: "::/0"
      grafana/k8s_secret_password: clickhouse/clickhouse
      grafana/networks/ip: "::/0"
      grafana/grants/query:
        - GRANT SELECT ON *.*
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
        - REVOKE ALL PRIVILEGES ON .
      api/k8s_secret_password: clickhouse/clickhouse
      api/networks/ip: "::/0"
  defaults:
    templates:
      logVolumeClaimTemplate: logs
      podTemplate: ingest-pod-template
      serviceTemplate: default-service
      clusterServiceTemplate: cluster-ingest-service
    storageManagement:
      reclaimPolicy: Retain
  templates:
    serviceTemplates:
      - name: default-service
        generateName: clickhouse-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
      - name: cluster-ingest-service
        generateName: ingest-{cluster}-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
      - name: query-service
        generateName: query-{chi}
        metadata:
          annotations:
            networking.gke.io/load-balancer-type: "Internal"
            networking.gke.io/internal-load-balancer-allow-global-access: "true"
        spec:
          ports:
            - name: http
              port: 8123
            - name: tcp
              port: 9000
          type: LoadBalancer
    podTemplates:
      - name: ingest-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
          - key: "app"
            operator: "Equal"
            value: "ingest"
            effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: cloud.google.com/gke-nodepool
                    operator: In
                    values:
                    - ingest
          containers:
          - env:
            - name: SHARD_BUCKET_PATH
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CLUSTER
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
            - name: GCS_ENDPOINT
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            - name: GCS_ENDPOINT_6M_RETENTION
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            - name: GCS_ENDPOINT_1Y_RETENTION
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            envFrom:
            - secretRef:
                name: clickhouse
            image: clickhouse-server:24.5.1.1763
            name: clickhouse
            startupProbe:
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              failureThreshold: 100
              periodSeconds: 9
              timeoutSeconds: 1
            livenessProbe:
              failureThreshold: 100
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              initialDelaySeconds: 60
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 1
            readinessProbe:
              failureThreshold: 300
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 1
            ports:
              - name: "metrics"
                containerPort: 8888
            resources:
              limits:
                memory: 10Gi
              requests:
                cpu: 1000m
                memory: 10Gi
            volumeMounts:
            - name: cache
              mountPath: /var/lib/clickhouse/disks/gcscache
            - name: cache-6m
              mountPath: /var/lib/clickhouse/disks/gcscache_6m
            - name: cache-1y
              mountPath: /var/lib/clickhouse/disks/gcscache_1y
      - name: ingest-2-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
          - key: "app"
            operator: "Equal"
            value: "ingest-2"
            effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: cloud.google.com/gke-nodepool
                    operator: In
                    values:
                    - ingest-2
          containers:
          - env:
            - name: SHARD_BUCKET_PATH
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: CLUSTER
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
            - name: GCS_ENDPOINT
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            - name: GCS_ENDPOINT_6M_RETENTION
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            - name: GCS_ENDPOINT_1Y_RETENTION
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            envFrom:
            - secretRef:
                name: clickhouse
            image: clickhouse-server:24.5.1.1763
            name: clickhouse
            startupProbe:
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              failureThreshold: 100
              periodSeconds: 9
              timeoutSeconds: 1
            livenessProbe:
              failureThreshold: 100
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              initialDelaySeconds: 60
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 1
            readinessProbe:
              failureThreshold: 300
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 30
              successThreshold: 1
              timeoutSeconds: 1
            ports:
              - name: "metrics"
                containerPort: 8888
            resources:
              limits:
                memory: 10Gi
              requests:
                cpu: 1000m
                memory: 10Gi
            volumeMounts:
            - name: cache
              mountPath: /var/lib/clickhouse/disks/gcscache
            - name: cache-6m
              mountPath: /var/lib/clickhouse/disks/gcscache_6m
            - name: cache-1y
              mountPath: /var/lib/clickhouse/disks/gcscache_1y
      - name: query-pod-template
        metadata:
          annotations:
            prometheus.io/scrape: "true"
            prometheus.io/schema: "http"
            prometheus.io/port: "8888"
            prometheus.io/path: "/metrics"
        spec:
          tolerations:
          - key: "app"
            operator: "Equal"
            value: "ingest"
            effect: "NoExecute"
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: cloud.google.com/gke-nodepool
                    operator: In
                    values:
                    - ingest
          containers:
          - env:
            - name: SHARD_BUCKET_PATH
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: CLUSTER
              valueFrom:
                fieldRef:
                  fieldPath: metadata.labels['clickhouse.altinity.com/cluster']
            - name: GCS_ENDPOINT
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            - name: GCS_ENDPOINT_6M_RETENTION
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            - name: GCS_ENDPOINT_1Y_RETENTION
              value: bucket/$(POD_NAMESPACE)/$(CLUSTER)/$(SHARD_BUCKET_PATH)
            envFrom:
            - secretRef:
                name: clickhouse
            image: clickhouse-server:24.5.1.1763
            name: clickhouse
            startupProbe:
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              failureThreshold: 40
              periodSeconds: 3
              timeoutSeconds: 1
            livenessProbe:
              failureThreshold: 10
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              initialDelaySeconds: 60
              periodSeconds: 3
              successThreshold: 1
              timeoutSeconds: 1
            readinessProbe:
              failureThreshold: 3
              httpGet:
                path: /ping
                port: http
                scheme: HTTP
              initialDelaySeconds: 10
              periodSeconds: 3
              successThreshold: 1
              timeoutSeconds: 1
            ports:
              - name: "metrics"
                containerPort: 8888
            resources:
              limits:
                memory: 10Gi
              requests:
                cpu: 1000m
                memory: 10Gi
    volumeClaimTemplates:
    - name: data-7
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: data-6
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: data-5
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
    - name: data-4
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: data-3
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: data-2
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: data-1
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: query-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: cache
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: cache-6m
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: cache-1y
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: ssd
    - name: logs
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi

tanner-bruce avatar Jun 20 '24 14:06 tanner-bruce

I would definitely consider moving to multiple CHI objects and generating the shared configuration with some (git)ops tool. If I understand correctly, it started after adding clusters 5, 6 and 7?
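
To make that concrete, a rough sketch of what one of the split-out, per-cluster CHIs could look like, with the shared pieces (settings, users, pod and volume templates, storage.xml) rendered into each manifest by the (git)ops tooling; all names here are illustrative. The operator's ClickHouseInstallationTemplate objects, referenced via spec.useTemplates, would be another way to share the common parts, though that is not shown here.

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "clickhouse-7"   # one CHI per former cluster entry
spec:
  configuration:
    clusters:
      - name: "7"
        layout:
          shardsCount: 14
          replicasCount: 2
        templates:
          dataVolumeClaimTemplate: data-7
    files:
      # shared storage configuration injected by the gitops tool
      conf.d/storage.xml: |
        <clickhouse>...</clickhouse>
  # defaults, users and templates rendered from the same shared source as the other CHIs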

ondrej-smola avatar Jun 21 '24 08:06 ondrej-smola

@tanner-bruce, did splitting the clusters into multiple CHIs help?

alex-zaitsev avatar Jul 18 '24 10:07 alex-zaitsev

@alex-zaitsev We have now run into this on a single cluster (after splitting). I'm trying to move the storage.xml out of files: { conf.d/storage.xml: ... } and into a ConfigMap, but I'm not having much luck mounting it into the conf.d location, because multiple ConfigMaps cannot be mounted to the same folder without using a projected volume. Otherwise I would need to use an init container. Do you have any thoughts here?
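
For reference, the projected-volume approach mentioned above would look roughly like the pod-spec sketch below; both ConfigMap names are hypothetical, and whether such a mount can coexist with the conf.d ConfigMap mount the operator itself manages is exactly the open question here.

spec:
  containers:
    - name: clickhouse
      volumeMounts:
        - name: conf-d
          mountPath: /etc/clickhouse-server/conf.d
  volumes:
    - name: conf-d
      projected:
        sources:
          - configMap:
              name: operator-generated-confd      # hypothetical: the operator's per-replica config
          - configMap:
              name: clickhouse-storage-xml        # hypothetical: user-managed storage.xml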

tanner-bruce avatar Nov 13 '24 16:11 tanner-bruce

@tanner-bruce did you try

spec:
  configuration:
    files:
      conf.d/storage.xml: |
        <content>

?

Slach avatar Nov 13 '24 17:11 Slach

@Slach you can see in my CHI above that this is exactly what it already contains.

This gets replicated for every StatefulSet inside the status, causing a huge amount of character usage.

tanner-bruce avatar Nov 13 '24 20:11 tanner-bruce

it is in the normalizedCompleted

    normalizedCompleted:
      apiVersion: clickhouse.altinity.com/v1
      kind: ClickHouseInstallation
      metadata:
        creationTimestamp: "2024-01-25T16:15:26Z"
        finalizers:
        - finalizer.clickhouseinstallation.altinity.com
        generation: 23
        name: tracing
        namespace: clickhouse-production
        resourceVersion: "914532393"
        uid: 1a5d08aa-ca86-455e-a1fc-145119b78ad9
      spec:
        configuration:
          clusters:
          - files:
              conf.d/storage.xml: ....

tanner-bruce avatar Nov 13 '24 20:11 tanner-bruce

We plan to move the normalized spec from the status to a separate ConfigMap.

alex-zaitsev avatar Dec 02 '24 11:12 alex-zaitsev

@tanner-bruce, this is fixed in 0.24.3: normalizedCHI is now stored in a separate ConfigMap instead of the status.

alex-zaitsev avatar Dec 27 '24 08:12 alex-zaitsev

Fixed in https://github.com/Altinity/clickhouse-operator/pull/1623

alex-zaitsev avatar Jan 27 '25 10:01 alex-zaitsev

@alex-zaitsev we are now hitting the ConfigMap max size:

W0417 16:33:16.064626       1 cr.go:121] statusUpdateRetry():clickhouse-core/core/9a326975-286c-4306-96e4-abab9a8391a8:got error, will retry. err: "ConfigMap \"chi-storage-core\" is invalid: []: Too long: must have at most 1048576 bytes"
W0417 16:33:16.465528       1 cr.go:121] statusUpdateRetry():clickhouse-core/core/9a326975-286c-4306-96e4-abab9a8391a8:got error, will retry. err: "ConfigMap \"chi-storage-core\" is invalid: []: Too long: must have at most 1048576 bytes"

tanner-bruce avatar Apr 17 '25 17:04 tanner-bruce