Operator fails to create ConfigMap `chi-storage-core`
Same as https://github.com/Altinity/clickhouse-operator/issues/1444, but I cannot re-open it. Running v0.24.5.
W0515 19:32:15.236346 1 cr.go:121] statusUpdateRetry():clickhouse-core/core/13fbe608-fb22-49ec-93b3-d0f3786c09e9:got error, will retry. err: "ConfigMap \"chi-storage-core\" is invalid: []: Too long: must have at most 1048576 bytes
The ConfigMap has never been created successfully, so I cannot see what is in it.
@tanner-bruce, how many nodes do you have?
62. There are XML configs and TTLs for system tables specified in it.
Do you have settings defined separately for every shard/replica?
I am asking because we have clusters with 200+ nodes, and those fit in the chi-storage ConfigMap. So something non-standard in your configuration must be inflating it.
We do not have settings defined per shard/replica. They are defined per cluster and we run two clusters that are defined in the same CHI.
We do have a lot of inline XML configuration that is also duplicated per cluster. For example, the table TTLs mentioned above make up a big chunk:
- files:
    log_ttls.xml: |-
      <clickhouse>
        <query_log replace="1">
          <database>system</database>
          <table>query_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </query_log>
        <text_log replace="1">
          <database>system</database>
          <table>text_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </text_log>
        <metric_log replace="1">
          <database>system</database>
          <table>metric_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </metric_log>
        <query_metric_log replace="1">
          <database>system</database>
          <table>query_metric_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </query_metric_log>
        <processors_profile_log replace="1">
          <database>system</database>
          <table>processors_profile_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </processors_profile_log>
        <opentelemetry_span_log replace="1">
          <database>system</database>
          <table>opentelemetry_span_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 1 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </opentelemetry_span_log>
        <error_log replace="1">
          <database>system</database>
          <table>error_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </error_log>
        <crash_log replace="1">
          <database>system</database>
          <table>crash_log</table>
          <engine>ENGINE = MergeTree PARTITION BY (event_date)
                  ORDER BY (event_time)
                  TTL event_date + INTERVAL 30 DAY DELETE
          </engine>
          <flush_interval_milliseconds>7500</flush_interval_milliseconds>
        </crash_log>
      </clickhouse>
And we have even more XML like this for storage, certificates, and mTLS user configuration.
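For anyone trying to gauge how close a set of inline files is to the 1048576-byte limit from the error message, here is a rough sketch in Python. It only sums the UTF-8 size of the values you hand it; how the operator actually serializes the CHI into `chi-storage-core` is not shown in this thread, so treat the result as a lower bound. The function names and the sample stanza are hypothetical.

```python
# Limit quoted in the error message: "must have at most 1048576 bytes".
CONFIGMAP_LIMIT_BYTES = 1_048_576


def payload_bytes(data: dict[str, str]) -> int:
    """Return the total UTF-8 size of all values in a ConfigMap-like dict.

    Approximation only: it does not model how the operator encodes its state.
    """
    return sum(len(value.encode("utf-8")) for value in data.values())


def headroom(data: dict[str, str], limit: int = CONFIGMAP_LIMIT_BYTES) -> int:
    """Return the bytes left before the limit (negative means over the limit)."""
    return limit - payload_bytes(data)


# Hypothetical stand-in for one system-table stanza from log_ttls.xml.
STANZA = """<text_log replace="1">
  <database>system</database>
  <table>text_log</table>
  <engine>ENGINE = MergeTree PARTITION BY (event_date)
          ORDER BY (event_time)
          TTL event_date + INTERVAL 30 DAY DELETE
  </engine>
  <flush_interval_milliseconds>7500</flush_interval_milliseconds>
</text_log>
"""

if __name__ == "__main__":
    # Eight similar stanzas in one file, as in the snippet above.
    files = {"log_ttls.xml": STANZA * 8}
    print("payload bytes:", payload_bytes(files))
    print("headroom bytes:", headroom(files))
```

Multiplying the per-file figure by the number of places the file ends up being repeated (an assumption you would have to verify for your own CHI) gives a quick sense of which inline files dominate.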
Hi @alex-zaitsev, just wanted to check in and see if you had any thoughts. I'm now looking at this for @tanner-bruce and @mklocke-shopify.
My only suggestion for now is to reduce the configuration a bit. For example, query_log is already specified the same way in the default configuration, see https://github.com/Altinity/clickhouse-operator/blob/master/config/chi/config.d/01-clickhouse-03-query_log.xml
Also, I would not care about crash_log and error_log at all, since those are very small.
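A minimal sketch of that trimming, assuming you want to drop the `query_log`, `crash_log` and `error_log` sections from an inline file and see how many bytes that saves. It uses Python's standard `xml.etree.ElementTree`; the helper name `drop_sections` and the shortened sample document are hypothetical, not part of the operator.

```python
import xml.etree.ElementTree as ET


def drop_sections(xml_text: str, names: set[str]) -> str:
    """Remove top-level child elements whose tag is in `names`."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy because we mutate the parent while looping.
    for child in list(root):
        if child.tag in names:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Shortened, hypothetical version of log_ttls.xml for illustration.
SAMPLE = """<clickhouse>
  <query_log replace="1">
    <database>system</database>
    <table>query_log</table>
  </query_log>
  <text_log replace="1">
    <database>system</database>
    <table>text_log</table>
  </text_log>
  <crash_log replace="1">
    <database>system</database>
    <table>crash_log</table>
  </crash_log>
</clickhouse>"""

if __name__ == "__main__":
    trimmed = drop_sections(SAMPLE, {"query_log", "crash_log", "error_log"})
    saved = len(SAMPLE.encode("utf-8")) - len(trimmed.encode("utf-8"))
    print(trimmed)
    print("bytes saved:", saved)
```

Note that dropping `crash_log` and `error_log` also drops the custom TTL for those tables, which is acceptable only if you agree they stay small.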
Going forward, we are considering running a small DB instance instead of storing state in a ConfigMap.