DDLWorker Fails for ON CLUSTER Statements in Multiple ClickHouse Versions (Including 25.10.2.65 and 25.11.1)
Summary
We are encountering consistent DDLWorker failures when executing any ON CLUSTER DDL statements across multiple ClickHouse versions, all under the Altinity Operator v0.24.4.
The issue is not limited to ClickHouse v25.8.10.7-lts (as described in upstream ClickHouse issue #89693) and also occurs in:
- ClickHouse 25.10.2.65
- ClickHouse 25.11.1
This suggests a broader regression or compatibility issue affecting distributed DDL execution under operator-managed clusters.
Upstream reference:
https://github.com/ClickHouse/ClickHouse/issues/89693
Problem Description
Executing any DDL statement using ON CLUSTER consistently fails in DDLWorker. For example:
CREATE TABLE ... ON CLUSTER '{cluster}';
Errors observed:
Code: 999. DDLWorker failed to execute query ...
This prevents schema propagation and blocks automated migrations in multi-shard/multi-replica environments.
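For reference, a complete statement of the failing form might look like the following (the database, table, and column names here are placeholders, not taken from the original report):

```sql
-- Hypothetical minimal reproduction; names are placeholders.
-- On affected versions this fails in DDLWorker with the "Code: 999" error above.
CREATE TABLE default.repro_on_cluster ON CLUSTER '{cluster}'
(
    id UInt64,
    ts DateTime
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/repro_on_cluster', '{replica}')
ORDER BY id;
```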
Additional Notes
- The issue occurs consistently across several ClickHouse versions, not just the one affected upstream.
- Possibly related to upstream issue #89693, but reproducible even on newer releases.
- Requesting guidance on:
  - Whether the operator requires patches/workarounds,
  - Whether certain ClickHouse versions should be considered incompatible,
  - Or whether the operator should enforce version restrictions until this is resolved.
Since this is a ClickHouse bug, we have to wait until it is fixed. This PR may handle it better, but we have not tested it yet: https://github.com/Altinity/clickhouse-operator/pull/1833
Confirmed that DDL works fine with clickhouse/clickhouse-server:25.8.9.20 but does not work with newer versions.
I think the issue may be specifically with clickhouse-keeper (and not with clickhouse-server).
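If the problem is on the Keeper side, one way to probe it (a diagnostic sketch, not from the original report) is to check whether ON CLUSTER tasks are reaching the distributed DDL queue and what status each host reports:

```sql
-- Diagnostic sketch: list recent distributed DDL tasks and their per-host status.
-- Entries that never reach a finished status would point at the Keeper side.
SELECT entry, host, status, exception_code
FROM system.distributed_ddl_queue
ORDER BY entry DESC
LIMIT 20;
```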
I'm running clickhouse-server:25.8.10.7 and am able to execute CREATE TABLE ... ON CLUSTER '{cluster}' ... DDL statements without any problems.
It might be related to the fact that we are running the cluster on EKS, as suggested here
FWIW I'm also running the cluster on EKS (Kubernetes 1.32 + CoreDNS 1.11.4 addon) and as mentioned above I'm not facing this issue. Happy to share more details about the setup if it would help.