DDLWorker Fails for ON CLUSTER Statements in Multiple ClickHouse Versions (Including 25.10.2.65 and 25.11.1)

Open EinavDanielDX opened this issue 1 month ago • 2 comments

Summary

We are encountering consistent DDLWorker failures when executing any ON CLUSTER DDL statements across multiple ClickHouse versions, all under the Altinity Operator v0.24.4.

The issue is not limited to ClickHouse v25.8.10.7-lts (as described in upstream ClickHouse issue #89693) and also occurs in:

  • ClickHouse 25.10.2.65
  • ClickHouse 25.11.1

This suggests a broader regression or compatibility issue affecting distributed DDL execution under operator-managed clusters.

Upstream reference:
https://github.com/ClickHouse/ClickHouse/issues/89693


Problem Description

Executing any DDL statement using ON CLUSTER consistently fails in DDLWorker. For example:

CREATE TABLE ... ON CLUSTER '{cluster}';

Errors observed:

Code: 999. DDLWorker failed to execute query ...

This prevents schema propagation and blocks automated migrations in multi-shard/multi-replica environments.
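To narrow down which hosts fail and why, the distributed DDL queue can be inspected directly on any replica. The query below is a minimal diagnostic sketch, not a fix: it assumes the `system.distributed_ddl_queue` table is available on the server version in use, and `my_cluster` is a hypothetical cluster name that must be replaced with the real one. Note that error code 999 is ClickHouse's `KEEPER_EXCEPTION`, so the `exception_text` column is the first place to look for a Keeper-side cause.

```sql
-- Per-host status of the most recent ON CLUSTER entries,
-- as seen from the node this session is connected to.
SELECT
    entry,              -- queue entry name, e.g. query-0000000001
    host,               -- host the entry is scheduled to run on
    status,             -- per-host execution status
    exception_code,     -- non-zero when execution failed on that host
    exception_text,
    query_create_time
FROM system.distributed_ddl_queue
WHERE cluster = 'my_cluster'   -- hypothetical name; replace with your cluster
ORDER BY query_create_time DESC
LIMIT 20;
```

Running the same query on each replica and comparing the `status` and `exception_text` values may help distinguish a server-side DDLWorker problem from a Keeper connectivity or DNS problem.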

Additional Notes

  • The issue occurs consistently across several ClickHouse versions, not just the one affected upstream.
  • Possibly related to upstream issue #89693, but reproducible even on newer releases.
  • Requesting guidance on:
      - Whether the operator requires patches/workarounds,
      - Whether certain ClickHouse versions should be considered incompatible,
      - Or if the operator should enforce version restrictions until resolved.

EinavDanielDX, Dec 08 '25 16:12

Since this is a ClickHouse bug, we have to wait until it is fixed. This PR might handle it better, but we have not tested it yet: https://github.com/Altinity/clickhouse-operator/pull/1833

alex-zaitsev, Dec 10 '25 14:12

Confirmed that ON CLUSTER DDL works fine with clickhouse/clickhouse-server:25.8.9.20 but does not work with newer versions.

alex-zaitsev, Dec 10 '25 14:12

I think the issue may be specifically with clickhouse-keeper (and not with clickhouse-server).

I'm running clickhouse-server:25.8.10.7 and am able to execute CREATE TABLE ... ON CLUSTER '{cluster}' ... DDL statements without any problems.

janeklb, Dec 12 '25 23:12

It might be related to the fact that we are running the cluster on EKS, as suggested here

EinavDanielDX, Dec 14 '25 12:12

It might be related to the fact that we are running the cluster on EKS, as suggested here

FWIW I'm also running the cluster on EKS (Kubernetes 1.32 + CoreDNS 1.11.4 addon) and as mentioned above I'm not facing this issue. Happy to share more details about the setup if it would help.

janeklb, Dec 14 '25 14:12