How to override generated remote_servers
Hi Team,
I'm encountering difficulty overriding the remote_servers configuration generated by the ClickHouse Operator (CHOP) for specific pods.
Context:
I have a setup where some ClickHouse pods manage distributed tables (and need the default CHOP cluster definitions), while others do not and should not have visibility of the full cluster topology via remote_servers. I'm attempting to apply a custom configuration selectively at the cluster level.
Attempted Solution:
Using spec.clusters.files, I added a custom file named conf.d/zzz_remote_servers_override.xml to the specific CHI cluster definition. (I used conf.d because cluster-level files cannot target config.d, and the zzz_ prefix to try to influence load order.)
This file sets replace="1" on <remote_servers> and on the nested cluster definitions (all-replicated, all-clusters, all-sharded) to try to remove the CHOP-generated entries and define only the necessary ones (such as a minimal test_cluster pointing to localhost).
Configuration File (conf.d/zzz_remote_servers_override.xml):
<clickhouse>
    <remote_servers replace="1">
        <all-replicated replace="1">
        </all-replicated>
        <all-clusters replace="1">
            <shard>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </all-clusters>
        <all-sharded replace="1">
            <shard>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </all-sharded>
        <test_cluster>
            <shard>
                <replica>
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </test_cluster>
    </remote_servers>
</clickhouse>
Observed Behavior:
The /var/lib/clickhouse/preprocessed/config.xml shows my conf.d/zzz_remote_servers_override.xml file is being loaded.
The test_cluster defined in my override file is present in the final merged configuration.
However, the CHOP-generated clusters (defined in config.d/chop-generated-remote_servers.xml) are also still present, indicating that replace="1" in my conf.d file did not remove them.
The preprocessed config header lists files used for generation:
<!-- This file was generated automatically.
Do not edit it: it is likely to be discarded and generated again before it's read next time.
Files used to generate this file:
/etc/clickhouse-server/config.xml
/etc/clickhouse-server/conf.d/chop-generated-hostname-ports.xml
/etc/clickhouse-server/conf.d/chop-generated-macros.xml
/etc/clickhouse-server/conf.d/chop-generated-zookeeper.xml
/etc/clickhouse-server/conf.d/zzz-remote_servers_override.xml
/etc/clickhouse-server/config.d/02-clickhouse-01-listen.xml
/etc/clickhouse-server/config.d/02-clickhouse-02-logger.xml
/etc/clickhouse-server/config.d/02-clickhouse-03-system_logs.xml
/etc/clickhouse-server/config.d/chop-generated-remote_servers.xml
/etc/clickhouse-server/config.d/chop-generated-settings.xml -->
Questions:
Does the order of files listed in the preprocessed config header reflect the actual merge order?
Given that ClickHouse typically merges conf.d files over config.d files (per documentation such as this Altinity doc), and that my conf.d file is listed, why doesn't replace="1" remove the definitions from the config.d file as expected?
Is there a recommended or alternative way within CHOP to achieve selective overriding or removal of the auto-generated remote_servers configuration for specific pods or clusters?
Thanks for your guidance!
You cannot override the standard clusters, and trying to do so is strongly discouraged. You can, however, add your own cluster under a different name that includes only the hosts you need. It will be merged with the others.
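As a rough sketch of that suggestion: the fragment below adds one extra cluster and leaves the generated ones untouched. The cluster name `local_only` is a hypothetical placeholder, and the single replica at 127.0.0.1:9000 is borrowed from the question above. No replace attribute is used, so the definition is expected to be merged alongside the CHOP-generated clusters rather than replacing them.

```xml
<clickhouse>
    <remote_servers>
        <!-- "local_only" is a hypothetical name; pick any name that does not
             collide with the clusters generated by the operator. -->
        <local_only>
            <shard>
                <replica>
                    <!-- Host and port taken from the question's example. -->
                    <host>127.0.0.1</host>
                    <port>9000</port>
                </replica>
            </shard>
        </local_only>
    </remote_servers>
</clickhouse>
```

This matches what the question already observed: the extra test_cluster showed up in the merged configuration next to the generated clusters.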
Can you elaborate why this is unrecommended?
I don't need a separate cluster defined here. The main reason for doing this is that we occasionally see a lot of network errors from a data node trying to connect to data nodes outside its shard, and we feel a node never needs to communicate with nodes outside its shard. Is there another way to achieve that?
@cw9, could you elaborate? A node has to communicate with other nodes in order to execute distributed queries. You have some control through the 'skip_unavailable_shards', 'distributed_replica_error_half_life', 'distributed_replica_error_cap' and 'distributed_replica_max_ignored_errors' settings.
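For illustration only, these are query-level settings, so one place they could be set is a settings profile. The sketch below assumes the `default` profile and uses placeholder values that are not tuning recommendations. How the fragment reaches the pod (for example through the profiles section of the CHI spec or a users.d file) is also an assumption here.

```xml
<clickhouse>
    <profiles>
        <default>
            <!-- Skip shards that cannot be reached instead of failing the query. -->
            <skip_unavailable_shards>1</skip_unavailable_shards>
            <!-- Seconds over which a replica's error count is halved (placeholder value). -->
            <distributed_replica_error_half_life>60</distributed_replica_error_half_life>
            <!-- Upper bound on the per-replica error count (placeholder value). -->
            <distributed_replica_error_cap>1000</distributed_replica_error_cap>
            <!-- Number of errors ignored when choosing a replica (placeholder value). -->
            <distributed_replica_max_ignored_errors>0</distributed_replica_max_ignored_errors>
        </default>
    </profiles>
</clickhouse>
```

Note that skip_unavailable_shards changes query results when a shard is down, since data from the skipped shard is silently omitted. Whether that trade-off is acceptable depends on the workload.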