Connection issue when there are several ClickHouse servers
As I reported in https://github.com/trinodb/trino/pull/10675#issuecomment-1017255928, the ClickHouse JDBC driver seems to connect to other servers that are not defined in the connection URL. Can we get help debugging and fixing the issue?
Sure, I will run the tests from the ebi/clickhouse-rename-schema branch and see if I can find the root cause in the next few days. The first thing that pops into my head is Testcontainers. Have you tried GenericContainer instead of ClickHouseContainer?
By the way, is the connected server still 20.8.19.4? That does not match any of the declared images.
Sure, I will try GenericContainer too. I'm also trying a randomized HTTP port on the ClickHouse server to isolate the issue.
We already removed support for 20.8, so the declared images are the right version at this time.
This issue still happened even after randomizing the ClickHouse HTTP port, so I don't think it's a Testcontainers issue.
I agree. I'm thinking of either adding connection validation (since the legacy driver has issues dealing with stale connections) or merging your change into trinodb/trino#10801 and running the tests against the new driver. I will try it out tonight.
Validating the connection before execution did not help, so I updated the connection string for the legacy driver by adding validateAfterInactivityMillis=100 (Apache HttpClient uses 2000 ms by default; see #760 for details) to further reduce the chance of running into the "failed to respond" issue. The new driver, on the other hand, does not have the connection issue, so the tests passed for 20.7+.
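For anyone who wants to reproduce the legacy-driver setting, here is a minimal sketch of how the parameter can be appended to the connection string. The `LegacyUrlSketch` class, the `buildUrl` helper, and the host, port, and database values are hypothetical and only for illustration. The sketch assumes the usual `jdbc:clickhouse://host:port/database?key=value` URL shape and does not load the driver or open a connection.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of clickhouse-jdbc or Trino.
public class LegacyUrlSketch {
    // Builds a URL of the assumed form jdbc:clickhouse://host:port/database?key=value&...
    static String buildUrl(String host, int port, String database, Map<String, String> params) {
        StringBuilder url = new StringBuilder("jdbc:clickhouse://")
                .append(host).append(':').append(port).append('/').append(database);
        if (!params.isEmpty()) {
            // Join the parameters as a query string, keeping the map's iteration order.
            StringJoiner query = new StringJoiner("&", "?", "");
            params.forEach((key, value) -> query.add(key + "=" + value));
            url.append(query);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        // Value taken from the comment above; Apache HttpClient defaults to 2000 ms.
        params.put("validateAfterInactivityMillis", "100");
        // Host, port, and database are placeholders.
        System.out.println(buildUrl("localhost", 8123, "default", params));
    }
}
```

The resulting string would then be passed to the legacy driver as the JDBC URL. As noted in this thread, a lower value only narrows the window for hitting a stale connection; it does not remove it.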
I'd suggest merging trinodb/trino#10801 first and marking the tests against 20.3 as flaky.
Update:
Again, 1764169145 (50/50) was just lucky. 1764444585 (49/50) shows that validateAfterInactivityMillis didn't help much either.
Do you mean the flaky issue still exists even after upgrading the driver?
No, the issue only exists when you test ClickHouse 20.3 using the legacy driver. Upgrading the driver only helps for 20.7+.
Thanks for your help! Upgrading the driver did resolve the flakiness on newer ClickHouse versions. As you already mentioned, the flakiness still exists in 20.3 (Altinity build).
You're welcome, and I'm glad I could help :)
As for the flakiness in 20.3, I'm sorry that I have to leave it as is, mainly because supporting 20.3 in the new driver requires more changes in the code base (not only clickhouse-jdbc but also clickhouse-http-client 🤦) than I thought, so it does not fit into a patch release. As I'm not working on this project full time, I'd rather save the effort for the upcoming v0.3.3 for TCP/Native protocol support.
Anyway, we can revisit this in June or so by completely removing the legacy driver and the 20.3 tests from Trino.