
Connection issue when there are several ClickHouse servers

Open ebyhr opened this issue 4 years ago • 9 comments

As I reported in https://github.com/trinodb/trino/pull/10675#issuecomment-1017255928, the ClickHouse JDBC driver seems to be connecting to servers that are not defined in the connection URL. Can we get help debugging and fixing the issue?

ebyhr avatar Jan 26 '22 02:01 ebyhr

Sure, I will run the tests from the ebi/clickhouse-rename-schema branch and see if I can find the root cause in the next few days. The first thing that pops into my head is testcontainers. Have you tried GenericContainer instead of ClickHouseContainer?

By the way, is the connected server still 20.8.19.4? It does not match any of the declared images.

zhicwu avatar Jan 26 '22 02:01 zhicwu

Sure, I will try GenericContainer too. I'm also trying a randomized HTTP port on the ClickHouse server to separate the issues.

We already removed support for 20.8, so the declared images are the right version at this time.

ebyhr avatar Jan 26 '22 03:01 ebyhr

This issue happened even after randomizing the ClickHouse HTTP port, so I don't think it's a testcontainers issue.

ebyhr avatar Jan 27 '22 02:01 ebyhr

I agree. I'm thinking of either adding connection validation (since the legacy driver has trouble dealing with stale connections) or merging your change into trinodb/trino#10801 and running the tests against the new driver. Will try it out tonight.

zhicwu avatar Jan 27 '22 07:01 zhicwu

Validating the connection before execution did not help, so I updated the connection string for the legacy driver by adding validateAfterInactivityMillis=100 (Apache HttpClient uses 2000 ms by default; see details at #760) to further reduce the chance of running into the "failed to respond" issue. The new driver, on the other hand, does not have the connection issue, so the tests went well for 20.7+.

I'd suggest merging trinodb/trino#10801 first and marking the tests against 20.3 as flaky.

Update: Again, 1764169145 (50/50) was just lucky. 1764444585 (49/50) shows validateAfterInactivityMillis didn't help much either.
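For readers who want to see what that connection-string change looks like, here is a minimal sketch. It assumes the legacy driver's usual `jdbc:clickhouse://host:port/database?param=value` URL form; the class name, method name, host, port, and database are hypothetical placeholders, and only the `validateAfterInactivityMillis=100` setting comes from this thread.

```java
public class LegacyUrlSketch {
    // Builds a JDBC URL for the legacy driver with the stale-connection
    // validation interval appended as a query parameter.
    // host, port and database are illustrative placeholders, not values from the tests.
    static String buildUrl(String host, int port, String database, long validateAfterInactivityMillis) {
        return "jdbc:clickhouse://" + host + ":" + port + "/" + database
                + "?validateAfterInactivityMillis=" + validateAfterInactivityMillis;
    }

    public static void main(String[] args) {
        // 100 ms is the value mentioned in this thread.
        System.out.println(buildUrl("localhost", 8123, "default", 100));
    }
}
```

The resulting string would be passed to `DriverManager.getConnection(...)` as usual. As the update notes, this setting did not noticeably reduce the failures.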

zhicwu avatar Jan 29 '22 01:01 zhicwu

Again, 1764169145 (50/50) was just lucky. 1764444585 (49/50) shows validateAfterInactivityMillis didn't help much either.

Do you mean the flaky issue still exists even after upgrading the driver?

ebyhr avatar Jan 31 '22 03:01 ebyhr

Do you mean the flaky issue still exists even after upgrading the driver?

No, the issue only exists when you test ClickHouse 20.3 using the legacy driver. Upgrading the driver only helps for 20.7+.

zhicwu avatar Jan 31 '22 07:01 zhicwu

Thanks for your help! Upgrading the driver did resolve the flaky issue on newer ClickHouse versions. As you already mentioned, the flakiness still exists in 20.3 (Altinity build).

ebyhr avatar Feb 03 '22 00:02 ebyhr

You're welcome, and I'm glad that I can help :)

As for the flakiness in 20.3, I'm sorry that I have to leave it as is, mainly because supporting 20.3 in the new driver requires more changes to the code base (not only clickhouse-jdbc but also clickhouse-http-client 🤦) than I thought, so it does not fit into a patch release. As I'm not working on this project full time, I'd rather save the effort for the upcoming v0.3.3 for TCP/Native protocol support.

Anyway, we can revisit this in June or so by completely removing the legacy driver and the 20.3 test from Trino.

zhicwu avatar Feb 06 '22 12:02 zhicwu