clickhouse-java

Failover not working for multiple ClickHouse nodes with random load balancing

Open • happyjohnson123 opened this issue 2 years ago • 1 comment

Describe the bug

I want to use JDBC to connect to multiple ClickHouse nodes, with the load balancing policy set to 'random' and failover set to '3'. However, when I configure two endpoints in the JDBC URL (one valid and one invalid), the query throws a connection timeout exception for the invalid endpoint. Failover does not take effect.

Steps to reproduce

  1. Use Docker to set up a ClickHouse instance from the 'yandex/clickhouse-server:latest' image.
  2. Configure the JDBC URL as 'jdbc:clickhouse:http://192.168.2.122:8123,192.168.0.103:8123/default?load_balancing_policy=random&failover=3' (192.168.2.122 is an invalid endpoint; 192.168.0.103 is the valid endpoint set up in step 1).
  3. Run a query through JDBC.

Expected behaviour

The failover mechanism should switch to the valid endpoint and return the query result.

Code example

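import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;
import com.clickhouse.jdbc.ClickHouseDataSource;

String url = "jdbc:clickhouse:http://192.168.2.122:8123,192.168.0.103:8123/default?load_balancing_policy=random&failover=3";
Properties p = new Properties();
p.setProperty("user", "default");
p.setProperty("password", "");
ClickHouseDataSource dataSource = new ClickHouseDataSource(url, p);
// Close connection, statement and result set together
try (Connection connection = dataSource.getConnection();
        Statement stmt = connection.createStatement();
        ResultSet resultSet = stmt.executeQuery("show databases")) {
    while (resultSet.next()) System.out.println(resultSet.getString(1));
}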

Error log

java.sql.SQLException: Connect to http://192.168.2.122:8123 [/192.168.2.122] failed: connect timed out, server ClickHouseNode [uri=http://192.168.2.122:8123/default]@645498125
	at com.clickhouse.jdbc.SqlExceptionUtils.handle(SqlExceptionUtils.java:85)
	at com.clickhouse.jdbc.SqlExceptionUtils.create(SqlExceptionUtils.java:31)
	at com.clickhouse.jdbc.SqlExceptionUtils.handle(SqlExceptionUtils.java:90)
	at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.getServerInfo(ClickHouseConnectionImpl.java:131)
	at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.<init>(ClickHouseConnectionImpl.java:335)
	at com.clickhouse.jdbc.ClickHouseDataSource.getConnection(ClickHouseDataSource.java:46)
	at org.trimps.mps.webclue.commondata.TestFailover.contextLoads(TestFailover.java:27)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.springframework.test.context.junit4.statements.RunBeforeTestExecutionCallbacks.evaluate(RunBeforeTestExecutionCallbacks.java:74)
	at org.springframework.test.context.junit4.statements.RunAfterTestExecutionCallbacks.evaluate(RunAfterTestExecutionCallbacks.java:84)
	at org.springframework.test.context.junit4.statements.RunBeforeTestMethodCallbacks.evaluate(RunBeforeTestMethodCallbacks.java:75)
	at org.springframework.test.context.junit4.statements.RunAfterTestMethodCallbacks.evaluate(RunAfterTestMethodCallbacks.java:86)
	at org.springframework.test.context.junit4.statements.SpringRepeat.evaluate(SpringRepeat.java:84)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:251)
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
	at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
	at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
	at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
	at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:235)
	at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:54)
Caused by: com.clickhouse.client.internal.apache.hc.client5.http.ConnectTimeoutException: Connect to http://192.168.2.122:8123 [/192.168.2.122] failed: connect timed out
	at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
	at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at com.clickhouse.client.internal.apache.hc.client5.http.socket.PlainConnectionSocketFactory.lambda$connectSocket$0(PlainConnectionSocketFactory.java:85)
	at java.security.AccessController.doPrivileged(Native Method)
	at com.clickhouse.client.internal.apache.hc.client5.http.socket.PlainConnectionSocketFactory.connectSocket(PlainConnectionSocketFactory.java:84)
	at com.clickhouse.client.internal.apache.hc.client5.http.socket.ConnectionSocketFactory.connectSocket(ConnectionSocketFactory.java:113)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.io.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:181)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:447)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:162)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.InternalExecRuntime.connectEndpoint(InternalExecRuntime.java:172)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:142)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:96)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:115)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170)
	at com.clickhouse.client.internal.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:123)
	at com.clickhouse.client.http.ApacheHttpConnectionImpl.post(ApacheHttpConnectionImpl.java:241)
	at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:118)
	at com.clickhouse.client.AbstractClient.execute(AbstractClient.java:280)
	at com.clickhouse.client.ClickHouseClientBuilder$Agent.sendOnce(ClickHouseClientBuilder.java:282)
	at com.clickhouse.client.ClickHouseClientBuilder$Agent.send(ClickHouseClientBuilder.java:294)
	at com.clickhouse.client.ClickHouseClientBuilder$Agent.execute(ClickHouseClientBuilder.java:349)
	at com.clickhouse.client.ClickHouseClient.executeAndWait(ClickHouseClient.java:877)
	at com.clickhouse.client.ClickHouseRequest.executeAndWait(ClickHouseRequest.java:2154)
	at com.clickhouse.jdbc.internal.ClickHouseConnectionImpl.getServerInfo(ClickHouseConnectionImpl.java:128)
	... 35 more

My Guess

I think the root cause is in the following code:

protected ClickHouseNode suggest(ClickHouseNodes manager, ClickHouseNode server, Throwable failure) {
    if (manager == null || server == null || !(failure instanceof ClickHouseException)) {
        return server;
    }

    ClickHouseException exp = (ClickHouseException) failure;
    // only connection errors at this point
    if (exp.getErrorCode() == ClickHouseException.ERROR_NETWORK
            || ClickHouseException.isConnectTimedOut(exp.getCause())) {
        ClickHouseNodeSelector selector = manager.getNodeSelector();
        for (ClickHouseNode node : manager.nodes) {
            if (selector.match(node) && !node.isSameEndpoint(server)) {
                return node;
            }
        }
    }
    return server;
}
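To make the consequence of this branch concrete, here is a simplified, hypothetical model of a failover loop built around the same decision. It assumes the client retries up to 'failover' times and asks 'suggest' for a replacement node after each failed attempt. 'Node', 'execute' and the 'classifiedAsConnectionError' flag are illustrative names, not clickhouse-java API, and the node selector check is omitted.

```java
import java.util.List;

// Hypothetical model of the failover decision; not clickhouse-java code.
public class FailoverSketch {
    // A node is just an endpoint plus whether it accepts connections.
    record Node(String endpoint, boolean reachable) {}

    // Same shape as suggest(): switch nodes only when the failure is classified
    // as a connection error, otherwise keep retrying the same server.
    static Node suggest(List<Node> nodes, Node server, boolean classifiedAsConnectionError) {
        if (!classifiedAsConnectionError) {
            return server;
        }
        for (Node node : nodes) {
            if (!node.endpoint().equals(server.endpoint())) {
                return node;
            }
        }
        return server;
    }

    // Returns the endpoint that served the query, or null if every attempt failed.
    static String execute(List<Node> nodes, Node first, int failover, boolean classifierWorks) {
        Node server = first;
        for (int attempt = 0; attempt <= failover; attempt++) {
            if (server.reachable()) {
                return server.endpoint();
            }
            // The attempt timed out; the next node depends on how the failure was classified.
            server = suggest(nodes, server, classifierWorks);
        }
        return null;
    }

    public static void main(String[] args) {
        Node bad = new Node("192.168.2.122:8123", false);
        Node good = new Node("192.168.0.103:8123", true);
        List<Node> nodes = List.of(bad, good);
        System.out.println(execute(nodes, bad, 3, true));  // prints 192.168.0.103:8123
        System.out.println(execute(nodes, bad, 3, false)); // prints null
    }
}
```

When the timeout is recognized, the second attempt lands on the valid node. When it is not, 'suggest' keeps returning the unreachable node and every retry fails the same way, which matches the behaviour reported in this issue.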

When a connection timeout occurs, execution reaches the 'if' condition that is supposed to detect the timeout, but the following method fails to recognize the timeout exception message:

public static boolean isConnectTimedOut(Throwable t) {
    if (t instanceof SocketTimeoutException || t instanceof TimeoutException) {
        String msg = t.getMessage();
        if (msg != null && msg.length() >= MSG_CONNECT_TIMED_OUT.length()) {
            msg = msg.substring(0, MSG_CONNECT_TIMED_OUT.length()).toLowerCase(Locale.ROOT);
        }
        return MSG_CONNECT_TIMED_OUT.equals(msg);
    }

    return false;
}

The error log shows that the exception message is 'Connect to http://192.168.2.122:8123 [/192.168.2.122] failed: connect timed out', but 'isConnectTimedOut' unfortunately returns false for it.
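The mismatch can be reproduced in isolation. The sketch below copies the quoted 'isConnectTimedOut' logic. The value of 'MSG_CONNECT_TIMED_OUT' is not shown in this issue, so it is assumed here to be "connect timed out". Because only a prefix of the message (as long as the constant) is compared, a message that starts with "Connect to http://..." is rejected even though it ends with "connect timed out". 'containsConnectTimedOut' is a hypothetical, more tolerant variant shown for comparison only; it is not the change made in the linked PR. Separately, the cause in the trace is the shaded Apache HttpClient 'ConnectTimeoutException', so it is worth checking whether that type passes the 'instanceof' test at all; the sketch sidesteps this by constructing a 'SocketTimeoutException' directly.

```java
import java.net.SocketTimeoutException;
import java.util.Locale;
import java.util.concurrent.TimeoutException;

public class TimeoutCheckDemo {
    // ASSUMPTION: the real constant's value is not shown in the issue.
    static final String MSG_CONNECT_TIMED_OUT = "connect timed out";

    // Copy of the quoted logic: compares only a prefix of the message.
    static boolean isConnectTimedOut(Throwable t) {
        if (t instanceof SocketTimeoutException || t instanceof TimeoutException) {
            String msg = t.getMessage();
            if (msg != null && msg.length() >= MSG_CONNECT_TIMED_OUT.length()) {
                msg = msg.substring(0, MSG_CONNECT_TIMED_OUT.length()).toLowerCase(Locale.ROOT);
            }
            return MSG_CONNECT_TIMED_OUT.equals(msg);
        }
        return false;
    }

    // Hypothetical alternative: case-insensitive search anywhere in the message.
    static boolean containsConnectTimedOut(Throwable t) {
        String msg = t.getMessage();
        return msg != null && msg.toLowerCase(Locale.ROOT).contains(MSG_CONNECT_TIMED_OUT);
    }

    public static void main(String[] args) {
        String reported = "Connect to http://192.168.2.122:8123 [/192.168.2.122] failed: connect timed out";
        // SocketTimeoutException passes the instanceof check, so only the message matters here.
        Throwable wrapped = new SocketTimeoutException(reported);
        System.out.println(isConnectTimedOut(wrapped));                                         // false
        System.out.println(isConnectTimedOut(new SocketTimeoutException("connect timed out"))); // true
        System.out.println(containsConnectTimedOut(wrapped));                                   // true
    }
}
```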

Configuration

Environment

  • client version: 0.5.0

ClickHouse server

  • ClickHouse Server version: 22.1.3.7

happyjohnson123 • Oct 12 '23 04:10

PR with fix #1609

mbaksheev • Apr 11 '24 11:04