clickhouse-java When spawning more than 16 requests with Java 11 HTTP client: com.clickhouse.client.ClickHouseException: Code: 159. Execution timed out

Describe the bug

When spawning more than 16 requests in parallel to a ClickHouse server using the Java HTTP client, all requests will hang and eventually time out (unless I also set MAX_THREADS_PER_CLIENT=1 or greater):

#error {
 :cause nil
 :via
 [{:type java.util.concurrent.ExecutionException
   :message com.clickhouse.client.ClickHouseException: Code: 159. Execution timed out
   :at [java.util.concurrent.FutureTask report FutureTask.java 122]}
  {:type com.clickhouse.client.ClickHouseException
   :message Code: 159. Execution timed out
   :at [com.clickhouse.client.ClickHouseException of ClickHouseException.java 147]}
  {:type java.util.concurrent.TimeoutException
   :message nil
   :at [java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]}]
 :trace
 [[java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]
  [java.util.concurrent.CompletableFuture get CompletableFuture.java 2095]
  [com.clickhouse.client.ClickHouseClient executeAndWait ClickHouseClient.java 878]
  [ch_timeout_mwe$execute_BANG_ invokeStatic ch_timeout_mwe.clj 34]
  [ch_timeout_mwe$execute_BANG_ invoke ch_timeout_mwe.clj 31]
  [ch_timeout_mwe$ping invokeStatic ch_timeout_mwe.clj 40]
  [ch_timeout_mwe$ping invoke ch_timeout_mwe.clj 36]
  [ch_timeout_mwe$_main$iter__158__162$fn__163$fn__164$fn__165 invoke ch_timeout_mwe.clj 56]
  [clojure.core$binding_conveyor_fn$fn__5823 invoke core.clj 2047]
  [clojure.lang.AFn call AFn.java 18]
  [java.util.concurrent.FutureTask run FutureTask.java 317]
  [java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1144]
  [java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 642]
  [java.lang.Thread run Thread.java 1583]]}

Afterwards the application is broken; the only way to make requests to ClickHouse possible again is by restarting the JVM.

Repeatedly sending 16 or fewer requests does not exhibit the hanging requests.

Steps to reproduce

Follow the instructions in https://github.com/devurandom/clickhouse-timeout-mwe to run the sample application.
Observe that everything works fine when spawning 16 or less requests, or when passing a value greater than 0 as MAX_THREADS_PER_CLIENT.
Observe that the application hangs and all requests to ClickHouse throw java.util.concurrent.TimeoutException when spawning 17 or more requests while passing MAX_THREADS_PER_CLIENT as 0.

Expected behaviour

ClickHouse Java HTTP client works, regardless of how many requests I tell it to make at the same time.

Code example

See https://github.com/devurandom/clickhouse-timeout-mwe.

Configuration

Environment

Client version: ClickHouse HTTP Client 0.6.2, using the built-in Java HTTP Client
Language version: Java 11, Clojure 1.11.3
OS: Linux 6.9

ClickHouse server

ClickHouse Server version: ClickHouse Cloud
ClickHouse Server non-default settings, if any: ClickHouse Cloud
CREATE TABLE statements for tables involved: Reproduction works without any tables

Jul 19 '24 23:07 devurandom

I believe I am facing the same issue, using Spring Boot Webflux (Reactor).

When creating the ClickHouseNode and ClickHouseClient as a bean:

@Configuration
public class DatabaseDriverConfig {
 @Bean
    public ClickHouseNode nodeFactory() {
        return ClickHouseNode.builder()
          .host(clickhouse.getHost())
          .port(ClickHouseProtocol.HTTP, clickhouse.getPort())
          .database(clickhouse.getDatabase())
          .credentials(ClickHouseCredentials.fromUserAndPassword(
            clickhouse.getUsername(),
            clickhouse.getPassword()
          ))
          .build();
    }

    @Bean
    public ClickHouseClient clientFactory(ClickHouseNode node) {
        return ClickHouseClient.builder()
          .option(ClickHouseClientOption.CLIENT_NAME, "query-engine-api")
          // the below response has async enabled - I have omitted it from this builder.
          .nodeSelector(ClickHouseNodeSelector.of(node.getProtocol()))
          .build();
    }
}

And then using it to make calls to Clickhouse:

public Mono<PageInfo> fetchPageInfo(UUID cid, Integer pageNo) {
        final String q = """
          SELECT COALESCE(COUNT(*), 0) as total_elements
          FROM seer.event
          """;
        return Mono.fromFuture(client.read(node)
            .format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
            .query(q)
            .execute())
          .map(ClickHouseResponse::firstRecord)
          .map(x -> {
              int totalElements = x.getValue("total_elements").asInteger();
              return PageInfo.builder()
                .pageNo(pageNo)
                .totalPages(totalElements / pageNo)
                .totalElements(totalElements)
                .build();
          })
          .onErrorResume(NoSuchElementException.class, e -> Mono.just(PageInfo.builder()
            .pageNo(pageNo)
            .totalPages(0)
            .totalElements(0)
            .build()));
    }

The first 10 requests work, the moment I make the 11th request the connection times-out and is no longer usable. The only way to get the client to work again is to restart the API.

The error looks like the following:

2024-07-21T15:04:01.580+01:00 ERROR 71784 --- [query-engine-api] [kHouseWorker-21] a.w.r.e.AbstractErrorWebExceptionHandler : [47bd0242-11]  500 Server Error for HTTP GET "/event?start=2024-01-01T13:13:13&end=2024-05-01T13:13:13&pageNo=1&limit=500"

com.clickhouse.client.ClickHouseException: Timeout deadline: 180000 MILLISECONDS, actual: 180007 MILLISECONDS, server ClickHouseNode [uri=http://localhost:8123/seer]@-737830262
	at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
	*__checkpoint ⇢ Handler com.hexploits.seer.api.queryengine.controller.EventController#fetchEvents(UUID, UUID, String, ServerWebExchange) [DispatcherHandler]
	*__checkpoint ⇢ HTTP GET "/event?start=2024-01-01T13:13:13&end=2024-05-01T13:13:13&pageNo=1&limit=500" [ExceptionHandlingWebHandler]
Original Stack Trace:
		at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
		at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
		at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
		at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
		at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
		at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 180000 MILLISECONDS, actual: 180007 MILLISECONDS
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:120) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:125) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:96) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:115) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170) ~[httpclient5-5.2.3.jar:5.2.3]
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:123) ~[httpclient5-5.2.3.jar:5.2.3]
	at com.clickhouse.client.http.ApacheHttpConnectionImpl.post(ApacheHttpConnectionImpl.java:241) ~[clickhouse-http-client-0.6.0.jar:clickhouse-http-client 0.6.0 (revision: cb8e198)]
	at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:118) ~[clickhouse-http-client-0.6.0.jar:clickhouse-http-client 0.6.0 (revision: cb8e198)]
	at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
	at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
	at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]

Jul 21 '24 14:07 CameronMukherjee

When spawning more than 16 requests in parallel to a ClickHouse server using the Java HTTP client, all requests will hang and eventually time out (unless I also set MAX_THREADS_PER_CLIENT=1 or greater, the default being 0):

As a heads up - the following does not work for me:

 @Bean
    public ClickHouseClient clientFactory(ClickHouseNode node) {
        return ClickHouseClient.builder()
          .option(ClickHouseClientOption.CLIENT_NAME, "query-engine-api")
          .nodeSelector(ClickHouseNodeSelector.of(node.getProtocol()))
          .option(ClickHouseClientOption.ASYNC, true)
          .option(ClickHouseClientOption.MAX_THREADS_PER_CLIENT, 1)
          .build();
    }

I've also tried setting the max threads per client to 5, 10 and 15 - with none of them working as intended. The moment the 11th request comes through the request times-outs.

Jul 21 '24 14:07 CameronMukherjee

Removing the implementation 'org.apache.httpcomponents.client5:httpclient5:5.2.3' dependency fixes this issue as you fall back to HTTP_URL_CONNECTION which doesn't have this issue.

Hope this helps someone else!

Jul 21 '24 19:07 CameronMukherjee

Removing the implementation 'org.apache.httpcomponents.client5:httpclient5:5.2.3' dependency fixes this issue as you fall back to HTTP_URL_CONNECTION which doesn't have this issue.

I am using the Java 11 HTTP client, cf. https://github.com/devurandom/clickhouse-timeout-mwe/blob/88a48525a6ef68b21cb9dbea0fe0d5019f3ac36b/src/ch_timeout_mwe.clj#L20-L23.

Jul 21 '24 20:07 devurandom

Good day, @devurandom @CameronMukherjee ! The problem looks like when response object is not closed. It will keep connection open and would not let it return to a connection pool (that is in apache HC built-in). With HttpURLConnection it will create unlimited number of connection and may eventually fail by open file descriptors.

Jul 22 '24 17:07 chernser

Update: Setting ClickHouseClientOption/MAX_THREADS_PER_CLIENT 2 (instead of the default of 0), which makes the problem disappear in my minimal example, does not seem to workaround the issue in our larger application. It is still unclear to me why.

Removing the implementation 'org.apache.httpcomponents.client5:httpclient5:5.2.3' dependency fixes this issue as you fall back to HTTP_URL_CONNECTION which doesn't have this issue.

I can confirm that changing https://github.com/devurandom/clickhouse-timeout-mwe/blob/88a48525a6ef68b21cb9dbea0fe0d5019f3ac36b/src/ch_timeout_mwe.clj#L20-L23 from {ClickHouseHttpOption/CONNECTION_PROVIDER HttpConnectionProvider/HTTP_CLIENT} to {ClickHouseHttpOption/CONNECTION_PROVIDER HttpConnectionProvider/HTTP_URL_CONNECTION} also works around the issue in my minimal example. (I did not try it in our service for the reason given by @chernser.)

The problem looks like when response object is not closed.

So this is related to https://github.com/ClickHouse/clickhouse-java/issues/1712 ?

It will keep connection open and would not let it return to a connection pool (that is in apache HC built-in).

My minimal example reproduces this without the Apache HTTP client. It uses the Java 11 HTTP client instead.

With HttpURLConnection it will create unlimited number of connection and may eventually fail by open file descriptors.

So setting {ClickHouseHttpOption/CONNECTION_PROVIDER HttpConnectionProvider/HTTP_URL_CONNECTION} is not a workaround that would work in a long running application or service. Thanks for this clarification!

Jul 22 '24 18:07 devurandom

The machine I reproduced https://github.com/devurandom/clickhouse-timeout-mwe on has 8 as "processors" reported by (.availableProcessors (Runtime/getRuntime)), while our actual application runs on a machine / in a container with 2 "processors" (cf. https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmcpucount).

This seems relevant in the context of com.clickhouse.client.http.ClickHouseHttpClient#newConnection which uses the executor created by com.clickhouse.client.AbstractClient#init via com.clickhouse.client.ClickHouseClient#getExecutorService and com.clickhouse.data.ClickHouseDataStreamFactory.DefaultExecutors with (+ 1 (* 2 (.availableProcessors (Runtime/getRuntime)))) threads if MAX_THREADS_PER_CLIENT is unset (i.e. 0).

Setting -XX:ActiveProcessorCount=2 indeed changes the number of requests ClickHouse can handle without hanging:

(.availableProcessors (Runtime/getRuntime))
=> 2

(-main "${ENDPOINT}" "${USERNAME}" "${PASSWORD}" 2 0)

Spawning threads...
Wait for all threads:
Waiting for thread 0
Waiting for thread 1
Collected all threads!
=> nil

(-main "${ENDPOINT}" "${USERNAME}" "${PASSWORD}" 4 0)

Spawning threads...
Wait for all threads:
Waiting for thread 0
Waiting for thread 1
Waiting for thread 2
Waiting for thread 3
Collected all threads!
=> nil

(-main "${ENDPOINT}" "${USERNAME}" "${PASSWORD}" 5 0)

Spawning threads...
Wait for all threads:
Waiting for thread 0
ERROR: #error {
 :cause nil
 :via
 [{:type java.util.concurrent.ExecutionException
   :message com.clickhouse.client.ClickHouseException: Code: 159. Execution timed out
   :at [java.util.concurrent.FutureTask report FutureTask.java 122]}
  {:type com.clickhouse.client.ClickHouseException
   :message Code: 159. Execution timed out
   :at [com.clickhouse.client.ClickHouseException of ClickHouseException.java 147]}
  {:type java.util.concurrent.TimeoutException
   :message nil
   :at [java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]}]
 :trace
 [[java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]
  [java.util.concurrent.CompletableFuture get CompletableFuture.java 2095]
  [com.clickhouse.client.ClickHouseClient executeAndWait ClickHouseClient.java 878]
  [ch_timeout_mwe$execute_BANG_ invokeStatic ch_timeout_mwe.clj 34]
  [ch_timeout_mwe$execute_BANG_ invoke ch_timeout_mwe.clj 31]
  [ch_timeout_mwe$ping invokeStatic ch_timeout_mwe.clj 40]
  [ch_timeout_mwe$ping invoke ch_timeout_mwe.clj 36]
  [ch_timeout_mwe$_main$iter__1943__1947$fn__1948$fn__1949$fn__1950 invoke ch_timeout_mwe.clj 56]
  [clojure.core$binding_conveyor_fn$fn__5823 invoke core.clj 2047]
  [clojure.lang.AFn call AFn.java 18]
  [java.util.concurrent.FutureTask run FutureTask.java 264]
  [java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1136]
  [java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 635]
  [java.lang.Thread run Thread.java 840]]}
...

The -main function is the one in https://github.com/devurandom/clickhouse-timeout-mwe.

Jul 22 '24 21:07 devurandom

We are now using java.util.concurrent.Semaphore to limit the number of request we submit to the ClickHouse client to 2 times the number of processors ((* 2 (.availableProcessors (Runtime/getRuntime)))) and our service finally sends out requests, receives responses and runs stably.

Jul 23 '24 10:07 devurandom

@devurandom Thank you for the information!

Do you close Response objects?
Am I right that deadlock happens if task is submitted to the executor service when no idle workers?
Would you please describe your high-level logic? where do you lock? Thanks!

Jul 23 '24 17:07 chernser

I used the word "deadlock" in my previous comment which might have caused confusion. I should have written "hang" (or "block" or similar). I corrected that in my original comment above.

Do you close Response objects?

In the minimal example, if a response would be received instead of a timeout exception, the response would be closed, cf. https://github.com/devurandom/clickhouse-timeout-mwe/blob/main/src/ch_timeout_mwe.clj#L31-L34.

The problem is that no response is ever received by the code of the minimal example, because there seems to be a problem within the ClickHouse client that prevents a request from ever being processed. The ClickHouse client throws a timeout exception before it ever returns a response.

Am I right that deadlock happens if task is submitted to the executor service when no idle workers?

I cannot answer this question, as I do not know enough about the inner workings of the ClickHouse client. I just stepped through the ClickHouse client code with a debugger and took notes.

Would you please describe your high-level logic? where do you lock?

In the minimal example, no locks are used. It first spawns a bunch of futures on Clojure's own thread pool (not shared with e.g. the Java 11 HTTP client or the ClickHouse client) and then waits for them to complete, cf. https://github.com/devurandom/clickhouse-timeout-mwe/blob/main/src/ch_timeout_mwe.clj#L55-L63.

Jul 24 '24 12:07 devurandom

I change client to ClickHouse Java Client V2 recently, and meet similar problem.

After some normal query, all query timeout.

org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 10000 MILLISECONDS, actual: 10001 MILLISECONDS at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:120) at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:125) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183) at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:350) at com.clickhouse.client.api.Client.lambda$query$11(Client.java:1501) at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:1864) at com.clickhouse.client.api.Client.query(Client.java:1572) at com.clickhouse.client.api.Client.query(Client.java:1427)

I use JDBC client before, and It works fine for me. So I decide rollback to JDBC client to avoid this.

Appreciate for any help of this problem. Thanks!

Dec 08 '24 05:12 zalj

I change client to ClickHouse Java Client V2 recently, and meet similar problem.

After some normal query, all query timeout.

org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 10000 MILLISECONDS, actual: 10001 MILLISECONDS at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:120) at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:125) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183) at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:350) at com.clickhouse.client.api.Client.lambda$query$11(Client.java:1501) at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:1864) at com.clickhouse.client.api.Client.query(Client.java:1572) at com.clickhouse.client.api.Client.query(Client.java:1427)

I use JDBC client before, and It works fine for me. So I decide rollback to JDBC client to avoid this.

Appreciate for any help of this problem. Thanks!

I try to fix this by add some client option. Then I found if set ClickHouseClientOption.ASYNC with value true, the problem disappear. Perhaps this is a clue. @chernser

code like:

    private Client buildClient(String endPoint, String user, String password) {
        return new Client.Builder()
                .addEndpoint(endPoint)
                .setUsername(user)
                .setPassword(password)
                .setOption(ClickHouseClientOption.ASYNC.getKey(), "true")
                .setMaxConnections(10)
                .setLZ4UncompressedBufferSize(1058576)
                .setSocketRcvbuf(500_000)
                .setSocketTcpNodelay(true)
                .setSocketSndbuf(500_000)
                .setSocketTimeout(TimeUnit.MINUTES.toMillis(5))
                .setClientNetworkBufferSize(500_000)
                .allowBinaryReaderToReuseBuffers(true) // using buffer pool for binary reader
                .build();
    }

Dec 09 '24 11:12 zalj

Good day, @zalj ! Thank you for providing the information! I think I know where is the problem. As for ASYNC operations - I think, client should be used in blocking way, because application knows better how to spawn and manage threads. Client in this case will just spawn extra threads but it will not give any performance benefits because IO is blocking internally.

Dec 12 '24 05:12 chernser

I encountered a similar problem using clickhouse-r2dbc:0.8.6 HTTP request failed org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 180000 MILLISECONDS, actual: 180000 MILLISECONDS

I noticed that connections started leaking

there was approximately the following logic databaseClient.sql("select count(*) from my_table") ... .next() I replaced .next() with .collectList().mapNotNull(CollectionUtils::firstElement) and everything worked for me, connections do not leak. Maybe it will help someone

Jul 04 '25 04:07 ost-av

Hi @chernser, we're on com.clickhouse:client-v2:0.9.2 release which is supposed to contain fixes for this issue, however we're facing the same problem after 12 requests + 4 retries (16 total requests):

25-09-18 07:38:02.040 [DefaultDispatcher-worker-1] WARN  c.c.client.api.Client - [] Retrying.
org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 10000 MILLISECONDS, actual: 10006 MILLISECONDS
	at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:122)
	at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:127)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:150)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:110)
	at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
	at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:183)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
	at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
	at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
	at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:450)
	at com.clickhouse.client.api.Client.lambda$query$5(Client.java:1593)
	at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:2007)
	at com.clickhouse.client.api.Client.query(Client.java:1637)
	at com.clickhouse.client.api.Client.query(Client.java:1536)
…
25-09-18 07:38:02.041 [DefaultDispatcher-worker-1] WARN  c.c.client.api.Client - [] Query request failed after attempts: 4 - Duration: 40015538309

I'm afraid the issue needs to be reopened.

Sep 18 '25 07:09 artem-zinnatullin

Update: I was NOT calling close on client.query() and client.insert() response objects, in my case it was happening during schema initialization as soon as we got 16+ queries — everything started to freeze:

logger.info("init: $className...")

withTimeout(initSchemaTimeout) {
    val initQuerySettings = QuerySettings().also {
        it.setNetworkTimeout(initSchemaTimeout.inWholeMilliseconds, MILLIS)
        it.setMaxExecutionTime(initSchemaTimeout.inWholeSeconds.toInt())
    }

    tryWithExponentialBackoff {
        // ClickHouse requires each SQL statement (separated by ';') to be queried separately.
        initSqlStatements.forEach { initSqlStatement ->
            val duration = measureTime {
                logger.info("init: executing statement {}", initSqlStatement)

                client
                    .query(initSqlStatement, initQuerySettings)
                    .await()
                    .close() // CRUCIAL!!!
            }

            logger.info("init: statement took {}", duration)
        }
    }
}

The thing is — it's not at all obvious that you have to close() returned objects if you don't even use them. This needs to be pointed out in README, even repo-own code examples don't do it:

https://github.com/ClickHouse/clickhouse-java/blob/87db1475b0211818662ace754c60ed705aee1314/examples/client-v2/src/main/java/com/clickhouse/examples/client_v2/POJO2DbWriter.java#L93-L95

I'm not sure how anyone should magically know this 😕

Sep 18 '25 18:09 artem-zinnatullin

Good day, @artem-zinnatullin I understand your pain. But we may not close response before returning it because it will close socket stream and connection will be release. Other API methods like queryAll() will close response because they read out result. In documentation we are telling about closing QueryResponse: https://clickhouse.com/docs/integrations/language-clients/java/client#query-api

Besides QueryResponse, InsertResponse and CommandResponse implement AutoCloseable. I will correct the example.

Sep 18 '25 19:09 chernser

@chernser thanks for looking into this

I'd suggest that all methods returning AutoCloseables to be annotated with JSR-305 @CheckReturnValue and have javadoc that clearly explains what will happen if it's not closed (ie network pool can get stuck)

https://errorprone.info/bugpattern/CheckReturnValue

On my end I now found the notes in the website doc:

This object should be closed as soon as possible to release a connection because the connection cannot be re-used until all data of previous response is fully read.

Not sure why repo and website have different examples/docs: https://github.com/ClickHouse/clickhouse-java?tab=readme-ov-file#examples vs https://clickhouse.com/docs/integrations/language-clients/java/client

Javadoc on the other hand doesn't say anything about closing the return objects from query methods:

Sep 19 '25 19:09 artem-zinnatullin