When spawning more than 16 requests with Java 11 HTTP client: com.clickhouse.client.ClickHouseException: Code: 159. Execution timed out
Describe the bug
When spawning more than 16 requests in parallel to a ClickHouse server using the Java HTTP client, all requests will hang and eventually time out (unless I also set MAX_THREADS_PER_CLIENT=1 or greater):
#error {
:cause nil
:via
[{:type java.util.concurrent.ExecutionException
:message com.clickhouse.client.ClickHouseException: Code: 159. Execution timed out
:at [java.util.concurrent.FutureTask report FutureTask.java 122]}
{:type com.clickhouse.client.ClickHouseException
:message Code: 159. Execution timed out
:at [com.clickhouse.client.ClickHouseException of ClickHouseException.java 147]}
{:type java.util.concurrent.TimeoutException
:message nil
:at [java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]}]
:trace
[[java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]
[java.util.concurrent.CompletableFuture get CompletableFuture.java 2095]
[com.clickhouse.client.ClickHouseClient executeAndWait ClickHouseClient.java 878]
[ch_timeout_mwe$execute_BANG_ invokeStatic ch_timeout_mwe.clj 34]
[ch_timeout_mwe$execute_BANG_ invoke ch_timeout_mwe.clj 31]
[ch_timeout_mwe$ping invokeStatic ch_timeout_mwe.clj 40]
[ch_timeout_mwe$ping invoke ch_timeout_mwe.clj 36]
[ch_timeout_mwe$_main$iter__158__162$fn__163$fn__164$fn__165 invoke ch_timeout_mwe.clj 56]
[clojure.core$binding_conveyor_fn$fn__5823 invoke core.clj 2047]
[clojure.lang.AFn call AFn.java 18]
[java.util.concurrent.FutureTask run FutureTask.java 317]
[java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1144]
[java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 642]
[java.lang.Thread run Thread.java 1583]]}
Afterwards the application is broken; the only way to make requests to ClickHouse possible again is by restarting the JVM.
Repeatedly sending 16 or fewer requests does not exhibit the hanging requests.
Steps to reproduce
- Follow the instructions in https://github.com/devurandom/clickhouse-timeout-mwe to run the sample application.
- Observe that everything works fine when spawning 16 or less requests, or when passing a value greater than 0 as
MAX_THREADS_PER_CLIENT. - Observe that the application hangs and all requests to ClickHouse throw
java.util.concurrent.TimeoutExceptionwhen spawning 17 or more requests while passingMAX_THREADS_PER_CLIENTas 0.
Expected behaviour
ClickHouse Java HTTP client works, regardless of how many requests I tell it to make at the same time.
Code example
See https://github.com/devurandom/clickhouse-timeout-mwe.
Configuration
Environment
- Client version: ClickHouse HTTP Client 0.6.2, using the built-in Java HTTP Client
- Language version: Java 11, Clojure 1.11.3
- OS: Linux 6.9
ClickHouse server
- ClickHouse Server version: ClickHouse Cloud
- ClickHouse Server non-default settings, if any: ClickHouse Cloud
-
CREATE TABLEstatements for tables involved: Reproduction works without any tables
I believe I am facing the same issue, using Spring Boot Webflux (Reactor).
When creating the ClickHouseNode and ClickHouseClient as a bean:
@Configuration
public class DatabaseDriverConfig {
@Bean
public ClickHouseNode nodeFactory() {
return ClickHouseNode.builder()
.host(clickhouse.getHost())
.port(ClickHouseProtocol.HTTP, clickhouse.getPort())
.database(clickhouse.getDatabase())
.credentials(ClickHouseCredentials.fromUserAndPassword(
clickhouse.getUsername(),
clickhouse.getPassword()
))
.build();
}
@Bean
public ClickHouseClient clientFactory(ClickHouseNode node) {
return ClickHouseClient.builder()
.option(ClickHouseClientOption.CLIENT_NAME, "query-engine-api")
// the below response has async enabled - I have omitted it from this builder.
.nodeSelector(ClickHouseNodeSelector.of(node.getProtocol()))
.build();
}
}
And then using it to make calls to Clickhouse:
public Mono<PageInfo> fetchPageInfo(UUID cid, Integer pageNo) {
final String q = """
SELECT COALESCE(COUNT(*), 0) as total_elements
FROM seer.event
""";
return Mono.fromFuture(client.read(node)
.format(ClickHouseFormat.RowBinaryWithNamesAndTypes)
.query(q)
.execute())
.map(ClickHouseResponse::firstRecord)
.map(x -> {
int totalElements = x.getValue("total_elements").asInteger();
return PageInfo.builder()
.pageNo(pageNo)
.totalPages(totalElements / pageNo)
.totalElements(totalElements)
.build();
})
.onErrorResume(NoSuchElementException.class, e -> Mono.just(PageInfo.builder()
.pageNo(pageNo)
.totalPages(0)
.totalElements(0)
.build()));
}
The first 10 requests work, the moment I make the 11th request the connection times-out and is no longer usable. The only way to get the client to work again is to restart the API.
The error looks like the following:
2024-07-21T15:04:01.580+01:00 ERROR 71784 --- [query-engine-api] [kHouseWorker-21] a.w.r.e.AbstractErrorWebExceptionHandler : [47bd0242-11] 500 Server Error for HTTP GET "/event?start=2024-01-01T13:13:13&end=2024-05-01T13:13:13&pageNo=1&limit=500"
com.clickhouse.client.ClickHouseException: Timeout deadline: 180000 MILLISECONDS, actual: 180007 MILLISECONDS, server ClickHouseNode [uri=http://localhost:8123/seer]@-737830262
at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler com.hexploits.seer.api.queryengine.controller.EventController#fetchEvents(UUID, UUID, String, ServerWebExchange) [DispatcherHandler]
*__checkpoint ⇢ HTTP GET "/event?start=2024-01-01T13:13:13&end=2024-05-01T13:13:13&pageNo=1&limit=500" [ExceptionHandlingWebHandler]
Original Stack Trace:
at com.clickhouse.client.ClickHouseException.of(ClickHouseException.java:168) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:275) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 180000 MILLISECONDS, actual: 180007 MILLISECONDS
at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:120) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:125) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:96) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:115) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170) ~[httpclient5-5.2.3.jar:5.2.3]
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:123) ~[httpclient5-5.2.3.jar:5.2.3]
at com.clickhouse.client.http.ApacheHttpConnectionImpl.post(ApacheHttpConnectionImpl.java:241) ~[clickhouse-http-client-0.6.0.jar:clickhouse-http-client 0.6.0 (revision: cb8e198)]
at com.clickhouse.client.http.ClickHouseHttpClient.send(ClickHouseHttpClient.java:118) ~[clickhouse-http-client-0.6.0.jar:clickhouse-http-client 0.6.0 (revision: cb8e198)]
at com.clickhouse.client.AbstractClient.sendAsync(AbstractClient.java:161) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
at com.clickhouse.client.AbstractClient.lambda$execute$0(AbstractClient.java:273) ~[clickhouse-client-0.6.0.jar:clickhouse-client 0.6.0 (revision: cb8e198)]
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
When spawning more than 16 requests in parallel to a ClickHouse server using the Java HTTP client, all requests will hang and eventually time out (unless I also set
MAX_THREADS_PER_CLIENT=1or greater, the default being 0):
As a heads up - the following does not work for me:
@Bean
public ClickHouseClient clientFactory(ClickHouseNode node) {
return ClickHouseClient.builder()
.option(ClickHouseClientOption.CLIENT_NAME, "query-engine-api")
.nodeSelector(ClickHouseNodeSelector.of(node.getProtocol()))
.option(ClickHouseClientOption.ASYNC, true)
.option(ClickHouseClientOption.MAX_THREADS_PER_CLIENT, 1)
.build();
}
I've also tried setting the max threads per client to 5, 10 and 15 - with none of them working as intended. The moment the 11th request comes through the request times-outs.
Removing the implementation 'org.apache.httpcomponents.client5:httpclient5:5.2.3' dependency fixes this issue as you fall back to HTTP_URL_CONNECTION which doesn't have this issue.
Hope this helps someone else!
Removing the
implementation 'org.apache.httpcomponents.client5:httpclient5:5.2.3'dependency fixes this issue as you fall back toHTTP_URL_CONNECTIONwhich doesn't have this issue.
I am using the Java 11 HTTP client, cf. https://github.com/devurandom/clickhouse-timeout-mwe/blob/88a48525a6ef68b21cb9dbea0fe0d5019f3ac36b/src/ch_timeout_mwe.clj#L20-L23.
Good day, @devurandom @CameronMukherjee ! The problem looks like when response object is not closed. It will keep connection open and would not let it return to a connection pool (that is in apache HC built-in). With HttpURLConnection it will create unlimited number of connection and may eventually fail by open file descriptors.
Update: Setting ClickHouseClientOption/MAX_THREADS_PER_CLIENT 2 (instead of the default of 0), which makes the problem disappear in my minimal example, does not seem to workaround the issue in our larger application. It is still unclear to me why.
Removing the
implementation 'org.apache.httpcomponents.client5:httpclient5:5.2.3'dependency fixes this issue as you fall back toHTTP_URL_CONNECTIONwhich doesn't have this issue.
I can confirm that changing https://github.com/devurandom/clickhouse-timeout-mwe/blob/88a48525a6ef68b21cb9dbea0fe0d5019f3ac36b/src/ch_timeout_mwe.clj#L20-L23 from {ClickHouseHttpOption/CONNECTION_PROVIDER HttpConnectionProvider/HTTP_CLIENT} to {ClickHouseHttpOption/CONNECTION_PROVIDER HttpConnectionProvider/HTTP_URL_CONNECTION} also works around the issue in my minimal example. (I did not try it in our service for the reason given by @chernser.)
The problem looks like when response object is not closed.
So this is related to https://github.com/ClickHouse/clickhouse-java/issues/1712 ?
It will keep connection open and would not let it return to a connection pool (that is in apache HC built-in).
My minimal example reproduces this without the Apache HTTP client. It uses the Java 11 HTTP client instead.
With HttpURLConnection it will create unlimited number of connection and may eventually fail by open file descriptors.
So setting {ClickHouseHttpOption/CONNECTION_PROVIDER HttpConnectionProvider/HTTP_URL_CONNECTION} is not a workaround that would work in a long running application or service. Thanks for this clarification!
The machine I reproduced https://github.com/devurandom/clickhouse-timeout-mwe on has 8 as "processors" reported by (.availableProcessors (Runtime/getRuntime)), while our actual application runs on a machine / in a container with 2 "processors" (cf. https://opentelemetry.io/docs/specs/semconv/runtime/jvm-metrics/#metric-jvmcpucount).
This seems relevant in the context of com.clickhouse.client.http.ClickHouseHttpClient#newConnection which uses the executor created by com.clickhouse.client.AbstractClient#init via com.clickhouse.client.ClickHouseClient#getExecutorService and com.clickhouse.data.ClickHouseDataStreamFactory.DefaultExecutors with (+ 1 (* 2 (.availableProcessors (Runtime/getRuntime)))) threads if MAX_THREADS_PER_CLIENT is unset (i.e. 0).
Setting -XX:ActiveProcessorCount=2 indeed changes the number of requests ClickHouse can handle without hanging:
(.availableProcessors (Runtime/getRuntime))
=> 2
(-main "${ENDPOINT}" "${USERNAME}" "${PASSWORD}" 2 0)
Spawning threads...
Wait for all threads:
Waiting for thread 0
Waiting for thread 1
Collected all threads!
=> nil
(-main "${ENDPOINT}" "${USERNAME}" "${PASSWORD}" 4 0)
Spawning threads...
Wait for all threads:
Waiting for thread 0
Waiting for thread 1
Waiting for thread 2
Waiting for thread 3
Collected all threads!
=> nil
(-main "${ENDPOINT}" "${USERNAME}" "${PASSWORD}" 5 0)
Spawning threads...
Wait for all threads:
Waiting for thread 0
ERROR: #error {
:cause nil
:via
[{:type java.util.concurrent.ExecutionException
:message com.clickhouse.client.ClickHouseException: Code: 159. Execution timed out
:at [java.util.concurrent.FutureTask report FutureTask.java 122]}
{:type com.clickhouse.client.ClickHouseException
:message Code: 159. Execution timed out
:at [com.clickhouse.client.ClickHouseException of ClickHouseException.java 147]}
{:type java.util.concurrent.TimeoutException
:message nil
:at [java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]}]
:trace
[[java.util.concurrent.CompletableFuture timedGet CompletableFuture.java 1960]
[java.util.concurrent.CompletableFuture get CompletableFuture.java 2095]
[com.clickhouse.client.ClickHouseClient executeAndWait ClickHouseClient.java 878]
[ch_timeout_mwe$execute_BANG_ invokeStatic ch_timeout_mwe.clj 34]
[ch_timeout_mwe$execute_BANG_ invoke ch_timeout_mwe.clj 31]
[ch_timeout_mwe$ping invokeStatic ch_timeout_mwe.clj 40]
[ch_timeout_mwe$ping invoke ch_timeout_mwe.clj 36]
[ch_timeout_mwe$_main$iter__1943__1947$fn__1948$fn__1949$fn__1950 invoke ch_timeout_mwe.clj 56]
[clojure.core$binding_conveyor_fn$fn__5823 invoke core.clj 2047]
[clojure.lang.AFn call AFn.java 18]
[java.util.concurrent.FutureTask run FutureTask.java 264]
[java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1136]
[java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 635]
[java.lang.Thread run Thread.java 840]]}
...
The -main function is the one in https://github.com/devurandom/clickhouse-timeout-mwe.
We are now using java.util.concurrent.Semaphore to limit the number of request we submit to the ClickHouse client to 2 times the number of processors ((* 2 (.availableProcessors (Runtime/getRuntime)))) and our service finally sends out requests, receives responses and runs stably.
@devurandom Thank you for the information!
- Do you close Response objects?
- Am I right that deadlock happens if task is submitted to the executor service when no idle workers?
- Would you please describe your high-level logic? where do you lock? Thanks!
I used the word "deadlock" in my previous comment which might have caused confusion. I should have written "hang" (or "block" or similar). I corrected that in my original comment above.
Do you close Response objects?
In the minimal example, if a response would be received instead of a timeout exception, the response would be closed, cf. https://github.com/devurandom/clickhouse-timeout-mwe/blob/main/src/ch_timeout_mwe.clj#L31-L34.
The problem is that no response is ever received by the code of the minimal example, because there seems to be a problem within the ClickHouse client that prevents a request from ever being processed. The ClickHouse client throws a timeout exception before it ever returns a response.
Am I right that deadlock happens if task is submitted to the executor service when no idle workers?
I cannot answer this question, as I do not know enough about the inner workings of the ClickHouse client. I just stepped through the ClickHouse client code with a debugger and took notes.
Would you please describe your high-level logic? where do you lock?
In the minimal example, no locks are used. It first spawns a bunch of futures on Clojure's own thread pool (not shared with e.g. the Java 11 HTTP client or the ClickHouse client) and then waits for them to complete, cf. https://github.com/devurandom/clickhouse-timeout-mwe/blob/main/src/ch_timeout_mwe.clj#L55-L63.
I change client to ClickHouse Java Client V2 recently, and meet similar problem.
After some normal query, all query timeout.
org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 10000 MILLISECONDS, actual: 10001 MILLISECONDS at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:120) at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:125) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183) at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:350) at com.clickhouse.client.api.Client.lambda$query$11(Client.java:1501) at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:1864) at com.clickhouse.client.api.Client.query(Client.java:1572) at com.clickhouse.client.api.Client.query(Client.java:1427)
I use JDBC client before, and It works fine for me. So I decide rollback to JDBC client to avoid this.
Appreciate for any help of this problem. Thanks!
I change client to ClickHouse Java Client V2 recently, and meet similar problem.
After some normal query, all query timeout.
org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 10000 MILLISECONDS, actual: 10001 MILLISECONDS at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:120) at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:125) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:116) at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51) at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87) at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55) at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183) at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:350) at com.clickhouse.client.api.Client.lambda$query$11(Client.java:1501) at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:1864) at com.clickhouse.client.api.Client.query(Client.java:1572) at com.clickhouse.client.api.Client.query(Client.java:1427)I use JDBC client before, and It works fine for me. So I decide rollback to JDBC client to avoid this.
Appreciate for any help of this problem. Thanks!
I try to fix this by add some client option. Then I found if set ClickHouseClientOption.ASYNC with value true, the problem disappear. Perhaps this is a clue. @chernser
code like:
private Client buildClient(String endPoint, String user, String password) {
return new Client.Builder()
.addEndpoint(endPoint)
.setUsername(user)
.setPassword(password)
.setOption(ClickHouseClientOption.ASYNC.getKey(), "true")
.setMaxConnections(10)
.setLZ4UncompressedBufferSize(1058576)
.setSocketRcvbuf(500_000)
.setSocketTcpNodelay(true)
.setSocketSndbuf(500_000)
.setSocketTimeout(TimeUnit.MINUTES.toMillis(5))
.setClientNetworkBufferSize(500_000)
.allowBinaryReaderToReuseBuffers(true) // using buffer pool for binary reader
.build();
}
Good day, @zalj ! Thank you for providing the information! I think I know where is the problem. As for ASYNC operations - I think, client should be used in blocking way, because application knows better how to spawn and manage threads. Client in this case will just spawn extra threads but it will not give any performance benefits because IO is blocking internally.
I encountered a similar problem using clickhouse-r2dbc:0.8.6 HTTP request failed org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 180000 MILLISECONDS, actual: 180000 MILLISECONDS
I noticed that connections started leaking
there was approximately the following logic databaseClient.sql("select count(*) from my_table") ... .next()
I replaced .next() with .collectList().mapNotNull(CollectionUtils::firstElement) and everything worked for me, connections do not leak. Maybe it will help someone
Hi @chernser, we're on com.clickhouse:client-v2:0.9.2 release which is supposed to contain fixes for this issue, however we're facing the same problem after 12 requests + 4 retries (16 total requests):
25-09-18 07:38:02.040 [DefaultDispatcher-worker-1] WARN c.c.client.api.Client - [] Retrying.
org.apache.hc.core5.http.ConnectionRequestTimeoutException: Timeout deadline: 10000 MILLISECONDS, actual: 10006 MILLISECONDS
at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.acquireEndpoint(InternalExecRuntime.java:122)
at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:127)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:150)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:113)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:110)
at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:183)
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:87)
at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.hc.client5.http.classic.HttpClient.executeOpen(HttpClient.java:183)
at com.clickhouse.client.api.internal.HttpAPIClientHelper.executeRequest(HttpAPIClientHelper.java:450)
at com.clickhouse.client.api.Client.lambda$query$5(Client.java:1593)
at com.clickhouse.client.api.Client.runAsyncOperation(Client.java:2007)
at com.clickhouse.client.api.Client.query(Client.java:1637)
at com.clickhouse.client.api.Client.query(Client.java:1536)
…
25-09-18 07:38:02.041 [DefaultDispatcher-worker-1] WARN c.c.client.api.Client - [] Query request failed after attempts: 4 - Duration: 40015538309
I'm afraid the issue needs to be reopened.
Update: I was NOT calling close on client.query() and client.insert() response objects, in my case it was happening during schema initialization as soon as we got 16+ queries — everything started to freeze:
logger.info("init: $className...")
withTimeout(initSchemaTimeout) {
val initQuerySettings = QuerySettings().also {
it.setNetworkTimeout(initSchemaTimeout.inWholeMilliseconds, MILLIS)
it.setMaxExecutionTime(initSchemaTimeout.inWholeSeconds.toInt())
}
tryWithExponentialBackoff {
// ClickHouse requires each SQL statement (separated by ';') to be queried separately.
initSqlStatements.forEach { initSqlStatement ->
val duration = measureTime {
logger.info("init: executing statement {}", initSqlStatement)
client
.query(initSqlStatement, initQuerySettings)
.await()
.close() // CRUCIAL!!!
}
logger.info("init: statement took {}", duration)
}
}
}
The thing is — it's not at all obvious that you have to close() returned objects if you don't even use them. This needs to be pointed out in README, even repo-own code examples don't do it:
https://github.com/ClickHouse/clickhouse-java/blob/87db1475b0211818662ace754c60ed705aee1314/examples/client-v2/src/main/java/com/clickhouse/examples/client_v2/POJO2DbWriter.java#L93-L95
I'm not sure how anyone should magically know this 😕
Good day, @artem-zinnatullin
I understand your pain.
But we may not close response before returning it because it will close socket stream and connection will be release.
Other API methods like queryAll() will close response because they read out result.
In documentation we are telling about closing QueryResponse: https://clickhouse.com/docs/integrations/language-clients/java/client#query-api
Besides QueryResponse, InsertResponse and CommandResponse implement AutoCloseable.
I will correct the example.
@chernser thanks for looking into this
I'd suggest that all methods returning AutoCloseables to be annotated with JSR-305 @CheckReturnValue and have javadoc that clearly explains what will happen if it's not closed (ie network pool can get stuck)
https://errorprone.info/bugpattern/CheckReturnValue
On my end I now found the notes in the website doc:
This object should be closed as soon as possible to release a connection because the connection cannot be re-used until all data of previous response is fully read.
Not sure why repo and website have different examples/docs: https://github.com/ClickHouse/clickhouse-java?tab=readme-ov-file#examples vs https://clickhouse.com/docs/integrations/language-clients/java/client
Javadoc on the other hand doesn't say anything about closing the return objects from query methods: