oci-java-sdk

SDK will deadlock for no apparent reason.

Open e3ndr opened this issue 1 year ago • 9 comments

Hi there! There appears to be a deadlock in the Oracle SDK that causes it to hang indefinitely. Any and all requests will block forever. Nothing will break it out of this, and interrupting the thread doesn't fix the underlying issue. I have to fully kill the application and restart it.

There are no logs, and the only reproduction I have is to use the containerinstances SDK for a couple of days.

e3ndr avatar Apr 27 '24 03:04 e3ndr

Here's my (abridged) dependency pom:

	<dependencyManagement>
		<dependencies>
			<dependency>
				<groupId>com.oracle.oci.sdk</groupId>
				<artifactId>oci-java-sdk-bom</artifactId>
				<version>3.39.0</version>
				<type>pom</type>
				<scope>import</scope>
			</dependency>
		</dependencies>
	</dependencyManagement>

	<dependencies>
		<dependency>
			<groupId>com.oracle.oci.sdk</groupId>
			<artifactId>oci-java-sdk-common-httpclient-jersey</artifactId>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>com.oracle.oci.sdk</groupId>
			<artifactId>oci-java-sdk-core</artifactId>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>com.oracle.oci.sdk</groupId>
			<artifactId>oci-java-sdk-containerinstances</artifactId>
			<scope>compile</scope>
		</dependency>
	</dependencies>

e3ndr avatar Apr 27 '24 03:04 e3ndr

I am able to avoid this issue by disabling the Apache Http Connector:

            .clientConfigurator(builder -> {
                builder.property(JerseyClientProperties.USE_APACHE_CONNECTOR, false);
            })

e3ndr avatar May 11 '24 06:05 e3ndr

Hi, if you are using any OCI Java SDK version >= 3.31.0 and <= 3.38.0, you could see a thread leak in IdleConnectionMonitor. We recommend updating to version 3.39.0 or later. More info here: https://github.com/oracle/oci-java-sdk/issues/587

rkumarpa avatar May 13 '24 13:05 rkumarpa

I am already using 3.39.0. I believe this is a separate issue from 587 :)

e3ndr avatar May 15 '24 03:05 e3ndr

Can you please share the stack trace? Also, which Jersey version are you using?
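For anyone hitting the same hang, one way to capture a stack trace from inside the stuck process is sketched below. It uses only the JDK's `java.lang.management` API, nothing from the OCI SDK; the class name `ThreadDumpSketch` is made up for illustration. Running `jstack <pid>` against the hung JVM should give similar information without any code changes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// Hypothetical helper, not part of the OCI SDK.
final class ThreadDumpSketch {

    /** Returns a text dump of every live thread, preceded by any deadlock the JVM itself detects. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();

        // null means the JVM found no cycle of threads waiting on each other's locks.
        // A thread blocked on I/O or on an exhausted connection pool is NOT reported here.
        long[] deadlocked = mx.findDeadlockedThreads();
        out.append("Deadlocked threads: ")
           .append(deadlocked == null ? "none detected" : Arrays.toString(deadlocked))
           .append(System.lineSeparator());

        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append(System.lineSeparator());
            // Iterate the frames ourselves because ThreadInfo.toString() truncates long stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append(System.lineSeparator());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Calling `dump()` from a watchdog thread just before it gives up on a request would record where the SDK call is parked.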

rkumarpa avatar May 17 '24 13:05 rkumarpa

One thing you could try is setting the system property oci.javasdk.apache.idle.connection.monitor.thread.enabled=false, or setting the equivalent property in clientConfigurator:

            .clientConfigurator(builder -> {
                builder.property(JerseyClientProperties.APACHE_IDLE_CONNECTION_MONITOR_THREAD_ENABLED, false);
            })

Please let me know if it works after setting this property.
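A minimal sketch of the system-property route, for readers who prefer it over the clientConfigurator. The property name is the one quoted above; the class name is hypothetical, and it is an assumption here that the SDK reads the property when a client is built, so the sketch sets it before any client exists. Passing `-Doci.javasdk.apache.idle.connection.monitor.thread.enabled=false` on the `java` command line should have the same effect.

```java
// Hypothetical helper, not part of the OCI SDK.
final class IdleMonitorPropertySketch {

    // Property name as quoted in this thread.
    static final String PROP = "oci.javasdk.apache.idle.connection.monitor.thread.enabled";

    /** Call this early in startup, before constructing any OCI client (assumed requirement). */
    static void disableIdleConnectionMonitor() {
        System.setProperty(PROP, "false");
    }

    public static void main(String[] args) {
        disableIdleConnectionMonitor();
        // ... build OCI clients only after this point ...
        System.out.println(PROP + "=" + System.getProperty(PROP));
    }
}
```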

rkumarpa avatar May 17 '24 13:05 rkumarpa

Also, if you are using input streams, you need to close all input streams obtained from the response object. See here: https://github.com/oracle/oci-java-sdk/blob/master/ApacheConnector-README.md#program-hangs-for-an-indefinite-time
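The pattern being asked for is roughly the following try-with-resources sketch. `FakeResponse` is a made-up stand-in for an SDK response that exposes a body stream, so this shows only the closing discipline, not the real SDK types. As I understand the linked README, an unclosed stream can keep its pooled connection checked out when the Apache connector is in use, which is why later requests may block.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class CloseStreamSketch {

    /** Hypothetical stand-in for a response object that hands out a body stream. */
    static final class FakeResponse {
        boolean streamClosed = false;

        InputStream getInputStream() {
            return new ByteArrayInputStream("payload".getBytes(StandardCharsets.UTF_8)) {
                @Override
                public void close() throws IOException {
                    streamClosed = true; // record that the caller released the stream
                    super.close();
                }
            };
        }
    }

    /** Reads the whole body and always closes the stream, even if reading throws. */
    static String readAndClose(FakeResponse response) {
        try (InputStream in = response.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FakeResponse response = new FakeResponse();
        System.out.println(readAndClose(response) + " closed=" + response.streamClosed);
    }
}
```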

rkumarpa avatar May 17 '24 17:05 rkumarpa

> Can you please share the stack trace? Also, which Jersey version are you using?

I had set up my own watchdog to kill connections after a few seconds (or minutes, for some other calls). It calls Thread#interrupt() when the task takes too long. I somehow managed to lose the stack trace from when I was still using the Apache connector, but the stuck task is definitely in the Oracle SDK.
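A watchdog of this kind can be sketched with plain `java.util.concurrent`, as below. This is an illustration under assumptions, not the actual watchdog used here; `WatchdogSketch` and `callWithTimeout` are made-up names. `Future.cancel(true)` delivers `Thread#interrupt()` to the worker, which only helps if the blocked code responds to interruption. That would be consistent with the interrupt freeing the caller without fixing the underlying hang.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the OCI SDK.
final class WatchdogSketch {

    /**
     * Runs the task on a daemon worker thread. Returns its result, or empty if it
     * did not finish within the timeout (a null result is also reported as empty).
     */
    static <T> Optional<T> callWithTimeout(Callable<T> task, Duration timeout) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "watchdog-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = worker.submit(task);
        try {
            return Optional.ofNullable(future.get(timeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker thread
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Optional<String> fast = callWithTimeout(() -> "ok", Duration.ofSeconds(1));
        Optional<String> slow = callWithTimeout(() -> {
            Thread.sleep(5_000); // stands in for a request that never returns
            return "late";
        }, Duration.ofMillis(100));
        System.out.println("fast=" + fast.isPresent() + " slow=" + slow.isPresent());
    }
}
```

Using a fresh daemon thread per call keeps one permanently stuck request from blocking later ones, at the cost of leaking that thread until the process exits.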

3.39.0 :)

> One thing you could try is setting the system property oci.javasdk.apache.idle.connection.monitor.thread.enabled=false or setting the property in clientConfigurator
>
>     .clientConfigurator(builder -> {
>         builder.property(JerseyClientProperties.APACHE_IDLE_CONNECTION_MONITOR_THREAD_ENABLED, false);
>     })
>
> Please let me know if it works after setting this property.

Unfortunately I can't test this, but it does appear to be an issue with the Apache connector.

> Also, if you are using input streams, you need to close all input streams obtained from the response object. See here: https://github.com/oracle/oci-java-sdk/blob/master/ApacheConnector-README.md#program-hangs-for-an-indefinite-time

I am not. Just simple GetContainerStatus and CreateContainerInstance calls which do not appear to have a closeable response.

Sorry that I can't get the testing in, I don't really have the budget to throw at reproduction and I also can't break the production services that rely on this 😛

e3ndr avatar May 17 '24 23:05 e3ndr

.clientConfigurator(builder -> {
                builder.property(JerseyClientProperties.USE_APACHE_CONNECTOR, false);
            })

Has this been resolved now? I am already using 3.39.0 too :) Is setting USE_APACHE_CONNECTOR to false the only way to work around it?

balckduck avatar Aug 08 '24 07:08 balckduck