cf-java-client icon indicating copy to clipboard operation
cf-java-client copied to clipboard

OutOfMemory/Slow performance upon service creation/deletion

Open martin-d-aleksandrov opened this issue 4 years ago • 4 comments

Hi Friends,

we recently noticed a strange behaviour when creating/deleting service instances using cf-java-client that leads to OutOfMemoryException and really slow performance. The issue occurs upon unsuccessful service creation/deletion, when it takes about 20 seconds to build the underlying reactor stacktrace. We confirmed that the stacktrace building is the time consuming part of the execution via heapdumps, threadumps and profiling of the app. The heapdump shows the impressive 150MB char array containing the error message and stacktrace. The truncated stacktrace has the following structure:

reactor.core.publisher.Sinks$EmissionException: Spec. Rule 1.3 - onSubscribe, onNext, onError and onComplete signaled to a Subscriber MUST be signaled serially.
	at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:56)
	Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Assembly trace from producer [reactor.core.publisher.MonoCacheTime] :
	reactor.core.publisher.Mono.checkpoint(Mono.java:1902)
	org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
Error has been observed at the following site(s):
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.client.v2.spaces.ReactorSpaces.listServices(ReactorSpaces.java:232)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.operations.services.DefaultServices.createInstance(DefaultServices.java:176)
	|_____ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.client.v2.spaces.ReactorSpaces.listServices(ReactorSpaces.java:232)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.operations.services.DefaultServices.createInstance(DefaultServices.java:176)
	|_________ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.client.v2.spaces.ReactorSpaces.listServices(ReactorSpaces.java:232)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.operations.services.DefaultServices.createInstance(DefaultServices.java:176)
	|_____________ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.client.v2.spaces.ReactorSpaces.listServiceInstances(ReactorSpaces.java:226)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.operations.services.DefaultServices.deleteInstance(DefaultServices.java:215)
	|_________________ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.token(AbstractUaaTokenProvider.java:261)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.reactor.client.v2.spaces.ReactorSpaces.listServiceInstances(ReactorSpaces.java:226)
	|_ Mono.checkpoint ⇢ at org.cloudfoundry.operations.services.DefaultServices.deleteInstance(DefaultServices.java:215)

         ... <much more data that look identically (a couple of tenths of thousands of lines>

Stack trace:
		at reactor.core.publisher.InternalManySink.emitNext(InternalManySink.java:56)
		at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.lambda$null$3(AbstractUaaTokenProvider.java:209)
		at java.util.Optional.ifPresent(Optional.java:159)
		at org.cloudfoundry.reactor.tokenprovider.AbstractUaaTokenProvider.lambda$extractRefreshToken$4(AbstractUaaTokenProvider.java:198)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onNext(FluxPeek.java:184)
		at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
		at reactor.core.publisher.FluxMap$MapSubscriber.onNext(FluxMap.java:120)
		at reactor.core.publisher.FluxHandle$HandleSubscriber.onNext(FluxHandle.java:118)
		at reactor.core.publisher.FluxMap$MapConditionalSubscriber.onNext(FluxMap.java:220)
		at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onNext(FluxDoFinally.java:130)
		at reactor.core.publisher.FluxHandleFuseable$HandleFuseableSubscriber.onNext(FluxHandleFuseable.java:184)
		at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onNext(FluxContextWrite.java:107)
		at reactor.core.publisher.Operators$MonoSubscriber.complete(Operators.java:1789)
		at reactor.core.publisher.MonoCollectList$MonoCollectListSubscriber.onComplete(MonoCollectList.java:128)
		at reactor.core.publisher.FluxPeek$PeekSubscriber.onComplete(FluxPeek.java:259)
		at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:142)
		at reactor.core.publisher.FluxDoFinally$DoFinallySubscriber.onComplete(FluxDoFinally.java:145)
		at reactor.core.publisher.FluxMap$MapSubscriber.onComplete(FluxMap.java:142)
		at reactor.netty.channel.FluxReceive.onInboundComplete(FluxReceive.java:389)
		at reactor.netty.channel.ChannelOperations.onInboundComplete(ChannelOperations.java:396)
		at reactor.netty.channel.ChannelOperations.terminate(ChannelOperations.java:452)
		at reactor.netty.http.client.HttpClientOperations.onInboundNext(HttpClientOperations.java:664)
		at reactor.netty.channel.ChannelOperationsHandler.channelRead(ChannelOperationsHandler.java:94)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
		at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
		at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
		at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1533)
		at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1282)
		at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1329)
		at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:508)
		at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:447)
		at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
		at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
		at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
		at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
		at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
		at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480)
		at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
		at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
		at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
		at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
		at java.lang.Thread.run(Thread.java:836)

Suppressed: java.lang.Exception: #block terminated with an error
		at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99)
		at reactor.core.publisher.Mono.block(Mono.java:1703)
                 ... <more suppressed exceptions (about 4000) that are exactly the same>

Digging into the cf-java-client/reactor code, we noticed that the reactor provides a functionality for creating a light checkpoint that has no the overhead of stacktrace building upon error, but the wrapping cf-java-client does not expose that. Do you plan to expose such functionality? Or have you observed similar behaviour, any idea how we could overcome this?

We use version 5.1.0 of cf-java-client.

Regards, Martin

PS. Having a quick overview of the reported issues, I noticed this one - https://github.com/cloudfoundry/cf-java-client/issues/1090 - which might be related to this problem, as well.

martin-d-aleksandrov avatar Mar 31 '21 12:03 martin-d-aleksandrov

Thanks for this report!

Do you have a code snippet or example you could provide that would generate one of these large checkpoint traces? I'm not doubting you at all. It would just speed the investigation of this if you could attach a sample to reproduce the problem.

Aside from that, I will keep this on the roadmap to investigate and see what we can do to improve the situation here.

dmikusa avatar Jun 04 '21 15:06 dmikusa

Hello, yeah, you're right and it would be pretty useful to have a sample app to reproduce that issue (I realize it is pretty hard to reproduce such kind of issues). Unfortunately, I was not able to simulate it in a stable manner, so that it is reproducible via a sample app.

What I could add (that hopefully would help you investigating this) is that the issue start to happen after we updated the version of the cf-java-client, and it was gone when we rolled it back. Currently we use version 3.27.0 and we haven't observed occurrences of the issue. I tried to traverse the diffs between 3.27.0 and 5.1.0, but it was rather impossible to do that for one that it not into the project :(.

Regards, Martin

martin-d-aleksandrov avatar Jun 11 '21 08:06 martin-d-aleksandrov

OK, so 3.x is pretty old at this point. Are you seeing the issue in 3.x? If so, I'm not sure how much I can help there. I would say upgrade to 5.x and see if the problem persists.

If you are seeing the issue when you go to 5.x, then I have a few more questions. What is the specific version of 5.x that you are using? If it is not the most recent, please retry with the most recent. Also, what version of Spring Boot are you using (if you are using Spring Boot)? If you are not using Spring Boot, can you attach ./mvnw dependency:tree or similar output for Gradle? I'll need to see a full dependency listing.

The 5.x line is tested with Spring Boot 2.4, so the dependencies that are required to use cf-java-client are curated and tested with the dependencies you'd get with Spring Boot 2.4. You would want to make sure that you are using Spring Boot 2.4, or an equivalent set of dependencies (if not using Spring Boot). If you do not, then there can be odd problems. We want to eliminate any dependency issues before digging into this more. Thanks.

dmikusa avatar Jun 11 '21 14:06 dmikusa

Yeah, it is true that 3.x is pretty old one, but it is actually the one that we do NOT observe the issue with. That's why we rolled back to 3.x and upgraded to the last of 3.x version (it was 3.27 back then). On the other questions:

  • The issue was reproduced with 5.1.0 (it was the most recent version back then)
  • We are not using Spring Boot. I am attaching the mvn dependency:tree output. I don't see any Spring related dependencies, besides the transitive ones that come after cloudfoundry-client-reactor.

martin-d-aleksandrov avatar Jun 14 '21 08:06 martin-d-aleksandrov