jersey icon indicating copy to clipboard operation
jersey copied to clipboard

Jersey threads stuck in `WAITING`

Open joekarl opened this issue 7 years ago • 4 comments

As part of the upgrade path from Dropwizard 0.x -> 1.x the jersey version was bumped to 2.25.1

What we've noticed is that across a number of our production services, we're seeing threads randomly moving from a RUNNABLE to a WAITING state and staying in that waiting state. This seems to occur randomly, but mostly centered around higher throughput paths that can sometimes incur higher latency. As these blocked threads increase over time, the thread pool is eventually starved and the service topples over.

This seems somewhat related to this previous issue (targeted for 2.26+ milestone) https://github.com/jersey/jersey/issues/3207

Here's an example from a recent thread dump. We see something similar for each stuck thread and the entries in the thread dump slowly increase over time as they're not evicted from the thread pool.

"dw-2113 - GET /path/being/called..." id=2113 state=WAITING
    - waiting on <0x1f2f7a6b> (a jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync)
    - locked <0x1f2f7a6b> (a jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync)
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
    at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.glassfish.jersey.servlet.internal.ResponseWriter.getResponseContext(ResponseWriter.java:299)
    at org.glassfish.jersey.servlet.internal.ResponseWriter.callSendError(ResponseWriter.java:215)
    at org.glassfish.jersey.servlet.internal.ResponseWriter.commit(ResponseWriter.java:194)
    at org.glassfish.jersey.server.ContainerResponse.close(ContainerResponse.java:413)
    at org.glassfish.jersey.server.ServerRuntime$Responder.writeResponse(ServerRuntime.java:784)
    at org.glassfish.jersey.server.ServerRuntime$Responder.processResponse(ServerRuntime.java:444)
    at org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:490)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:334)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at io.dropwizard.jetty.NonblockingServletHolder.handle(NonblockingServletHolder.java:49)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1650)
    at io.dropwizard.servlets.ThreadNameFilter.doFilter(ThreadNameFilter.java:34)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
    at io.dropwizard.jersey.filter.AllowedMethodsFilter.handle(AllowedMethodsFilter.java:45)
    at io.dropwizard.jersey.filter.AllowedMethodsFilter.doFilter(AllowedMethodsFilter.java:39)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
    at com.amazonaws.xray.javax.servlet.AWSXRayServletFilter.doFilter(AWSXRayServletFilter.java:151)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:166)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at com.codahale.metrics.jetty9.InstrumentedHandler.handle(InstrumentedHandler.java:241)
    at io.dropwizard.jetty.RoutingHandler.handle(RoutingHandler.java:52)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:527)
    at io.dropwizard.jetty.BiDiGzipHandler.handle(BiDiGzipHandler.java:68)
    at org.eclipse.jetty.server.handler.RequestLogHandler.handle(RequestLogHandler.java:56)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:561)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:334)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:279)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:104)
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:289)
    at org.eclipse.jetty.io.ssl.SslConnection$3.succeeded(SslConnection.java:149)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:104)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:124)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:247)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:243)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:679)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:597)
    at java.lang.Thread.run(Thread.java:748)

joekarl avatar Apr 05 '18 15:04 joekarl

And to be clear, all of these requests are synchronous. They do reach out to other network resources (other services/database/etc...) but those are generally done synchronously within the request thread (sometimes we'll spawn futures internally to fetch resources in parallel, but we resolve those futures synchronously in the request thread).

joekarl avatar Apr 05 '18 15:04 joekarl

Could this be related to #3772 ?

daniloradenovic avatar Apr 06 '18 13:04 daniloradenovic

We've discovered that while we can't seem to figure out why the threads never return, we can at least forcefully clean them up using Jetty's blockingTimeout config option http://download.eclipse.org/jetty/stable-9/apidocs/org/eclipse/jetty/server/HttpConfiguration.html#getBlockingTimeout--

Though seems more of a bandaid than an actual fix.

joekarl avatar Apr 06 '18 15:04 joekarl

Reporting back, the blockingTimeout setting didn't actually help the situation, still seeing threads getting in a stuck state.

joekarl avatar Apr 06 '18 17:04 joekarl