
WritePendingException: null from client

Open · rossabaker opened this issue on Nov 15, 2018 · 13 comments

I can sometimes hit this when evalMap-ing a basic GET request 20,000 times.
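
For context, the load that triggers it is roughly this shape (an illustrative sketch, not the exact reproducer; `client` and `uri` are assumed to come from a BlazeClientBuilder elsewhere):

```scala
import cats.effect.IO
import fs2.Stream
import org.http4s.Uri
import org.http4s.client.Client

// Fire the same basic GET 20,000 times through evalMap and drain the results.
def hammer(client: Client[IO], uri: Uri): IO[Unit] =
  Stream
    .range(0, 20000)
    .covary[IO]
    .evalMap(_ => client.expect[String](uri))
    .compile
    .drain
```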

2018-11-15 02:30:44 ERROR [scala-execution-context-global-346] [] [Http1Connection.scala:153] - Error rendering request
java.nio.channels.WritePendingException: null
    at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:352)
    at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:416)
    at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:42)
    at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:58)
    at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:25)
    at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:15)
    at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
    at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
    at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:13)
    at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:20)
    at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
    at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
    at org.http4s.blazecore.ResponseHeaderTimeoutStage.channelWrite(ResponseHeaderTimeoutStage.scala:13)
    at org.http4s.blazecore.ResponseHeaderTimeoutStage.writeRequest(ResponseHeaderTimeoutStage.scala:39)
    at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
    at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
    at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:24)
    at org.http4s.blazecore.util.CachingChunkWriter.doWriteEnd(CachingChunkWriter.scala:74)
    at org.http4s.blazecore.util.CachingChunkWriter.writeEnd(CachingChunkWriter.scala:55)

/cc @ngbinh, who also reported it

rossabaker · Nov 15, 2018

In our case, it happens when we run our integration suites (so it is probably hard to reproduce).

ngbinh · Nov 16, 2018

I have a pretty easy reproduction on this one. Not one I'm sure I want to run on Travis, but it shows the bug.

rossabaker · Nov 16, 2018

That's great

ngbinh · Nov 16, 2018

I have confirmed that this is not the timeout stages: I stripped them all away and it still happens. I now suspect it is related to rendering the request concurrently with receiving the response. I think we need to figure out how to formally cancel the request writer and await its cancellation before recycling the connection.
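
Something like this shape, roughly (hypothetical names, just sketching the idea; not the actual Http1Connection code):

```scala
import cats.effect.{ContextShift, IO}

// Run the request writer on its own fiber, then cancel it and await its cancellation
// before the connection is handed back to the pool. `writeRequest`, `readResponse` and
// `recycle` are placeholders for the real stages.
def runExchange[A](writeRequest: IO[Unit], readResponse: IO[A], recycle: IO[Unit])(
    implicit cs: ContextShift[IO]): IO[A] =
  for {
    writer   <- writeRequest.start // request rendering runs concurrently with the read
    response <- readResponse       // the response may complete before the body is fully written
    _        <- writer.cancel      // formally cancel the writer and wait for it to settle
    _        <- recycle            // only then is it safe to reuse the channel
  } yield response
```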

rossabaker · Nov 19, 2018

I observe this under fairly high traffic with websockets on 0.21.0-RC1. Was there ever a fix?

cquiroz · Jan 22, 2020

Not yet.

rossabaker · Jan 22, 2020

Hi,

I am facing the same issue. Here is my stack trace (sorry I cannot do more on it right now :/ ):

http4s version: 0.21.0

20/04/16 06:10:50 ERROR blaze.Http1Connection: Error rendering request
java.nio.channels.WritePendingException
	at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:352)
	at sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:416)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:42)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:58)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:109)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:107)
	at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:13)
	at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:21)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:109)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:107)
	at org.http4s.blazecore.IdleTimeoutStage.channelWrite(IdleTimeoutStage.scala:12)
	at org.http4s.blazecore.IdleTimeoutStage.writeRequest(IdleTimeoutStage.scala:46)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:109)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:107)
	at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:25)
	at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:56)
	at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writePipe$3(EntityBodyWriter.scala:64)
	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:87)
	at cats.effect.internals.IORunLoop$.startCancelable(IORunLoop.scala:41)
	at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:86)
	at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:70)
	at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:36)
	at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:93)
	at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:93)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
	at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:93)
	at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
	at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:44)
	at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:72)
	at cats.effect.internals.IOBracket$BracketStart.apply(IOBracket.scala:52)
	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:136)
	at cats.effect.internals.IORunLoop$.start(IORunLoop.scala:34)
	at cats.effect.internals.IOBracket$.$anonfun$apply$1(IOBracket.scala:44)
	at cats.effect.internals.IOBracket$.$anonfun$apply$1$adapted(IOBracket.scala:34)
	at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:341)
	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:119)
	at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:355)
	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:376)
	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:316)
	at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
	at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:51)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

kczulko · Apr 16, 2020

Can anyone explain the consequence of this exception getting logged? Does it cancel one or more running requests? The exception doesn't seem to bubble up, so does that mean an internal retry mechanism is catching it?

caoilte · May 25, 2020

A fiber is started to write the request, and this message is logged when that fiber fails.

It's an interesting question: my inclination is that we should instead be canceling the request and raising an error. But the response to a partial request rendering might be part of the recovery. It's natural that the server can begin responding to a partially written request, and bad things can always happen when writing a request, even after fixing this bug. How do we best represent "this is the response we managed to get from an incorrectly rendered request?" I think the status quo is wrong, but I'm not sure what's right.
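
For reference, the status quo is roughly this shape (a simplified sketch of the behavior described above, not the literal Http1Connection code):

```scala
import cats.effect.{ContextShift, IO}

object WriterFiberSketch {
  private[this] val logger = org.log4s.getLogger

  // The request is rendered on a separate fiber; if that fiber fails, the error is only
  // logged, so nothing is raised to the caller waiting on the response.
  def renderInBackground(writeRequest: IO[Unit])(implicit cs: ContextShift[IO]): IO[Unit] =
    writeRequest.attempt.flatMap {
      case Left(t)  => IO(logger.error(t)("Error rendering request")) // swallowed after logging
      case Right(_) => IO.unit
    }.start.void
}
```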

rossabaker · Jun 6, 2020

Hello. I ran into the same issue. For some reason I wasn't able to reproduce the error locally, while in the production setup it appears constantly. Perhaps networking or the platform (Linux) is part of the problem.

Prerequisites

http4s version: 0.21.4
pool size: 1
idle timeout: Duration.Inf
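
In code, that client configuration is roughly the following (a sketch, assuming http4s 0.21.x and the global execution context):

```scala
import scala.concurrent.ExecutionContext.global
import scala.concurrent.duration.Duration
import cats.effect.{ContextShift, IO, Resource}
import org.http4s.client.Client
import org.http4s.client.blaze.BlazeClientBuilder

def pooledClient(implicit cs: ContextShift[IO]): Resource[IO, Client[IO]] =
  BlazeClientBuilder[IO](global)
    .withMaxTotalConnections(1)    // pool size: 1
    .withIdleTimeout(Duration.Inf) // idle timeout: Duration.Inf
    .resource
```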

Successful execution flow

[2020-06-22 19:39:36,806] DEBUG [PoolManager]: Requesting connection: curAllocated=1 idleQueues.size=1 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,806] DEBUG [PoolManager]: Recycling connection: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,806] DEBUG [Http1Connection]: Connection was idle. Running.
[2020-06-22 19:39:36,806] DEBUG [Http1Connection]: Beginning request: POST http://dns:8080/api/v1/datapoints
[2020-06-22 19:39:36,809] DEBUG [Http1Connection]: Resetting org.http4s.client.blaze.Http1Connection after completing request.
[2020-06-22 19:39:36,809] DEBUG [PoolManager]: Recycling connection: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,809] DEBUG [PoolManager]: Returning idle connection to pool: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false

Failed execution flow

[2020-06-22 19:39:36,811] DEBUG [PoolManager]: Requesting connection: curAllocated=1 idleQueues.size=1 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,811] DEBUG [PoolManager]: Recycling connection: curAllocated=1 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2020-06-22 19:39:36,811] DEBUG [Http1Connection]: Connection was idle. Running.
[2020-06-22 19:39:36,811] DEBUG [Http1Connection]: Beginning request: POST http://dns:8080/api/v1/datapoints
[2020-06-22 19:39:36,814] DEBUG [Http1Connection]: Resetting org.http4s.client.blaze.Http1Connection after completing request.
[2020-06-22 19:39:36,814] ERROR [Http1Connection]: Error rendering request

Exception

java.nio.channels.WritePendingException: null
	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:353)
	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:417)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:42)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:58)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:25)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:16)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
	at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:14)
	at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:20)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:96)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:94)
	at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:25)
	at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:57)
	at org.http4s.blazecore.util.IdentityWriter.writeEnd(IdentityWriter.scala:64)
	at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writeEntityBody$1(EntityBodyWriter.scala:53)

The most interesting part

[2020-06-22 19:39:36,814] DEBUG [Http1Connection]: Resetting org.http4s.client.blaze.Http1Connection after completing request.
[2020-06-22 19:39:36,814] ERROR [Http1Connection]: Error rendering request

At this point, I cannot understand the logic behind these messages. Resetting ... after completing request. is printed when the reading stage (response) completes, while the WritePendingException is thrown by the writing stage (request).

Can the writing stage be active along with the reading stage?

iRevive · Jun 22, 2020

@iRevive the problem went away completely once I rate-limited our use of the blaze HTTP client. We would see it when the node was doing more than 250 requests/second. Once I rate-limited the service to 200 requests/second, we stopped seeing the problem. (I used https://github.com/SystemFw/upperbound to do the rate limiting.)
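
Not the upperbound setup described above, but for illustration, the same ceiling can be sketched with plain fs2 (`client` and `uri` are assumed; `metered` spaces requests to at most one per interval):

```scala
import scala.concurrent.duration._
import cats.effect.{IO, Timer}
import fs2.Stream
import org.http4s.Uri
import org.http4s.client.Client

// Cap the request rate at roughly 200/second by spacing requests 5 ms apart.
def throttled(client: Client[IO], uri: Uri)(implicit timer: Timer[IO]): Stream[IO, String] =
  Stream
    .constant(())          // an endless supply of request triggers
    .covary[IO]
    .metered(5.millis)     // at most ~200 elements per second
    .evalMap(_ => client.expect[String](uri))
```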

caoilte · Jun 30, 2020

Not sure how useful all of this will be... In one service we have a periodic process that queries a few other services once an hour. In particular, it sends two consecutive GET requests (over plain HTTP) to a service that doesn't see much action otherwise. On one of the instances (with both services running on Windows) we're seeing something like this around 80% of the time:

  1. java.nio.channels.WritePendingException: null in the log.
  2. 30 seconds (the request timeout) pass.
  3. We see the results of the requests. The first one has succeeded, the second one has timed out.

I've failed to reproduce this locally or to get any extra data (I don't control this instance), so I'm at a loss. I can't understand why the timeout happens (as opposed to a retry), or why this only happens between these two services, even though there are other places with pretty much the same flow.

Update: I've since seen this issue with other services. Also, the timeout behavior might be due to http4s/blaze#642. As a workaround, we've just switched to another client (Jetty).
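
For reference, the Jetty-backed http4s client the workaround refers to looks roughly like this (a sketch, assuming the http4s-jetty-client module on 0.21.x):

```scala
import cats.effect.{ConcurrentEffect, IO, Resource}
import org.http4s.client.Client
import org.http4s.client.jetty.JettyClient

// Build a Client[IO] backed by Jetty's HttpClient instead of blaze.
def jettyClient(implicit ce: ConcurrentEffect[IO]): Resource[IO, Client[IO]] =
  JettyClient.resource[IO]()
```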

nigredo-tori · Nov 5, 2020

Any news on this? I'm also seeing a correlation between how often this error occurs and the rate at which I'm making requests: the higher the throughput, the more prone it is to throw the error.

kovacshuni · Dec 12, 2020