Disconnect error not detected
The current disconnect-handling implementation in ChannelHead relies on matching IOException.getMessage. This is problematic: on Windows, for example, these messages are localized, which leads to junk log messages:
2019-05-31 10:35:21,048 WARN: Channel IOException not known to be a disconnect error //[Thread-12] o.h.blaze.channel.nio2.ByteBufferHead.checkError() L:18
java.io.IOException: Удаленный хост принудительно разорвал существующее подключение ("An existing connection was forcibly closed by the remote host")
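For reference, the matching is roughly of this shape (a simplified sketch with illustrative names, not the actual blaze source):

import java.io.IOException

// Simplified sketch of message-based disconnect detection; the object and
// method names and the exact string list are illustrative. The comparison
// is against English strings, so a localized message never matches and
// falls into the noisy "unknown" branch.
object DisconnectDetection {
  private val brokenPipeMessages = Seq(
    "Connection reset by peer", // Linux
    "Broken pipe", // Linux, macOS
    "An existing connection was forcibly closed by the remote host" // Windows, English locale only
  )

  def isKnownDisconnect(e: IOException): Boolean = {
    val msg = Option(e.getMessage).getOrElse("")
    brokenPipeMessages.exists(m => msg.contains(m))
  }
}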
Shouldn't there be a better way to detect such situations? What do other libraries do?
That's a great question. I imagine this is a solved problem in netty. It's hard when the implementation doesn't throw anything more specific than IOException. I'd love to find a better solution.
We could add the Russian version to solve your problem, but we'll never keep up in general until a more robust alternative is identified.
This is way better than what we do, but still not great: https://github.com/netty/netty/blob/f17bfd0f64189d91302fbdd15103788bf9eabaa2/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L1100
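For context, that netty code matches messages against a case-insensitive regex rather than exact strings, which is more forgiving but still English-only. An illustrative Scala rendering (approximate; see the linked file for the real pattern):

import java.util.regex.Pattern

object NettyStyleMatching {
  // Regex matching in the style of netty's SslHandler (approximate).
  // Case-insensitive, but the vocabulary is still English-only.
  val IgnorableErrorMessage: Pattern = Pattern.compile(
    "^.*(?:connection.*(?:reset|closed|abort|broken)|broken.*pipe).*$",
    Pattern.CASE_INSENSITIVE)

  def looksLikeDisconnect(msg: String): Boolean =
    msg != null && IgnorableErrorMessage.matcher(msg).matches()
}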
I've seen a similar situation recently, only this time the error ("An established connection was aborted by the software in your host machine") wasn't even in the broken pipe messages... I'm not even sure we need this log message anymore. In the past it might have made sense, when something different happened in this branch, but right now it's EOF for all IOExceptions, so why bother?
I think that's fair. If it's not surfacing any that we'd like to treat differently, there really isn't a point.
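If every IOException maps to EOF regardless of its message, the whole branch collapses to something like this sketch, and the warning adds nothing:

import java.io.IOException
import org.http4s.blaze.pipeline.Command

// Sketch: every IOException becomes a plain EOF, with no cases left to
// distinguish, so there is nothing worth warning about.
def checkError(e: Throwable): Throwable = e match {
  case _: IOException => Command.EOF
  case t => t
}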
@rossabaker So does this depend on the specific error? I got this error:
2021-03-16 17:51:23,653 [WARN] - Channel IOException not known to be a disconnect error
java.io.IOException: Обрыв канала ("Broken pipe")
at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:182)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:123)
at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.implWrite(UnixAsynchronousSocketChannelImpl.java:699)
at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:383)
at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:417)
at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:58)
at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:74)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:30)
at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:38)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
at org.http4s.blazecore.IdleTimeoutStage.channelWrite(IdleTimeoutStage.scala:28)
at org.http4s.blazecore.IdleTimeoutStage.writeRequest(IdleTimeoutStage.scala:62)
at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:41)
at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:72)
at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writePipe$3(EntityBodyWriter.scala:74)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
at cats.effect.internals.IOBracket$.$anonfun$guaranteeCase$1(IOBracket.scala:116)
at cats.effect.internals.IOBracket$.$anonfun$guaranteeCase$1$adapted(IOBracket.scala:114)
at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:463)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:484)
at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:422)
at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:52)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I guess it happens somewhere here:
val http = new Http(config, (_: util.Map[String, util.List[String]], _: util.Map[String, String]) => ())

retryingOnAllErrors(policy, logOnError) {
  http
    .post(url)
    .authentication()
    .client()
    .form()
    .param("client_id", cfg.clientName)
    .response()
    .execute()
}
Also, retries seem to stop working when this error happens.
Would it help if the message were in English, e.g. java.io.IOException: Channel is broken?
Обрыв канала
I'd say that this translates to "Broken pipe", which would be caught by the condition here. So the message you've posted is, indeed, another instance of this issue.
I thought my troubles might be because of that, but now I see it ends in EOF anyway. Thanks.
It sounds like we should just go ahead and get rid of that log message, and do so on the series/0.14 branch?
Hi,
Not sure if that's related, but I'm having another problem: EOF is hard to debug without any stack trace or additional message attached (blaze-core v0.14.17).
For example, we have a big app that makes calls to multiple external services with the client, and a common .handleErrorWith in the main that logs errors. What I see at the end when something goes wrong:
Integration failed: EOF
And nothing else. I don't even know at which step it failed... Is there a way to make this better (for example, by attaching info about the request)? And/or to attach a stack trace to it?
The EOF is a regrettable design, because it's both a command and an exception. It suppresses the stack trace because the exception is used for control flow and is expected to happen a lot.
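(Roughly, that hybrid looks like the sketch below; see org.http4s.blaze.pipeline.Command for the real definition.)

import scala.util.control.NoStackTrace

// Sketch of a command/exception hybrid: a singleton usable both as a
// pipeline signal and as a raised error, with stack-trace collection
// disabled because it fires on every ordinary disconnect.
case object EOF extends Exception("EOF") with NoStackTrace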
We might be able to enhance the request loop in BlazeClient to wrap the EOF in an exception that provides request context. :thinking:
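A minimal sketch of that idea, assuming cats-effect IO and blaze's Command.EOF (the RequestEOF class and withContext helper here are hypothetical, not an existing http4s API):

import cats.effect.IO
import cats.implicits._
import org.http4s.Request
import org.http4s.blaze.pipeline.Command

object EofContext {
  // Hypothetical exception carrying request context; not part of http4s.
  final case class RequestEOF(method: String, uri: String)
      extends Exception(s"EOF while executing $method $uri")

  // Re-raise a bare EOF from a client call with the request attached.
  def withContext[A](req: Request[IO])(run: IO[A]): IO[A] =
    run.adaptError { case Command.EOF =>
      RequestEOF(req.method.name, req.uri.renderString)
    }
}

With something like this around each integration call, the top-level .handleErrorWith would log, say, "Integration failed: EOF while executing POST <uri>" instead of a bare EOF.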