blaze icon indicating copy to clipboard operation
blaze copied to clipboard

Disconnect error not detected

Open nigredo-tori opened this issue 6 years ago • 10 comments

Current disconnect handling implementation in ChannelHead relies on matching IOException.getMessage. This is problematic - for example, on Windows these messages are localized, which leads to junk log messages:

2019-05-31 10:35:21,048  WARN: Channel IOException not known to be a disconnect error //[Thread-12] o.h.blaze.channel.nio2.ByteBufferHead.checkError() L:18
java.io.IOException: Удаленный хост принудительно разорвал существующее подключение

Shouldn't there be a better way to detect such situations? What do other libraries do?

nigredo-tori avatar May 31 '19 04:05 nigredo-tori

That's a great question. I imagine this is a solved problem in netty. It's hard when the implementation doesn't throw anything more specific than IOException. I'd love to find a better solution.

We could add the Russian version to solve your problem, but we'll never keep up in general until a more robust alternative is identified.

rossabaker avatar Jun 10 '19 20:06 rossabaker

This is way better than we do, but still not great: https://github.com/netty/netty/blob/f17bfd0f64189d91302fbdd15103788bf9eabaa2/handler/src/main/java/io/netty/handler/ssl/SslHandler.java#L1100

rossabaker avatar Jun 10 '19 21:06 rossabaker

I've seen a similar situation recently, only this time the error ("An established connection was aborted by the software in your host machine") wasn't even in the broken pipe messages... I'm not even sure we need this log message anymore. In the past it might have made sense since there was something different happening in this branch, but right now it's EOF for all IOExceptions, so why bother?

nigredo-tori avatar May 14 '20 08:05 nigredo-tori

I think that's fair. If it's not surfacing any that we'd like to treat differently, there really isn't a point.

rossabaker avatar May 14 '20 15:05 rossabaker

@rossabaker So is this error depends on error? I got this error:

2021-03-16 17:51:23,653 [WARN] - Channel IOException not known to be a disconnect error
java.io.IOException: Обрыв канала
	at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
	at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:182)
	at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:123)
	at java.base/sun.nio.ch.UnixAsynchronousSocketChannelImpl.implWrite(UnixAsynchronousSocketChannelImpl.java:699)
	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:383)
	at java.base/sun.nio.ch.AsynchronousSocketChannelImpl.write(AsynchronousSocketChannelImpl.java:417)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.org$http4s$blaze$channel$nio2$ByteBufferHead$$go$1(ByteBufferHead.scala:58)
	at org.http4s.blaze.channel.nio2.ByteBufferHead.writeRequest(ByteBufferHead.scala:74)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
	at org.http4s.client.blaze.ReadBufferStage.channelWrite(ReadBufferStage.scala:30)
	at org.http4s.client.blaze.ReadBufferStage.writeRequest(ReadBufferStage.scala:38)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
	at org.http4s.blazecore.IdleTimeoutStage.channelWrite(IdleTimeoutStage.scala:28)
	at org.http4s.blazecore.IdleTimeoutStage.writeRequest(IdleTimeoutStage.scala:62)
	at org.http4s.blaze.pipeline.Tail.channelWrite(Stages.scala:124)
	at org.http4s.blaze.pipeline.Tail.channelWrite$(Stages.scala:122)
	at org.http4s.client.blaze.Http1Connection.channelWrite(Http1Connection.scala:41)
	at org.http4s.blazecore.util.IdentityWriter.writeBodyChunk(IdentityWriter.scala:72)
	at org.http4s.blazecore.util.EntityBodyWriter.$anonfun$writePipe$3(EntityBodyWriter.scala:74)
	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:104)
	at cats.effect.internals.IORunLoop$.restartCancelable(IORunLoop.scala:51)
	at cats.effect.internals.IOBracket$BracketStart.run(IOBracket.scala:100)
	at cats.effect.internals.Trampoline.cats$effect$internals$Trampoline$$immediateLoop(Trampoline.scala:67)
	at cats.effect.internals.Trampoline.startLoop(Trampoline.scala:35)
	at cats.effect.internals.TrampolineEC$JVMTrampoline.super$startLoop(TrampolineEC.scala:90)
	at cats.effect.internals.TrampolineEC$JVMTrampoline.$anonfun$startLoop$1(TrampolineEC.scala:90)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:94)
	at cats.effect.internals.TrampolineEC$JVMTrampoline.startLoop(TrampolineEC.scala:90)
	at cats.effect.internals.Trampoline.execute(Trampoline.scala:43)
	at cats.effect.internals.TrampolineEC.execute(TrampolineEC.scala:42)
	at cats.effect.internals.IOBracket$.$anonfun$guaranteeCase$1(IOBracket.scala:116)
	at cats.effect.internals.IOBracket$.$anonfun$guaranteeCase$1$adapted(IOBracket.scala:114)
	at cats.effect.internals.IOAsync$.$anonfun$apply$1(IOAsync.scala:37)
	at cats.effect.internals.IOAsync$.$anonfun$apply$1$adapted(IOAsync.scala:37)
	at cats.effect.internals.IORunLoop$RestartCallback.start(IORunLoop.scala:447)
	at cats.effect.internals.IORunLoop$.cats$effect$internals$IORunLoop$$loop(IORunLoop.scala:156)
	at cats.effect.internals.IORunLoop$RestartCallback.signal(IORunLoop.scala:463)
	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:484)
	at cats.effect.internals.IORunLoop$RestartCallback.apply(IORunLoop.scala:422)
	at cats.effect.internals.IOShift$Tick.run(IOShift.scala:36)
	at cats.effect.internals.PoolUtils$$anon$2$$anon$3.run(PoolUtils.scala:52)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

I guess it happens somewhere here:

val http = new Http(config, (_: util.Map[String, util.List[String]], _: util.Map[String, String]) => ())

retryingOnAllErrors(policy, logOnError){
  http
    .post(url)
    .authentication()
    .client()
    .form()
    .param("client_id", cfg.clientName)
    .response()
    .execute()
}

Also retries seems to stop working when this error happens.

Would it be good if message was in English: java.io.IOException: Channel is broken?

DenisNovac avatar Mar 17 '21 05:03 DenisNovac

Обрыв канала

I'd say that this translates to "Broken pipe", which would be caught by the condition here. So the message you've posted is, indeed, another instance of this issue.

nigredo-tori avatar Mar 17 '21 10:03 nigredo-tori

I thought that my troubles might be because of that but now i see it ends in EOF anyway. Thanks

DenisNovac avatar Mar 17 '21 12:03 DenisNovac

It sounds like we should just go ahead and get rid of that log message, and do so on the series/0.14 branch?

rossabaker avatar Mar 17 '21 15:03 rossabaker

Hi, Not sure if that's related but I'm having another problem - EOF is hard to debug without any stacktrace/additional message attached (blaze-core v0.14.17). For example, we're having a big app that makes calls to multiple external services with the client. And common .handleErrorWith in the main that logs errors. What I see at the end when something goes wrong:

Integration failed: EOF

And nothing else. I don't even know at which step it failed... Is there a way to make it better (for example by attaching info about the request)? And/or attach stacktrace to it?

kell18 avatar Aug 06 '21 16:08 kell18

The EOF is a regrettable design, because it's both a command an an exception. It suppresses the stack trace because the exception is used for control flow and is expected to happen a lot.

We might be able to enhance the request loop in BlazeClient to wrap the EOF in an exception that provides request context. :thinking:

rossabaker avatar Aug 06 '21 18:08 rossabaker