
Background periodic publisher doesn't recover from a network exception

Open coderfromhere opened this issue 4 years ago • 2 comments

If a trace collector is temporarily down, the background thread that publishes to it is expected to survive flushSpans throwing ConnectionFailure. Currently it does not:

HttpExceptionRequest Request {
  host                 = "localhost"
  port                 = 9411
  secure               = False
  requestHeaders       = [("content-type","application/json")]
  path                 = "/api/v2/spans"
  queryString          = ""
  method               = "POST"
  proxy                = Nothing
  rawBody              = False
  redirectCount        = 10
  responseTimeout      = ResponseTimeoutDefault
  requestVersion       = HTTP/1.1
}
 (ConnectionFailure Network.Socket.connect: <socket: 54>: does not exist (Connection refused))

coderfromhere, Apr 09 '21 02:04

Agreed - the current behavior is not great. The background thread should fail the whole process on error (relevant read) or continue publishing.

In the meantime, here are a couple of suggestions to work around this:

  1. Call publish manually with adequate error handling.
  2. (Untested) Specify a custom request manager which retries on a subset of exceptions.
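
A minimal sketch of the first suggestion, using only base. The names publishSafely and publishLoop are hypothetical, and the IO () argument is a stand-in for the library's publish call applied to your tracer; the thread does not show its exact signature, so treat that part as an assumption. Asynchronous exceptions are rethrown so the loop can still be cancelled.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (SomeAsyncException, SomeException, fromException, throwIO, try)
import Control.Monad (forever)
import System.IO (hPutStrLn, stderr)

-- | Run one publish action; log a synchronous failure instead of propagating it.
-- Returns True when the action succeeded. The action is a stand-in for the
-- library's publish call (assumption).
publishSafely :: IO () -> IO Bool
publishSafely publishAction = do
  result <- try publishAction :: IO (Either SomeException ())
  case result of
    Right () -> pure True
    Left err
      -- Rethrow asynchronous exceptions so the thread stays cancellable.
      | isAsync err -> throwIO err
      | otherwise -> do
          hPutStrLn stderr ("span publish failed, will try again later: " ++ show err)
          pure False
  where
    isAsync e = case fromException e :: Maybe SomeAsyncException of
      Just _ -> True
      Nothing -> False

-- | Publish forever at a fixed period (in microseconds), surviving failures.
publishLoop :: Int -> IO () -> IO ()
publishLoop periodMicros publishAction = forever $ do
  _ <- publishSafely publishAction
  threadDelay periodMicros
```

Such a loop would typically be started with forkIO in place of the built-in periodic publisher.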

mtth, Apr 11 '21 22:04

> The background thread should fail the whole process on error (relevant read) or continue publishing.

Right, for backend daemons the only viable option is to carry on, with or without delayed retries of the same payload, since failing the entire process is hardly desirable. How about performing another forkIO with a retry-only closure upon receiving a network exception? The number of retries could then be made configurable, similarly to settingsPublishPeriod.
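
A rough sketch of what such a retry-only closure could look like, using only base. RetryPolicy, retrySend and forkRetry are hypothetical names, not part of the library, and the IO () argument stands in for whatever action re-sends the failed payload.

```haskell
import Control.Concurrent (ThreadId, forkIO, threadDelay)
import Control.Exception (SomeAsyncException, SomeException, fromException, throwIO, try)

-- | Hypothetical retry settings, analogous in spirit to a publish period.
data RetryPolicy = RetryPolicy
  { retryLimit :: Int        -- maximum number of retry attempts
  , retryDelayMicros :: Int  -- pause before each attempt, in microseconds
  }

-- | Retry a failed send up to 'retryLimit' times, waiting before each attempt.
-- Returns True as soon as one attempt succeeds, False when all are used up.
retrySend :: RetryPolicy -> IO () -> IO Bool
retrySend policy send = go (retryLimit policy)
  where
    go remaining
      | remaining <= 0 = pure False
      | otherwise = do
          threadDelay (retryDelayMicros policy)
          result <- try send :: IO (Either SomeException ())
          case result of
            Right () -> pure True
            Left err
              -- Rethrow asynchronous exceptions so the thread stays cancellable.
              | isAsync err -> throwIO err
              | otherwise -> go (remaining - 1)
    isAsync e = case fromException e :: Maybe SomeAsyncException of
      Just _ -> True
      Nothing -> False

-- | Fork the retrying closure so the periodic publisher is not blocked by it.
forkRetry :: RetryPolicy -> IO () -> IO ThreadId
forkRetry policy send = forkIO (retrySend policy send >> pure ())
```

The publisher thread would call forkRetry from its exception handler and then continue with the next period. Whether a payload that exhausts its retries is dropped or re-queued is left open here.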

avanov, Apr 12 '21 13:04