
Advertisement Timeout SLA

Open rssathe opened this issue 10 years ago • 3 comments

{
  "log": "{\"error\":{\"stack\":\"TchannelRequestTimeoutError: request timed out after 500ms (limit was 500ms)\\n    at Object.createError [as RequestTimeoutError] (/home/udocker/chronotrigger/node_modules/tchannel/node_modules/error/typed.js:31:22)\\n    at V2OutRequest.onTimeout (/home/udocker/chronotrigger/node_modules/tchannel/out_request.js:555:31)\\n    at TimeHeap.callExpiredTimeouts (/home/udocker/chronotrigger/node_modules/tchannel/time_heap.js:169:14)\\n    at TimeHeap.drainExpired (/home/udocker/chronotrigger/node_modules/tchannel/time_heap.js:160:14)\\n    at TimeHeap.onTimeout (/home/udocker/chronotrigger/node_modules/tchannel/time_heap.js:144:10)\\n    at onTimeout [as _onTimeout] (/home/udocker/chronotrigger/node_modules/tchannel/time_heap.js:135:14)\\n    at Timer.listOnTimeout [as ontimeout] (timers.js:112:15)\",\"type\":\"tchannel.request.timeout\",\"message\":\"request timed out after 500ms (limit was 500ms)\",\"id\":871,\"start\":1443038570760,\"elapsed\":500,\"timeout\":500,\"logical\":true,\"name\":\"TchannelRequestTimeoutError\",\"fullType\":\"tchannel.request.timeout\"},\"serviceName\":\"chronotrigger\",\"level\":\"error\",\"message\":\"HyperbahnClient: advertisement failure, marking server as sick\"}\n",
  "stream": "stderr",
  "time": "2015-09-23T20:02:51.260517901Z"
}
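The entry above is a container log record whose `log` field is itself a JSON string, so it has to be parsed twice before the error fields are readable. A minimal sketch of that, assuming a shortened copy of the record (stack trace omitted); `summarize` and the shape of its result are hypothetical helpers for illustration, not part of tchannel or hyperbahn:

```javascript
// Shortened, hand-built copy of the log record above (stack trace omitted).
// The inner payload is serialized to a string, mirroring the "log" field.
const entry = {
  log: JSON.stringify({
    error: {
      type: 'tchannel.request.timeout',
      message: 'request timed out after 500ms (limit was 500ms)',
      elapsed: 500,
      timeout: 500
    },
    serviceName: 'chronotrigger',
    level: 'error',
    message: 'HyperbahnClient: advertisement failure, marking server as sick'
  }) + '\n',
  stream: 'stderr',
  time: '2015-09-23T20:02:51.260517901Z'
};

// Hypothetical helper: decode the nested payload and pull out the fields
// that matter for the 500ms advertisement limit.
function summarize(record) {
  // JSON.parse accepts trailing whitespace, so the final "\n" is harmless.
  const inner = JSON.parse(record.log);
  const err = inner.error || {};
  return {
    service: inner.serviceName,
    level: inner.level,
    errorType: err.type,
    elapsedMs: err.elapsed,
    limitMs: err.timeout,
    // True when the request used up its whole timeout budget.
    hitLimit: err.elapsed >= err.timeout
  };
}

console.log(summarize(entry));
```

Filtering decoded records on `errorType` and `service` is one way to count advertisement timeouts per client during a deploy.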

@Raynos @jcorbin The deployment process results in hyperbahn advertisement timeouts.

rssathe avatar Sep 24 '15 19:09 rssathe

When we deploy hyperbahn we see an increase in advertise timeouts in edge clients.

This is not acceptable. We should figure out how to stick within our 500ms SLA using more tricks and/or drain.

Raynos avatar Sep 24 '15 20:09 Raynos

@Raynos, it could be that we shouldn't be exempting the hyperbahn protocol itself from drain: https://github.com/uber/hyperbahn/blob/master/app.js#L78. Without that exemption, you'd instead see elevated declined responses to advertisements during a deploy. Declined advertisements are okay, and shouldn't by themselves necessarily trigger a re-advertise. What should trigger one, and what would help in this case, is the re-advertise-on-reconnect work we keep mentioning.
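To make the distinction concrete, here is a rough sketch of a client-side policy that treats a declined advertisement differently from a timed-out one. Everything in it is an assumption for illustration: `advertiseFailurePolicy` and its return shape are hypothetical and not part of HyperbahnClient, the `tchannel.request.timeout` type string is taken from the log above, and `tchannel.declined` is assumed to be the type of a declined response.

```javascript
// 'tchannel.request.timeout' appears in the log above.
// 'tchannel.declined' is an assumed type string for a declined response.
const TIMEOUT_TYPE = 'tchannel.request.timeout';
const DECLINED_TYPE = 'tchannel.declined';

// Hypothetical policy function: decide what a client does after a failed
// advertisement. It only inspects err.type and calls no library code.
function advertiseFailurePolicy(err) {
  const type = err && err.type;

  if (type === DECLINED_TYPE) {
    // A draining relay said "no" quickly. The server is not sick, and the
    // decline alone does not force a re-advertise; wait for the reconnect.
    return { markSick: false, readvertise: 'on-reconnect' };
  }

  if (type === TIMEOUT_TYPE) {
    // The relay never answered within the limit: current behaviour in the
    // log above is to mark the server as sick.
    return { markSick: true, readvertise: 'next-interval' };
  }

  // Unknown failures are treated conservatively, like a timeout.
  return { markSick: true, readvertise: 'next-interval' };
}

console.log(advertiseFailurePolicy({ type: DECLINED_TYPE }));
console.log(advertiseFailurePolicy({ type: TIMEOUT_TYPE }));
```

The point of the split is that a decline is a fast, explicit answer from a draining node, while a timeout consumes the full 500ms budget before the client learns anything.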

jcorbin avatar Sep 24 '15 21:09 jcorbin

@jcorbin OOPS. Yes we should not exclude it from drain.

Raynos avatar Sep 24 '15 23:09 Raynos