StackExchange.Redis icon indicating copy to clipboard operation
StackExchange.Redis copied to clipboard

Would appreciate assistance solving timeout issues

Open williammck opened this issue 2 years ago • 2 comments

Hey folks,

I've been battling recent issues with one of our workers regularly getting Redis timeouts and would appreciate some help.

A bit of overview:

  • Redis is a single k8s pod using the stock image from DockerHub (although with RDB snapshots turned off), has no resource limits, and what infrequently shows up in the SLOWLOG doesn't usually tie to what shows in the timeout exceptions.
  • We have a timer that, every two seconds, calls FsdService.AddToRedis, which runs StringSetAsync a couple thousand times.
    • i.e. if 750 clients on our network, the for loop over each client calls StringSetAsync twice, so in that example, 1500 calls.
  • This also has a Task from https://github.com/jchristn/SuperSimpleTcp in the background taking in our TCP data, parsing it, and putting it into the List that FsdService.AddToRedis reads from.

I've tried upping the threads in the pool up a bit (set ThreadPoolMinThreads to 16), but that didn't seem to help much. Should I increase it further?

I've looked into SuperSimpleTcp and it seems to be doing everything in tasks/async, so I'd hope we're not blocking, but I can't quite tell.

Any ideas or things I should look at?

Stacktrace examples:

Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 7957ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX 1432024, inst: 0, qu: 0, qs: 3, aw: True, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 15, out-pipe: 0, last-in: 2, cur-in: 15, sync-ops: 0, async-ops: 1012309, serverEndpoint: redis-cache:6379, conn-sec: 1388.4, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=4,Free=32763,Min=16,Max=32767), POOL: (Threads=16,QueuedItems=1,CompletedItems=4070933,Timers=8), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 388
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 7957ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX 1432024, inst: 0, qu: 0, qs: 3, aw: True, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 15, out-pipe: 0, last-in: 2, cur-in: 15, sync-ops: 0, async-ops: 1012309, serverEndpoint: redis-cache:6379, conn-sec: 1388.42, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=4,Free=32763,Min=16,Max=32767), POOL: (Threads=16,QueuedItems=0,CompletedItems=4070936,Timers=8), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
Unhandled exception. Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 6213ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 1, qs: 8, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.82, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=7,Free=32760,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=2,CompletedItems=1400924,Timers=9), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 6213ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 1, qs: 8, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.82, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=6,Free=32761,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=1,CompletedItems=1400924,Timers=9), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
Unhandled exception. Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=1KiB, inbound=0KiB, 6213ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 0, qs: 8, aw: True, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.8, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=6,Free=32761,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=0,CompletedItems=1400923,Timers=8), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 6205ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 1, qs: 8, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.82, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=8,Free=32759,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=3,CompletedItems=1400924,Timers=9), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
   at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
   at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

williammck avatar Sep 25 '23 00:09 williammck

What's the max size of payload that you're sending here? It may be bandwidth exhaustion from very large payloads, but that is quite the timeout happening - any idea what size you're sending?

NickCraver avatar Sep 30 '23 22:09 NickCraver

Depends on the key - each value is up to a kilobyte max, but usually hovers around 64B - 128B. We are sending up to a few thousand of these all at once. But even pipelined, and with some headroom, I'd be shocked if we're upwards of a megabyte in one go. What's weird too is it's been behaving a lot more like normal this week, but we're seeing the same amount of load, if not more, on the service and the node it's living on. I'm a bit puzzled as to why that'd be happening.

williammck avatar Sep 30 '23 22:09 williammck