Would appreciate assistance solving timeout issues
Hey folks,
I've been battling recent issues with one of our workers regularly getting Redis timeouts and would appreciate some help.
A bit of overview:
- Redis is a single k8s pod using the stock image from DockerHub (although with RDB snapshots turned off), has no resource limits, and what infrequently shows up in the SLOWLOG doesn't usually tie to what shows in the timeout exceptions.
- We have a timer that, every two seconds, calls
FsdService.AddToRedis, which runsStringSetAsynca couple thousand times.- i.e. if 750 clients on our network, the
forloop over each client callsStringSetAsynctwice, so in that example, 1500 calls.
- i.e. if 750 clients on our network, the
- This also has a Task from https://github.com/jchristn/SuperSimpleTcp in the background taking in our TCP data, parsing it, and putting it into the List that
FsdService.AddToRedisreads from.
I've tried upping the threads in the pool up a bit (set ThreadPoolMinThreads to 16), but that didn't seem to help much. Should I increase it further?
I've looked into SuperSimpleTcp and it seems to be doing everything in tasks/async, so I'd hope we're not blocking, but I can't quite tell.
Any ideas or things I should look at?
Stacktrace examples:
Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 7957ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX 1432024, inst: 0, qu: 0, qs: 3, aw: True, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 15, out-pipe: 0, last-in: 2, cur-in: 15, sync-ops: 0, async-ops: 1012309, serverEndpoint: redis-cache:6379, conn-sec: 1388.4, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=4,Free=32763,Min=16,Max=32767), POOL: (Threads=16,QueuedItems=1,CompletedItems=4070933,Timers=8), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 388
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 7957ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX 1432024, inst: 0, qu: 0, qs: 3, aw: True, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 15, out-pipe: 0, last-in: 2, cur-in: 15, sync-ops: 0, async-ops: 1012309, serverEndpoint: redis-cache:6379, conn-sec: 1388.42, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=4,Free=32763,Min=16,Max=32767), POOL: (Threads=16,QueuedItems=0,CompletedItems=4070936,Timers=8), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
Unhandled exception. Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 6213ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 1, qs: 8, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.82, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=7,Free=32760,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=2,CompletedItems=1400924,Timers=9), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 6213ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 1, qs: 8, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.82, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=6,Free=32761,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=1,CompletedItems=1400924,Timers=9), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
Unhandled exception. Unhandled exception. StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=1KiB, inbound=0KiB, 6213ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 0, qs: 8, aw: True, bw: SpinningDown, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.8, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=6,Free=32761,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=0,CompletedItems=1400923,Timers=8), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 6205ms elapsed, timeout is 5000ms), command=SETEX, next: SETEX AXM3558, inst: 0, qu: 1, qs: 8, aw: True, bw: CheckingForTimeoutComplete, rs: DequeueResult, ws: Writing, in: 0, in-pipe: 40, out-pipe: 0, last-in: 2, cur-in: 40, sync-ops: 0, async-ops: 344430, serverEndpoint: redis-cache:6379, conn-sec: 466.82, aoc: 1, mc: 1/1/0, mgr: 9 of 10 available, clientName: slurper-service-74c679b4cd-bqdjg(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=4,Max=1000), WORKER: (Busy=8,Free=32759,Min=16,Max=32767), POOL: (Threads=18,QueuedItems=3,CompletedItems=1400924,Timers=9), v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at VATSIM.Network.Slurper.Service.Services.FsdService.AddToRedis(Object sender, ElapsedEventArgs e) in /src/VATSIM.Network.Slurper.Service/Services/FsdService.cs:line 389
at System.Threading.Tasks.Task.<>c.<ThrowAsync>b__128_1(Object state)
at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
at System.Threading.Thread.StartCallback()
What's the max size of payload that you're sending here? It may be bandwidth exhaustion from very large payloads, but that is quite the timeout happening - any idea what size you're sending?
Depends on the key - each value is up to a kilobyte max, but usually hovers around 64B - 128B. We are sending up to a few thousand of these all at once. But even pipelined, and with some headroom, I'd be shocked if we're upwards of a megabyte in one go. What's weird too is it's been behaving a lot more like normal this week, but we're seeing the same amount of load, if not more, on the service and the node it's living on. I'm a bit puzzled as to why that'd be happening.