StackExchange.Redis "-READONLY You can't write against a read only replica." on ScriptEvaluate(...)

Background:

AWS MemoryDB (REDIS Engine 6.2)
Version: 2.5.43.

During some of testing, we've noticed that the error "-READONLY You can't write against a read only replica.", is showing up in our logs when we attempt to resize our shards in MemoryDB (change the size of the nodes, which causes it to fail over to a replica).

Attempting to retry the operation (we do 3 retries with exponential backoff) it fails all 3 times before finally being thrown a final time. Therefore, it looks like it's not reassigning the primary on these errors.

Looking at the exception, this only seems to affect ScriptEvaluate (LUA Scripts). We don't see any of our other various calls throwing this error: at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 1212

Full Stacktrace:

message=ERR Error running script (call to f_f3...0): @user_script:50: @user_script: 50: -READONLY You can't write against a read only replica.
Stack trace
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2836
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 53
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 1212

We don't see this behaviour when we do other cluster operations:

Primary Node Fail-overs to Replicants
Adding/removing shards, which cause a hashslot migrations

I will attempt to dig deeper next week, but I wanted to report it.

Apr 08 '22 18:04 dusty9023

Ahh, interesting, well if the Lua script is performing writes, it is illegal to execute it on a replica. To Remove this error you should be able to use the CommandFlag DemandMaster in your ScriptEvaluate commands that are performing writes.

@NickCraver — It looks like the decision as to whether or not the muxer will demand master is made here - there may be a story behind why EVAL/EVALSHA is not included, but it strikes me that the correct behavior would be to add EVAL/EVALSHA to this list of commands unless we know that it's RO? That said, that will be a performance hit for anyone using RO scripts against a clustered instance as it will force all those evals to the same shard. Might be part of a broader discussion about the addition of EVAL_RO and EVALSHA_RO as part of #2055?

Apr 10 '22 00:04 slorello89

@slorello89, @NickCraver

I re-ran the same text using CommandFlag.DemandMaster on ScriptEvaluate, but it seems that it didn't solve the problem. We're now seeing the following error:

StackExchange.Redis.RedisConnectionException: InternalFailure on [0]:EVAL (ScriptResultProcessor) ---> StackExchange.Redis.RedisCommandException: Command cannot be issued to a replica: EVAL
   at StackExchange.Redis.PhysicalBridge.WriteMessageToServerInsideWriteLock(PhysicalConnection connection, Message message) in /_/src/StackExchange.Redis/PhysicalBridge.cs:line 1413
   --- End of inner exception stack trace ---
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2836
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 53
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 1212

Followed by:

StackExchange.Redis.RedisConnectionException: No connection is active/available to service this operation: EVAL, mc: 1/1/0, mgr: 9 of 10 available, clientName: xxxx, PerfCounterHelperkeyHashSlot: 12635, IOCP: (Busy=2,Free=998,Min=300,Max=1000), WORKER: (Busy=7,Free=32760,Min=8,Max=32767), Local-CPU: unavailable, v: 2.5.43.42402
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2836
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 53
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 1212

Timeline:

April 11 2022 17:30:00 UTC	Triggered the node size change in the AWS Console
April 11 2022 17:31:55 UTC	Scaling down node type of the cluster

At 4/11/22 17:39:44.993, the first error occurred:
InternalFailure on [0]:EVAL (ScriptResultProcessor)

April 11 2022 17:40:42 UTC	Failover to replica node <NODE>-0017-001 completed
...

2022-04-11 17:44:33,541, the first time we see the second error:
message=No connection is active/available to service this operation: EVAL

The errors peak at 17:45:00.


April 11 2022 17:46:10 UTC	Failover to replica node <NODE>-0020-001 completed
...

The errors decrease slowly until 2022-04-11 17:57:46, where the errors stop.

April 11 2022 18:03:02 UTC	Succeeded applying modification to node type to Cluster tcs-perf

Apr 11 '22 18:04 dusty9023

Alrighty this looks like the same "cluster changed and we didn't know it" category (we have 6-7 issues on this). I have a general "reactive" approach checking errors here but it'd be awesome if we had a pub/sub or something to listen for on cluster to realize topology changes.

@slorello89 Do you know if there's anything in Redis we can listen for w.r.t. shards changing roles/amounts/etc.

Apr 12 '22 14:04 NickCraver

@NickCraver, I don't believe there's any pub/sub event that gets triggered when a quorum of the cluster elects to fail-over. The nodes of course all talk to each other when determining whether to fail-over or not, but I'm not aware of a canonical "oh this is the thing that happens clients can pick up on when something fails-over"

The expectation seems to be that while the client will either be:

1: Dumb: just throw every request at whatever redis instance and follow the redirects 2: Smart: cache the routing table.

In the later case, the idea is that you'll follow the redirection directives and remap the cluster-slots whenever you see a MOVED redirection. I'd think you'd be seeing that in this case if the fail-over precipitated this error?

Basically the summation from the cluster spec:

You issue the command
The old master see's that it's in a failed state and stops accepting Writes
When you try to issue the write you should see a -MOVED <cluster_slot>, with probably an empty host/port (because how would it know?) regardless, of the disposition of the host/port, you'd call CLUSTER SLOTS again to re-build the entire map

Apr 12 '22 19:04 slorello89

@slorello89 What are the chances something like this could be added, so that clients could proactively do the right thing with maybe zero disruption, rather than reactively that we have available today? Overall we see issues from this due to shards added or removed but also primary/replica failovers due to cloud providers doing maintenance as a common example.

Apr 12 '22 19:04 NickCraver

@NickCraver: redis/redis#10358 seems there's a PR in flight. Would be a nice feature for sure.

Apr 12 '22 20:04 slorello89

That said, isn't the client already reacting to -MOVED redirects? I think I've seen something about that. Maybe the failed-over node isn't sending the redirects?

Apr 12 '22 20:04 slorello89

@slorello89 We do handle MOVED today, but topology changes such as a primary/replica flipping for a patch session AFAIK we have no way to determine other than constantly polling for topology via INFO (which we do, but don't want to hammer it). It'd be awesome if "cluster membership" was something we could know about and re-assess. When we do to retry on a MOVED for example, we're a bit caught with pants down if it's moved to a node we're not connected to yet - it's a lot more work to do than just re-sending the command. In a downsize case, we likely have a connection and it's fine - in cluster expansion things are much heavier and have more questions.

Apr 12 '22 20:04 NickCraver

@NickCraver - gotcha - but IDK, do you see any evidence that it's even encountering a -MOVED? It's just getting -ERR so far as I can tell, it's like the server's not even giving it a chance to redirect appropriately.

Apr 12 '22 20:04 slorello89

The problem here is it flipped from primary to replica and we didn't know about it - I'd say there are a few topology changes which are inevitably going to result in errors:

Primary/replica swap
Nodes added (we don't have a connection yet)
Nodes removed (we should get a MOVED and respect it, with an existing connection)

In the node change state it's not assured though, because we don't necessarily know the hash map changed.

Note that in the MOVED case you don't see the MOVED - it's transparently handled internally, so I can't say here but I'm basing this on many issues together (I can try to get a full list tonight if time allows - we have 6-8 on this IIRC, all the same cause: cluster topology changed).

Apr 12 '22 20:04 NickCraver