Abnormal master-slave switching in Sentinel mode

Open • hello-ldf opened this issue 1 year ago • 1 comment

Thank you very much for your contribution. I encountered some difficulties when using it and would like to ask for advice.

My Redis setup is one master (A), one slave (B), and three sentinels.

In Sentinel mode, after node A goes offline, node B is promoted to master. This part of the process works without problems.

But when the old master A comes back online, StackExchange.Redis briefly marks both nodes as primary and uses the old master A. At that moment, however, A is being converted to a slave and is still synchronizing the new data, so reads and writes that go to A during this window are incorrect.

A little later the program marks A as a replica, and the problem corrects itself.

How should I solve this? Is there a configuration option that avoids it?

My test code:

using StackExchange.Redis;
using static StackExchange.Redis.Role;

namespace ConsoleApp3
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            Console.WriteLine("Hello, World!");

            ConnectionMultiplexer masterConnection = ConnectionMultiplexer.Connect("172.16.184.197:26379,172.16.184.198:26379,172.16.0.29:26379,serviceName=mymaster");
            while (true)
            {
                try
                {
                    IDatabase database = masterConnection.GetDatabase();
                    await database.StringIncrementAsync("SomeKey", 1);
                    System.Console.WriteLine(masterConnection.GetStatus());
                    System.Console.WriteLine(DateTime.Now.ToString("yyyyMMddHHmmss::") + await database.StringGetAsync("SomeKey"));
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
                await Task.Delay(1000);
            }
        }
    }
}

Start

Image

Stop A; the master switches to B

Image

Start A again; the data is dirty

Image

Back to normal

Image

hello-ldf, Jan 21 '25 03:01

There's not currently a way to shorten this window. There is inherently a race during which both servers actually are primaries, because the switch on the Redis side is not atomic: for example, it may have to reverse the topology and, in some circumstances, even re-replicate the data. There's also network latency involved, so there simply is a gap during failover. We monitor this and react as soon as we are notified (via Sentinel notifications), but there is still some delay.

NickCraver, Feb 18 '25 16:02
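
Since the window cannot be shortened, one thing a client can do is observe it. The following is only a minimal sketch, not something proposed in this thread: it reuses the connection string from the test code above, subscribes to the multiplexer's `ConfigurationChanged`, `ConnectionFailed` and `ConnectionRestored` events, and logs each endpoint's role once per second. The class name `FailoverDiagnostics` and the helper `LooksSettled` are hypothetical names introduced for illustration, and the rule "exactly one connected primary means settled" is an assumption rather than a guarantee made by the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using StackExchange.Redis;

internal static class FailoverDiagnostics
{
    // Hypothetical helper: treat the topology as settled when exactly one
    // connected endpoint reports itself as a primary (i.e. not a replica).
    internal static bool LooksSettled(IEnumerable<(bool IsConnected, bool IsReplica)> servers) =>
        servers.Count(s => s.IsConnected && !s.IsReplica) == 1;

    private static async Task Main()
    {
        // Same Sentinel connection string as in the test code above.
        ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect(
            "172.16.184.197:26379,172.16.184.198:26379,172.16.0.29:26379,serviceName=mymaster");

        // Log topology and connection events as the multiplexer reports them.
        muxer.ConfigurationChanged += (_, e) =>
            Console.WriteLine($"Configuration changed: {e.EndPoint}");
        muxer.ConnectionFailed += (_, e) =>
            Console.WriteLine($"Connection failed: {e.EndPoint} ({e.FailureType})");
        muxer.ConnectionRestored += (_, e) =>
            Console.WriteLine($"Connection restored: {e.EndPoint}");

        while (true)
        {
            // Snapshot every endpoint the multiplexer currently knows about.
            List<IServer> servers = muxer.GetEndPoints()
                .Select(endPoint => muxer.GetServer(endPoint))
                .ToList();

            foreach (IServer server in servers)
            {
                Console.WriteLine(
                    $"{DateTime.Now:yyyyMMddHHmmss} {server.EndPoint} " +
                    $"connected={server.IsConnected} replica={server.IsReplica}");
            }

            if (!LooksSettled(servers.Select(s => (s.IsConnected, s.IsReplica))))
            {
                Console.WriteLine("Zero or several connected primaries: possible failover window.");
            }

            await Task.Delay(1000);
        }
    }
}
```

Run next to the test program while stopping and restarting A, this should show when the client sees two primaries and when A is finally reported as a replica. It does not remove the gap; an application could at most use such a check to delay or re-verify sensitive operations while the topology looks unsettled.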