Optimal way to configure Hosts on AWS Elasticache, non-clustered, multiple read replicas
My question: In StackExchange.Redis, hosts are configured as a list. Will adding only the two AWS-managed endpoints (primary and "read-only") to the host list give the expected behavior, with requests distributed over all replicas?
Or would we have to explicitly list all five nodes' addresses to get that behavior?
I.e., is there any DNS lookup optimization, IP caching, etc. that would prevent each request from being round-robined by AWS's endpoint?
Context:
- in AWS Elasticache
- cluster mode disabled
- one primary and four full read-replicas
- the Redis version is >= 3.1 and <= 5.0
- PreferReplica or DemandReplica command flags are used when we want to stay off the primary node
Background: In late 2020, AWS updated Elasticache Redis features to provide a single dns managed primary endpoint and a single read-only endpoint. The advantages of these are:
- AWS automatically manages DNS changes when they perform automatic primary-failover and node changes
- the RO endpoint provides a single configuration value, allowing additional readers to be added without requiring pushing configuration to the apps accessing them
- the RO endpoint performs DNS round-robin distribution to the active read-only nodes
AWS ElastiCache Redis - Endpoints Docs
Thanks for the amazing package and time reading my question!
Hi guys, anything on this one? I'm also trying an AWS Redis cluster with multiple read replicas that should be reached through the ro endpoint. So far, the behaviour is not consistent as I don't get cache hits on all the read replica nodes for get operations that specify PreferReplica.
This is what I'm getting right now:

Node ending in 004 is the master and it's not getting any requests which is the expected behaviour, but the node ending in 003 is a replica so it should be used.
Do you know what could be causing the issue? My connection string is like: "master-endpoint,ro-endpoint,ssl=true"
Thanks in advance
The problem here is: the maintainers aren't AWS users. We don't know how AWS topologies are implemented, and we don't have an endpoint to test against, and I have no idea whether this is a small thing or a big thing to do, or whether it is a small thing or a big thing to consumers. And coming up to Xmas, I don't see me having much availability for the next few weeks.
Hi @mgravell, thanks for the quick response. ElastiCache clusters cannot be reached from outside a VPC, which makes testing a bit difficult. I could provide a sample project so the issue can be easily reproduced, if that suits. As for the AWS topology, AWS provides documentation, but bringing someone from the ElastiCache team into this discussion or opening a ticket with them could be useful. I can try that as well if this use case moves forward.
This is what the documentation mentions about reader endpoints: "A reader endpoint is not a load balancer. It is a DNS record that will resolve to an IP address of one of the replica nodes in a round robin fashion."
Links:
- https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.Endpoints.html
- https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-elasticache-launches-reader-endpoint-for-redis/
In my case I'm using cluster mode disabled, as mentioned in the original question of this thread. This means that it is a single-shard Redis DB.
Hi @mgravell, is there anything new you can provide taking into account the last information provided?
@adrianleonmorell looking at this, your behavior is what I'd expect from that documentation, since "one of the replica nodes" is explicitly called out: it's not designed to spread load across many. If you can get anyone from AWS to look at this issue and advise, that's our best path, I think. I don't know anyone over there, but happy to make friends :)
Hi,
tl;dr:
If you're using a single machine and a single ConnectionMultiplexer, you should enumerate all of the endpoints. If you're using multiple ConnectionMultiplexer instances spread across your client machines, then just the main reader endpoint should be sufficient.
At length: @adrianleonmorell is correct. The load balancing happens at DNS resolution, which means that each connection will only target a single read endpoint. Once the DNS name is resolved to a node, the traffic will only target that node. If the reader discovery endpoint's DNS name is resolved only once, then only the reader node whose address was returned will be connected.
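To see the DNS-level behavior in isolation, independent of StackExchange.Redis, the sketch below resolves a host name repeatedly and counts which address comes back first each time. It uses only System.Net.Dns. The class and method names are hypothetical, and "localhost" is just a stand-in default for the reader endpoint host name. Local resolver caching may make every lookup return the same address, so treat the output as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical helper for illustration; not part of StackExchange.Redis.
public static class ReaderEndpointProbe
{
    // Resolves `host` `attempts` times and counts how often each address
    // was the first one returned by the resolver.
    public static Dictionary<string, int> CountFirstAnswers(string host, int attempts)
    {
        var counts = new Dictionary<string, int>();
        for (int i = 0; i < attempts; i++)
        {
            IPAddress[] addresses = Dns.GetHostAddresses(host);
            if (addresses.Length == 0) continue;

            string first = addresses[0].ToString();
            counts[first] = counts.TryGetValue(first, out int seen) ? seen + 1 : 1;
        }
        return counts;
    }

    public static void Main(string[] args)
    {
        // Pass the reader endpoint host name as the first argument;
        // "localhost" is only a placeholder default.
        string host = args.Length > 0 ? args[0] : "localhost";
        foreach (var pair in CountFirstAnswers(host, 20))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```

A single ConnectionMultiplexer effectively behaves like one iteration of this loop for the reader endpoint: whichever address it resolves at connect time is the replica it keeps using.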
To understand what this means in practice, I ran this code on an ElastiCache cluster with 4 replicas:
private static ConfigurationOptions options()
{
    var options = new ConfigurationOptions();
    // Host names are placeholders; substitute your own endpoints.
    options.EndPoints.Add("<primary endpoint>", 6379);
    options.EndPoints.Add("<reader discovery endpoint>", 6379);
    return options;
}
public static void Main()
{
    // One multiplexer for the whole run, so the reader endpoint is resolved once.
    using (var client = ConnectionMultiplexer.Connect(options()))
    {
        var db = client.GetDatabase();
        for (int i = 0; i < 200000; i++)
        {
            db.StringSet("Ahoy", "Matey" + i);
            // DemandReplica forces the read onto a replica.
            Console.WriteLine(db.StringGet("Ahoy", CommandFlags.DemandReplica));
        }
    }
}
As expected, all of the reads are served by the same node. The numbers don't add up to 200000 because each data point is the maximum number of cache hits in a 10-second window, and the loop ran for more than 10 seconds. This also explains the gaps in the other images below.

So I changed the options function so that the returned configuration enumerates the addresses of all the reader nodes:
private static ConfigurationOptions options()
{
    var options = new ConfigurationOptions();
    // Host names are placeholders; substitute the individual node endpoints.
    options.EndPoints.Add("<primary>", 6379);
    options.EndPoints.Add("<replica 1>", 6379);
    options.EndPoints.Add("<replica 2>", 6379);
    options.EndPoints.Add("<replica 3>", 6379);
    options.EndPoints.Add("<replica 4>", 6379);
    return options;
}
With this configuration, I see that the other reader nodes are touched:

But I don't have to enumerate all of the nodes. Similar results can be achieved by keeping the 2 endpoints but constantly recreating the ConnectionMultiplexer, which lets DNS resolution spread the connections across the nodes. In the code, I moved the creation of client and db into the loop:
private static ConfigurationOptions options()
{
    var options = new ConfigurationOptions();
    // Host names are placeholders; substitute your own endpoints.
    options.EndPoints.Add("<primary endpoint>", 6379);
    options.EndPoints.Add("<reader discovery endpoint>", 6379);
    return options;
}
public static void Main()
{
    for (int i = 0; i < 200000; i++)
    {
        // A new multiplexer per iteration, so the reader endpoint is resolved again each time.
        using (var client = ConnectionMultiplexer.Connect(options()))
        {
            var db = client.GetDatabase();
            db.StringSet("Ahoy", "Matey" + i);
            Console.WriteLine(db.StringGet("Ahoy", CommandFlags.DemandReplica));
        }
    }
}

Notice that in both cases the reader nodes aren't hit entirely equally. If that's a requirement, I assume that the user will need to implement endpoint selection on their end.
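As a rough illustration of what user-side endpoint selection could look like, here is a minimal round-robin sketch. RoundRobin<T> is a hypothetical helper, not part of StackExchange.Redis, and the demo cycles over placeholder strings. In a real application the targets might be one IDatabase per replica, each obtained from a ConnectionMultiplexer configured with that replica's own address; that setup is an assumption and was not tested in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: cycles through a fixed list of targets so that
// consecutive calls return consecutive entries.
public sealed class RoundRobin<T>
{
    private readonly IReadOnlyList<T> _targets;
    private int _counter = -1;

    public RoundRobin(IReadOnlyList<T> targets)
    {
        if (targets == null || targets.Count == 0)
            throw new ArgumentException("At least one target is required.", nameof(targets));
        _targets = targets;
    }

    // Thread-safe: Interlocked.Increment hands every caller a distinct ticket.
    public T Next()
    {
        // Cast to uint so the index stays non-negative after int overflow.
        uint ticket = unchecked((uint)Interlocked.Increment(ref _counter));
        return _targets[(int)(ticket % (uint)_targets.Count)];
    }
}

public static class RoundRobinDemo
{
    public static void Main()
    {
        // Placeholder names standing in for per-replica IDatabase instances.
        var replicas = new RoundRobin<string>(
            new[] { "replica-1", "replica-2", "replica-3", "replica-4" });

        for (int i = 0; i < 8; i++)
        {
            Console.WriteLine(replicas.Next());
        }
    }
}
```

The trade-off of this approach is that the application has to know the individual replica addresses again, which gives up the single-configuration-value benefit of the reader endpoint.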
@shachlanAmazon According to the documentation, you can use a singleton IConnectionMultiplexer. Did you try getting the IDatabase inside the for loop?
public static void Main()
{
    using (var client = ConnectionMultiplexer.Connect(options()))
    {
        for (int i = 0; i < 200000; i++)
        {
            var db = client.GetDatabase();
            db.StringSet("Ahoy", "Matey" + i);
            Console.WriteLine(db.StringGet("Ahoy", CommandFlags.DemandReplica));
        }
    }
}
You can move the GetDatabase() outside too:
var db = client.GetDatabase();
for (int i = 0; i < 200000; i++)
{
That was the first test I described :)