java-driver icon indicating copy to clipboard operation
java-driver copied to clipboard

Add SubnetAddressTranslator

Open jahstreet opened this issue 11 months ago • 2 comments

When running Cassandra in a private network and accessing it from outside of that private network via some kind of proxy, we have an option to use FixedHostNameAddressTranslator. But when we want to set it up in a HA way and have more control over latencies in multi-datacenter deployments, that is not enough.

This PR proposes a SubnetAddressTranslator, which translates Cassandra node IP addresses based on the match to the configured subnet IP range (CIDR notation). The assumption is that each Cassandra datacenter nodes belong to different subnets not having intersecting IP ranges, which is the usual configuration for multi-DC Kubernetes and K8ssandra, for example.

jahstreet avatar Feb 10 '25 11:02 jahstreet

Excellent work @jahstreet, thank you! Have some suggestions, but I'm +1 either way.

Thanks! I will push a commit with annotations till Monday morning.

jahstreet avatar Feb 15 '25 17:02 jahstreet

Apologies, this is on my list but I haven't made it back to reconsider the updated comments in this review. I appreciate your patience @jahstreet!

FWIW I have added this to the 4.19.1 release planning doc under the working assumption that we'll almost certainly get this in in some form we can all agree on.

absurdfarce avatar Apr 01 '25 22:04 absurdfarce

Deal, thx for the feedback. Will address it asap.

UPD: done.

jahstreet avatar Jun 07 '25 11:06 jahstreet

Suggested commit message:

"Add SubnetAddressTranslator to translate Cassandra node IPs from private network based on its subnet mask"

jahstreet avatar Jun 10 '25 19:06 jahstreet

Confirmed that the build looks good locally with the recent changes from @jahstreet . Kicking off a DataStax Jenkins run now to confirm that we haven't had any unexpected regressions.

absurdfarce avatar Jun 10 '25 19:06 absurdfarce

Bah, Jenkins failed with complaints like the following:

[2025-06-10T22:25:35.767Z] [INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ java-driver-core ---
[2025-06-10T22:25:35.767Z] [INFO] Compiling 796 source files to /home/jenkins/workspace/drivers_java_oss_PR-2013/core/target/classes
[2025-06-10T22:25:38.559Z] [INFO] -------------------------------------------------------------
[2025-06-10T22:25:38.559Z] [ERROR] COMPILATION ERROR : 
[2025-06-10T22:25:38.559Z] [INFO] -------------------------------------------------------------
[2025-06-10T22:25:38.559Z] [ERROR] /home/jenkins/workspace/drivers_java_oss_PR-2013/core/src/main/java/com/datastax/oss/driver/internal/core/addresstranslation/Subnet.java:[21,30] package com.google.common.base does not exist
[2025-06-10T22:25:38.559Z] [INFO] 1 error
[2025-06-10T22:25:38.559Z] [INFO] -------------------------------------------------------------

Weird thing was that local builds were just fine. This made absolutely no sense to me; I spent the afternoon trying various combinations of Maven versions and POM settings to see what might account for the difference.

Naturally the answer was pretty much right in front of me the whole time. This PR was branched off from a point in 4.x before this commit went in. And that commit fixes precisely this behaviour. Once I included this change in my local checkout of the PR branch I was able to easily reproduce the failure above in my local build.

@jahstreet can you merge 4.x into your PR branch so that we can get this fix on your branch as well? I think such a merge + the other work you've already done should enable us to run a full build on our Jenkins server.

Thanks!

absurdfarce avatar Jun 10 '25 23:06 absurdfarce

Oh, almost forgot... the reason the build now fails with that change in place is because you're using the normal Guava packages here (and potentially other places in your PR... I admit I haven't checked yet). This should be changed to use the shaded packages for Guava along the lines of the VisibleForTesting import just above your addition.

absurdfarce avatar Jun 10 '25 23:06 absurdfarce

Sorry to make you spend much time time on it. Rebase is obviously a good thing to have always. On it.

jahstreet avatar Jun 11 '25 06:06 jahstreet

No worries @jahstreet ... I'm just happy we figured it out and have a clear path now to get this in!

absurdfarce avatar Jun 11 '25 06:06 absurdfarce

Did the visual check and corrected guava references. Hope now it is in a good shape 🤞 .

@absurdfarce how (with which maven args) do you build/compile locally? I wasn't able to reproduce the dependency errors with basic mvn clean package -DskipTests

jahstreet avatar Jun 11 '25 07:06 jahstreet

These changes look good @jahstreet , thanks for jumping on this so fast! A local test build seemed to succeed without a problem so I've kicked off a Jenkins build to (a) confirm that we're good and (b) throw all the unit and integration tests at it. I'm cautiously optimistic we've got it this time!

As for what I've been running... most of the time I've been using the same command used by the DataStax Jenkinsfile (mvn -B -V install -DskipTests -Dmaven.javadoc.skip=true) but I've been able to reproduce it with the even simpler mvn clean install -DskipTests=true. Note that I did not see this failure in my local test builds until I manually added the change to make guava-shaded an optional dependency; prior to that change the only place I saw it was on Jenkins. And I'm pretty sure that's because Jenkins was merging the changes in the PR into the current state of 4.x... which meant the commit making the guava-shaded dependency optional was brought into play. That's just a hypothesis but it seems to fit all the facts.

absurdfarce avatar Jun 11 '25 19:06 absurdfarce

Hey @jahstreet, the Jenkins run is still ongoing but I'm seeing multiple test failures for SubnetTest and SubnetAddressTest. Haven't looked into it too deeply but most (all?) of the failures look to be from an empty string being passed to the Integer.parseInt() call in Subnet.parse(). I was able to repro immediately by running the test locally in my IDE.

absurdfarce avatar Jun 11 '25 20:06 absurdfarce

Oh, I see... fortunately this one's an easy fix:

diff --git a/core/src/main/java/com/datastax/oss/driver/internal/core/addresstranslation/Subnet.java b/core/src/main/java/com/datastax/oss/driver/internal/core/addresstranslation/Subnet.java
index ec83626c5..7c25e94e2 100644
--- a/core/src/main/java/com/datastax/oss/driver/internal/core/addresstranslation/Subnet.java
+++ b/core/src/main/java/com/datastax/oss/driver/internal/core/addresstranslation/Subnet.java
@@ -45,7 +45,7 @@ class Subnet {
   }
 
   static Subnet parse(String subnetCIDR) throws UnknownHostException {
-    List<String> parts = Splitter.on("/").splitToList("/");
+    List<String> parts = Splitter.on("/").splitToList(subnetCIDR);
     if (parts.size() != 2) {
       throw new IllegalArgumentException("Invalid subnet: " + subnetCIDR);
     }

absurdfarce avatar Jun 11 '25 20:06 absurdfarce

Thx for the hints @absurdfarce 🦅 👁️ . Indeed, the fast refactoring outputted the tiny bug. Thx for checking. Double checked the tests locally ✅ . 🤞 for another CI run. Will check it in my CEST morning.

jahstreet avatar Jun 11 '25 22:06 jahstreet

There is some difference between my local env and CI, specifically in the way the DNS is resolved:

Expecting message to be:
  "Configured subnets are overlapping: SubnetAddress[subnet=[100, 64, 0, 0], address=cassandra.datacenter1.com:19042], SubnetAddress[subnet=[100, 65, 0, 0], address=cassandra.datacenter2.com:19042]"
but was:
  "Configured subnets are overlapping: SubnetAddress[subnet=[100, 64, 0, 0], address=cassandra.datacenter1.com/<unresolved>:19042], SubnetAddress[subnet=[100, 65, 0, 0], address=cassandra.datacenter2.com/<unresolved>:19042]"

I will give a thought today on how to approach it.

jahstreet avatar Jun 12 '25 06:06 jahstreet

Looks good now, but fires some test failures that don't seem related to my changes 🤔 . @absurdfarce are you in the context of these failing tests?

jahstreet avatar Jun 12 '25 10:06 jahstreet

Okay, with your latest round of fixes I do get a good run against DataStax CI @jahstreet. There are indeed some test failures but they're all known problematic cases, mostly CASSJAVA-95. Given these results I'm comfortable changing my review to an approval.

@tolbertam and/or @aratno ... I know one or both of you were planning on taking another look at this PR once we got it to a stable state. I think we're there now so if I can get another :+1: from one (or both) of you I think we're good to go here.

absurdfarce avatar Jun 12 '25 19:06 absurdfarce

Thank you folks for guiding me 🙏 .

jahstreet avatar Jun 13 '25 16:06 jahstreet

Thanks @tolbertam . And a huge thank you to @jahstreet for your persistence on this one!

absurdfarce avatar Jun 13 '25 18:06 absurdfarce