CASSJAVA-92: Local DC provided for nodetool clientstats
@aratno If you have some cycles do you mind taking a look at this and giving me a second 👍 ? I know you and @smiklosovic are still in conversation about the server-side of this but I think we're in a good spot to at least call the client-side done.
So, I think this addresses the serialization changes referenced above. But we are seeing a host of IT failures with errors similar to the following:
Expected size: 2 but was: 1 in:
[com.datastax.oss.driver.api.core.connection.ConnectionInitException: [s10|control|id: 0xc1585e83, L:/127.0.0.1:33080 - R:/127.0.0.14:9043] Protocol initialization request, step 2 (OPTIONS): unexpected exception (java.lang.NullPointerException: null value in entry: localDc=null)
at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.fail(ProtocolInitHandler.java:358)
at com.datastax.oss.driver.internal.core.channel.ProtocolInitHandler$InitRequest.onResponse(ProtocolInitHandler.java:351)
at com.datastax.oss.driver.internal.core.channel.ChannelHandlerRequest.onResponse(ChannelHandlerRequest.java:96)
...(22 remaining lines not displayed - this can be changed with Assertions.setMaxStackTraceElementsDisplayed)]
Underlying problem here seems to be a variation on something we discussed earlier; some startup options aren't available at context creation time (when StartupOptionsBuilder is created). Let's focus specifically on local DC; it's computed based on the nodes in the cluster and is computed when the LBP is initialized. But that doesn't happen until we create the connection pool which makes sense (given that we don't have full info about available nodes until much beyond that point).
Upshot is I don't think we can handle local DC in StartupOptionsBuilder... it's gonna have to be computed elsewhere and added to the STARTUP message at generation time.
I am not sure why all tests passed when I was running tests locally. Apologies for that.
Indeed test AllNodesFailedIT shows chicken and egg problem with discovering local datacenter. As far as I understand the intention form the server side, driver should publish local DC value that was passed in application configuration. I have slightly changed OptionalLocalDcHelper and extracted a method to return configured local data center if any was specified.
Thanks for the fixes @lukasz-antoniak. I can confirm that these fixes address the ConnectionInitExceptions I mentioned in my prior message.
There's still something that has been gnawing at me about this impl but despite a hefty amount of effort I haven't been able to cleanly articulate my concern. I think at the core it's the idea that we implement an extra interface (LocalDcAwareLoadBalancingPolicy) here to indicate that an LBP is aware of it's local DC, a change that in general I completely agree with. But then we say that this functionality can only be determined in some cases (i.e. if a user configures a DC in some way). This covers the default case but that's only due to the implementation of DefaultLoadBalancingPolicy. We have an extension of the default policy named DcInferringLoadBalancingPolicy which will try to "infer" the local DC (based on contact points) if nothing is specified. This LBP may or may not have a value... again, it depends entirely on the impl. It's a similar story for custom LBP impls.
A reasonable question to ask here is "if we change LocalDcAwareLoadBalancingPolicy.getLocalDatacenter() to return an Optional<String> rather than a String doesn't that address the problem?" Maybe it does... and I think that might be a very reasonable change. But it's not completely satisfying; we're still saying we have this whole interface because an LBP is aware of it's local DC... until it's not.
I'm still not explaining this very well, and for that I apologize.
@aratno does any of this resonate for you or am I just nuts?
After delegating the logic of providing startup configuration to a implementations of LoadBalancingPolicy, I think we do not need LocalDcAwareLoadBalancingPolicy any more. This simplifies the changes and addresses some concerns pointed by @absurdfarce above.
I have implemented re-evaluation of startup options upon every request and renamed test class. I am not quite satisfied with disabling option. It was hard to come up with a decent name for this configuration parameter (see last commit). I think that the payload of driver baggage will be so small in size, that maybe we should leave it enabled all the time? @aratno, are you OK with leaving driver baggage enabled and provide this feature when time comes and this payload will grow?
While I'm thinking of it: we should probably create a ticket for the follow-up work referenced here so that it's at least in the backlog.
Note: CASSJAVA-112 is the follow-up ticket discussed above.
I've pushed changes to remove the config option to disable baggage collection per the latest review from @aratno. @lukasz-antoniak I'd like one last review of those changes from you before we call this done.
I'm also gonna kick off one last Jenkins run, just to make sure we didn't have any unexpected regressions here.
Jenkins run looks good. If @lukasz-antoniak is good with the most recent changes I think we're (finally!) all done with this one.