HDDS-11523. Support Listener OM
What changes were proposed in this pull request?
This ticket is to support Listener OM, based on the Listener feature of Ratis.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11523
How was this patch tested?
unit test.
@symious Thanks for the patch. I saw some findbugs failures; could you help resolve them?
Thanks @symious for the patch, I have some basic questions
- At what point does the listener transition into a voting member/follower?
- Since we are identifying listeners in OM based on configuration, subsequent restarts of OM will always identify the same OM as listener again even if it has already caught up its transactions. Is it upon the user to revert the configuration ozone.om.listener.nodes after it has caught up for the next restart or is it fine if the same OM starts as listener again?
@sadanand48 Thanks for the review.
At what point does the listener transition into a voting member/follower
A Ratis listener (https://issues.apache.org/jira/browse/RATIS-1298) is a non-voting member and will not transition to a follower. In the future, Ratis might support Raft learners, which can be promoted to voting members once they have caught up with the leader.
Since we are identifying listeners in OM based on configuration, subsequent restarts of OM will always identify the same OM as listener again even if it has already caught up its transactions. Is it upon the user to revert the configuration ozone.om.listener.nodes after it has caught up for the next restart or is it fine if the same OM starts as listener again?
This is a good point. However, since a listener is a non-voting member, the OM will always start as a listener. For Raft learners, we would need to handle this.
@symious, could you resolve the conflicts when you have time?
@ChenSammi Updated, PTAL.
Thanks @symious for working on this. I have roughly gone through the code and tested it locally. Here are the findings.
a. Transferring leadership to a listener fails as expected, but the error message can be improved. If the newly assigned leader is a listener, we can just return the failure.
bash-5.1$ ozone admin om transfer -n=om4
INTERNAL_ERROR om2@group-D66704EFC61C refused to transfer leadership to peer om4 as it is not in conf: {index: 13, cur=peers:[om1|om1:9872, om3|om3:9872, om2|om2:9872]|listeners:[om4|om4:9872], old=null}
b. should have robot test for listener OM
c. need a document to explain how to configure listener OM, and how to bootstrap a new listener OM.
d. While a listener OM is running, OM requests are sent to the listener OM too. If we can skip the listener, that will reduce RPC call retries, and RPC latency will not be impacted by introducing a listener OM.
2025-03-31 10:30:04,540 [main] DEBUG ipc.Client: Connecting to om1/<unresolved>:9862
2025-03-31 10:30:04,540 [main] DEBUG ipc.Client: Setup connection to om1/<unresolved>:9862
2025-03-31 10:30:04,543 [main] DEBUG ipc.Client: Failed to connect to server: om1/<unresolved>:9862: failovers (0) exceeded maximum allowed (0)
java.net.UnknownHostException: Invalid host name: local host is: "om4/172.22.0.11"; destination host is: "om1":9862; java.net.UnknownHostException; For more details see: http://wiki.apache.org/hadoop/UnknownHost
at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:961)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:889)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:619)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:789)
at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:364)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1649)
at org.apache.hadoop.ipc.Client.call(Client.java:1473)
at org.apache.hadoop.ipc.Client.call(Client.java:1426)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:250)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:132)
at jdk.proxy2/jdk.proxy2.$Proxy20.submitRequest(Unknown Source)
at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base/java.lang.reflect.Method.invoke(Method.java:580)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
at jdk.proxy2/jdk.proxy2.$Proxy20.submitRequest(Unknown Source)
at org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransport.submitRequest(Hadoop3OmTransport.java:73)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:340)
at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getServiceInfo(OzoneManagerProtocolClientSideTranslatorPB.java:1880)
at org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:260)
at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:264)
at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:131)
at org.apache.hadoop.ozone.shell.OzoneAddress.createRpcClientFromServiceId(OzoneAddress.java:120)
at org.apache.hadoop.ozone.shell.OzoneAddress.createClient(OzoneAddress.java:167)
at org.apache.hadoop.ozone.shell.Handler.createClient(Handler.java:82)
at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:70)
at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:36)
at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
at picocli.CommandLine.access$1500(CommandLine.java:148)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
at picocli.CommandLine.execute(CommandLine.java:2170)
at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:88)
at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:79)
at org.apache.hadoop.ozone.shell.Shell.lambda$run$0(Shell.java:100)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:182)
at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:147)
at org.apache.hadoop.ozone.shell.Shell.run(Shell.java:100)
at org.apache.hadoop.ozone.shell.OzoneShell.main(OzoneShell.java:49)
Caused by: java.net.UnknownHostException
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:621)
... 42 more
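Point (d) could be approached by dropping listener node IDs from the client's failover candidates before any RPC is attempted. The following is only a sketch of that idea using plain collections; `FailoverCandidates` and `votingNodesOnly` are hypothetical names, not part of the Ozone client, and the node IDs simply mirror the om1 to om4 setup in the output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the Ozone failover proxy provider.
final class FailoverCandidates {

  // Returns the node IDs a client would try, in configured order,
  // with every listener (non-voting) node removed.
  static List<String> votingNodesOnly(List<String> allNodeIds,
      Set<String> listenerNodeIds) {
    List<String> result = new ArrayList<>();
    for (String id : allNodeIds) {
      if (!listenerNodeIds.contains(id)) {
        result.add(id);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("om1", "om2", "om3", "om4");
    Set<String> listeners = new HashSet<>(Arrays.asList("om4"));
    System.out.println(votingNodesOnly(all, listeners)); // [om1, om2, om3]
  }
}
```

With the listener filtered out up front, the client never spends a retry on a node that cannot serve as leader.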
I would suggest completing (a) and (b) in this patch. For (c) and (d), I'm fine with either implementing them in this patch or filing follow-up JIRAs. If follow-up JIRAs are filed, then I would suggest turning HDDS-11523 into an umbrella JIRA and having sub-task JIRAs for this feature.
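For (a), one option is to check the transfer target against the listener set before handing the request to Ratis, so the caller gets a short, specific message instead of the INTERNAL_ERROR shown earlier. This is a sketch under assumptions: `TransferPrecheck`, `validateTarget`, and the message wording are hypothetical, and the real check would live wherever OM handles the transfer request.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not code from this patch.
final class TransferPrecheck {

  // Rejects a leadership transfer whose target is a listener or is unknown.
  static void validateTarget(String targetNodeId, Set<String> voters,
      Set<String> listeners) {
    if (listeners.contains(targetNodeId)) {
      throw new IllegalArgumentException("Cannot transfer leadership to "
          + targetNodeId + ": it is a listener (non-voting) OM");
    }
    if (!voters.contains(targetNodeId)) {
      throw new IllegalArgumentException("Cannot transfer leadership to "
          + targetNodeId + ": unknown OM node");
    }
  }

  public static void main(String[] args) {
    Set<String> voters = new HashSet<>(Arrays.asList("om1", "om2", "om3"));
    Set<String> listeners = new HashSet<>(Arrays.asList("om4"));
    validateTarget("om2", voters, listeners); // accepted, no exception
    try {
      validateTarget("om4", voters, listeners);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```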
@ChenSammi First 2 points finished, PTAL.
https://github.com/peterxcli/ozone/pull/6 I refactored some code and added test coverage for the OM listener scenarios. If you think that's good, I can push the changes to this patch. TIA!
cc @ChenSammi @jojochuang
- This PR introduces a new property, ozone.om.listener.nodes. This works fine assuming a single Ozone cluster. But will it work in a multi-cluster environment? What about "ozone.om.listener.nodes.cluster1"? Perhaps over-engineering, but I just wanted to make sure.
Yes, it works in multi-cluster setups. The code always resolves ozone.om.listener.nodes via ConfUtils.addKeySuffixes(..., omServiceId), so the effective keys are service-scoped like ozone.om.listener.nodes.<serviceId>. Example from the compose files: ozone.om.listener.nodes.omservice=om4.
https://github.com/apache/ozone/pull/7262/files#diff-1a12ec76bd3be379b70f5cee5364e51d56d021af0ae2a835f466fc7441d9258eR409-R417
  /**
   * Get a collection of listener omNodeIds for the given omServiceId.
   */
  public static Collection<String> getListenerOMNodeIds(ConfigurationSource conf,
      String omServiceId) {
    String listenerNodesKey = ConfUtils.addKeySuffixes(
        OZONE_OM_LISTENER_NODES_KEY, omServiceId);
    Collection<String> listenerNodeIds = conf.getTrimmedStringCollection(
        listenerNodesKey);
    return listenerNodeIds;
  }

  /**
   * Get a list of all OM details (address and ports) from the specified config.
   */
  public static List<OMNodeDetails> getAllOMHAAddresses(OzoneConfiguration conf,
      String omServiceId, boolean includeDecommissionedNodes) {
    List<OMNodeDetails> omNodesList = new ArrayList<>();
    ...
    Collection<String> listenerNodeIds = conf.getTrimmedStringCollection(
        ConfUtils.addKeySuffixes(OZONE_OM_LISTENER_NODES_KEY,
            omServiceId));
    if (omNodeIds.isEmpty()) {
      // If there are no nodeIds present, return empty list
      return Collections.emptyList();
    }
    for (String nodeId : omNodeIds) {
      try {
        ...
        if (listenerNodeIds.contains(omNodeDetails.getNodeId())) {
          omNodeDetails.setRatisListener();
        }
        ...
    }
    return omNodesList;
  }
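To make the service-scoped lookup concrete, here is a standalone sketch that uses a plain `Map` in place of the Ozone configuration classes. `scopedKey` and `listenerNodeIds` are hypothetical stand-ins for the suffixing and trimmed-collection lookup in the excerpt above, and the `cluster1` entry with `om7` and `om8` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; does not use the real Ozone classes.
final class ListenerKeySketch {
  static final String LISTENER_NODES_KEY = "ozone.om.listener.nodes";

  // Appends ".<serviceId>" to the base key when a service ID is given.
  static String scopedKey(String base, String serviceId) {
    if (serviceId == null || serviceId.isEmpty()) {
      return base;
    }
    return base + "." + serviceId;
  }

  // Reads the comma-separated listener list for one service, trimming blanks.
  static List<String> listenerNodeIds(Map<String, String> conf,
      String serviceId) {
    List<String> ids = new ArrayList<>();
    String raw = conf.get(scopedKey(LISTENER_NODES_KEY, serviceId));
    if (raw == null) {
      return ids;
    }
    for (String part : raw.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        ids.add(trimmed);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("ozone.om.listener.nodes.omservice", "om4");
    conf.put("ozone.om.listener.nodes.cluster1", "om7, om8");
    System.out.println(listenerNodeIds(conf, "omservice")); // [om4]
    System.out.println(listenerNodeIds(conf, "cluster1"));  // [om7, om8]
  }
}
```

Because each lookup is keyed by service ID, the listener list of one cluster never leaks into another.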
- What if the OMs are removed from ozone.om.listener.nodes? Do they become followers and start participating in Ratis votes?
Restarts won’t override the committed Ratis role based on config alone.
- Adding listener OM does not mean client knows to route its requests to them. How should applications leverage listener OM?
We don't have follower/listener reads yet. There are two ongoing efforts: 1. https://github.com/apache/ozone/pull/5288 (leverages the Ratis built-in follower read; not sure whether it can apply to listeners) 2. https://github.com/apache/ozone/pull/7988 (no effect, because listeners won't have SnapshotDiff on them)
- does it support less than 3 voting OMs? For example, 1 leader + listener OM, or 1 leader + 1 follower + listeners.
If Ratis supports only 1 or 2 voters, then the above should also work.
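As a rough guide to what those small configurations imply, the standard Raft majority rule counts only voting members, so listeners do not change the quorum. The sketch below just does that arithmetic; `QuorumSketch` and its methods are hypothetical names, and it makes no claim about which configurations Ratis or OM actually accept.

```java
// Hypothetical illustration of Raft majority arithmetic; listeners are excluded.
final class QuorumSketch {

  // Smallest number of voters that forms a majority.
  static int majority(int voters) {
    if (voters < 1) {
      throw new IllegalArgumentException("at least one voter is required");
    }
    return voters / 2 + 1;
  }

  // Number of voters that can fail while a majority is still reachable.
  static int toleratedFailures(int voters) {
    return voters - majority(voters);
  }

  public static void main(String[] args) {
    for (int voters = 1; voters <= 3; voters++) {
      System.out.println(voters + " voter(s): majority=" + majority(voters)
          + ", tolerated failures=" + toleratedFailures(voters));
    }
  }
}
```

Under this rule, 1 leader + listeners and 1 leader + 1 follower + listeners both tolerate zero voter failures, whereas three voters tolerate one.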
Hi @symious, thanks for the initial patch. I've added more tests and refactored some code in OMRatisServer; please take a look as well. TIA!
Merged. Thanks @symious for the initial PR and @peterxcli for the comprehensive test coverage, and @ivandika3 @sadanand48 @ChenSammi for reviews.