HDDS-11523. Support Listener OM

Open symious opened this issue 1 year ago • 2 comments

What changes were proposed in this pull request?

This ticket is to support Listener OM, based on the Listener feature of Ratis.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11523

How was this patch tested?

unit test.

symious avatar Oct 02 '24 02:10 symious

@symious Thanks for the patch. I saw some findbugs failures; could you help resolve them?

ivandika3 avatar Oct 02 '24 10:10 ivandika3

Thanks @symious for the patch, I have some basic questions

  1. At what point does the listener transition into a voting member/follower?
  2. Since we are identifying listeners in OM based on configuration, subsequent restarts of OM will always identify the same OM as listener again even if it has already caught up its transactions. Is it up to the user to revert the configuration ozone.om.listener.nodes after it has caught up for the next restart, or is it fine if the same OM starts as listener again?

sadanand48 avatar Oct 03 '24 07:10 sadanand48

@sadanand48 Thanks for the review.

At what point does the listener transition into a voting member/follower?

A Ratis listener (https://issues.apache.org/jira/browse/RATIS-1298) is a non-voting member and is never promoted to a follower. In the future, Ratis might support Raft learners, which can be promoted to voting members once they have caught up with the leader.

Since we are identifying listeners in OM based on configuration, subsequent restarts of OM will always identify the same OM as listener again even if it has already caught up its transactions. Is it up to the user to revert the configuration ozone.om.listener.nodes after it has caught up for the next restart, or is it fine if the same OM starts as listener again?

This is a good point. However, since a listener is a non-voting member, the OM will always start as a listener. If Raft learners are supported later, we will need to handle this case.

ivandika3 avatar Feb 12 '25 08:02 ivandika3

@symious, could you resolve the merge conflicts when you have time?

ChenSammi avatar Mar 21 '25 05:03 ChenSammi

@ChenSammi Updated, PTAL.

symious avatar Mar 21 '25 06:03 symious

Thanks @symious for working on this. I have roughly gone through the code and tested it locally. Here are the findings.

a. Transferring leadership to a listener fails as expected, but the error message can be improved. If the newly assigned leader is a listener, we can just return the failure directly.

bash-5.1$ ozone admin om transfer -n=om4
INTERNAL_ERROR om2@group-D66704EFC61C refused to transfer leadership to peer om4 as it is not in conf: {index: 13, cur=peers:[om1|om1:9872, om3|om3:9872, om2|om2:9872]|listeners:[om4|om4:9872], old=null}

b. There should be a robot test for listener OM.
c. We need a document that explains how to configure a listener OM and how to bootstrap a new listener OM.
d. While a listener OM is running, OM requests are sent to the listener OM too. If clients skip the listener, that reduces RPC call retries, and RPC latency is not affected by introducing a listener OM.

2025-03-31 10:30:04,540 [main] DEBUG ipc.Client: Connecting to om1/<unresolved>:9862
2025-03-31 10:30:04,540 [main] DEBUG ipc.Client: Setup connection to om1/<unresolved>:9862
2025-03-31 10:30:04,543 [main] DEBUG ipc.Client: Failed to connect to server: om1/<unresolved>:9862: failovers (0) exceeded maximum allowed (0)
java.net.UnknownHostException: Invalid host name: local host is: "om4/172.22.0.11"; destination host is: "om1":9862; java.net.UnknownHostException; For more details see:  http://wiki.apache.org/hadoop/UnknownHost
	at java.base/jdk.internal.reflect.DirectConstructorHandleAccessor.newInstance(DirectConstructorHandleAccessor.java:62)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:502)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:486)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:961)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:889)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:619)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:789)
	at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:364)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1649)
	at org.apache.hadoop.ipc.Client.call(Client.java:1473)
	at org.apache.hadoop.ipc.Client.call(Client.java:1426)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:250)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:132)
	at jdk.proxy2/jdk.proxy2.$Proxy20.submitRequest(Unknown Source)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:437)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:170)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:162)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:100)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:366)
	at jdk.proxy2/jdk.proxy2.$Proxy20.submitRequest(Unknown Source)
	at org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransport.submitRequest(Hadoop3OmTransport.java:73)
	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.submitRequest(OzoneManagerProtocolClientSideTranslatorPB.java:340)
	at org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.getServiceInfo(OzoneManagerProtocolClientSideTranslatorPB.java:1880)
	at org.apache.hadoop.ozone.client.rpc.RpcClient.<init>(RpcClient.java:260)
	at org.apache.hadoop.ozone.client.OzoneClientFactory.getClientProtocol(OzoneClientFactory.java:264)
	at org.apache.hadoop.ozone.client.OzoneClientFactory.getRpcClient(OzoneClientFactory.java:131)
	at org.apache.hadoop.ozone.shell.OzoneAddress.createRpcClientFromServiceId(OzoneAddress.java:120)
	at org.apache.hadoop.ozone.shell.OzoneAddress.createClient(OzoneAddress.java:167)
	at org.apache.hadoop.ozone.shell.Handler.createClient(Handler.java:82)
	at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:70)
	at org.apache.hadoop.ozone.shell.Handler.call(Handler.java:36)
	at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
	at picocli.CommandLine.access$1500(CommandLine.java:148)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
	at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
	at picocli.CommandLine.execute(CommandLine.java:2170)
	at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:88)
	at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:79)
	at org.apache.hadoop.ozone.shell.Shell.lambda$run$0(Shell.java:100)
	at org.apache.hadoop.hdds.tracing.TracingUtil.executeInSpan(TracingUtil.java:182)
	at org.apache.hadoop.hdds.tracing.TracingUtil.executeInNewSpan(TracingUtil.java:147)
	at org.apache.hadoop.ozone.shell.Shell.run(Shell.java:100)
	at org.apache.hadoop.ozone.shell.OzoneShell.main(OzoneShell.java:49)
Caused by: java.net.UnknownHostException
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:621)
	... 42 more
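
One way to picture finding (d) is a client-side filter that drops listener node IDs from the failover candidates before any RPC is attempted. The following is only a sketch under stated assumptions: the class `ListenerAwareFailover` and the method `failoverCandidates` are hypothetical names, not part of Ozone, and the real failover proxy provider would need to be checked before wiring anything like this in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not an Ozone class. */
class ListenerAwareFailover {

  /**
   * Returns the OM node IDs a client should try, in their original order,
   * with every listener node ID removed.
   */
  static List<String> failoverCandidates(List<String> allOmNodeIds,
      Set<String> listenerNodeIds) {
    List<String> candidates = new ArrayList<>();
    for (String nodeId : allOmNodeIds) {
      if (!listenerNodeIds.contains(nodeId)) {
        candidates.add(nodeId);
      }
    }
    return candidates;
  }

  public static void main(String[] args) {
    // Node IDs taken from the test cluster above: om1..om3 vote, om4 listens.
    List<String> all = Arrays.asList("om1", "om2", "om3", "om4");
    Set<String> listeners = new HashSet<>(Arrays.asList("om4"));
    System.out.println(failoverCandidates(all, listeners));
  }
}
```

With the cluster layout from the transfer output above, the client would only cycle through om1, om2 and om3.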

I would suggest completing (a) and (b) in this patch. For (c) and (d), I'm fine with either implementing them in this patch or filing follow-up JIRAs. If follow-up JIRAs are filed, then I would suggest turning HDDS-11523 into an umbrella JIRA with sub-task JIRAs for this feature.
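
For finding (a), a pre-check before the transfer request is handed to Ratis could produce a clearer message than the INTERNAL_ERROR shown earlier. This is a minimal sketch, not the change made in the patch: `TransferLeaderPrecheck` and `rejectReason` are hypothetical names, and the message wording is only a suggestion.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pre-check for illustration only; not an Ozone class. */
class TransferLeaderPrecheck {

  /**
   * Returns an explanation when the target must be rejected,
   * or an empty Optional when the transfer may proceed.
   */
  static Optional<String> rejectReason(String target, Set<String> voters,
      Set<String> listeners) {
    if (listeners.contains(target)) {
      return Optional.of("Cannot transfer leadership to " + target
          + ": it is a listener (non-voting) OM");
    }
    if (!voters.contains(target)) {
      return Optional.of("Cannot transfer leadership to " + target
          + ": it is not a voting member of the OM ring");
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Set<String> voters = new HashSet<>(Arrays.asList("om1", "om2", "om3"));
    Set<String> listeners = new HashSet<>(Arrays.asList("om4"));
    System.out.println(rejectReason("om4", voters, listeners).orElse("OK"));
    System.out.println(rejectReason("om3", voters, listeners).orElse("OK"));
  }
}
```

Checking the listener set first lets the CLI name the actual cause instead of the generic "not in conf" failure.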

ChenSammi avatar Apr 02 '25 06:04 ChenSammi

@ChenSammi First 2 points finished, PTAL.

symious avatar Jul 31 '25 07:07 symious

https://github.com/peterxcli/ozone/pull/6 I did some refactoring and added test coverage for the OM listener scenarios. If you think that's good, I can push the changes to this patch. TIA!

cc @ChenSammi @jojochuang

peterxcli avatar Aug 08 '25 09:08 peterxcli

  1. This PR introduces a new property, ozone.om.listener.nodes. This works fine assuming a single Ozone cluster. But will it work in a multi-cluster environment? What about "ozone.om.listener.nodes.cluster1"? Perhaps over-engineering, but I just wanted to make sure.

Yes, it works in multi-cluster setups. The code always resolves ozone.om.listener.nodes via ConfUtils.addKeySuffixes(..., omServiceId), so the effective keys are service-scoped like ozone.om.listener.nodes.<serviceId>. Example from the compose files: ozone.om.listener.nodes.omservice=om4. https://github.com/apache/ozone/pull/7262/files#diff-1a12ec76bd3be379b70f5cee5364e51d56d021af0ae2a835f466fc7441d9258eR409-R417

/**
 * Get a collection of listener omNodeIds for the given omServiceId.
 */
public static Collection<String> getListenerOMNodeIds(ConfigurationSource conf,
    String omServiceId) {
  String listenerNodesKey = ConfUtils.addKeySuffixes(
      OZONE_OM_LISTENER_NODES_KEY, omServiceId);
  Collection<String> listenerNodeIds = conf.getTrimmedStringCollection(
      listenerNodesKey);

  return listenerNodeIds;
}

/**
 * Get a list of all OM details (address and ports) from the specified config.
 */
public static List<OMNodeDetails> getAllOMHAAddresses(OzoneConfiguration conf,
    String omServiceId, boolean includeDecommissionedNodes) {
  List<OMNodeDetails> omNodesList = new ArrayList<>();
  ...
  Collection<String> listenerNodeIds = conf.getTrimmedStringCollection(
      ConfUtils.addKeySuffixes(OZONE_OM_LISTENER_NODES_KEY,
          omServiceId));
  if (omNodeIds.isEmpty()) {
    // If there are no nodeIds present, return empty list
    return Collections.emptyList();
  }
  for (String nodeId : omNodeIds) {
    try {
      ...
      if (listenerNodeIds.contains(omNodeDetails.getNodeId())) {
        omNodeDetails.setRatisListener();
      }
      ...
  }
  return omNodesList;
}
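
To make the service-scoped lookup concrete, here is a self-contained sketch that mimics the key resolution with a plain Map instead of OzoneConfiguration. The helper `addKeySuffix` is a simplified stand-in written for this example, assuming the suffix is joined to the key with a dot; it is not the real ConfUtils implementation. The service IDs cluster1 and cluster2 and the node IDs omA and omB are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone illustration; does not use the real Ozone classes. */
class ListenerKeyResolution {
  static final String OZONE_OM_LISTENER_NODES_KEY = "ozone.om.listener.nodes";

  /** Simplified stand-in: appends ".suffix" when a suffix is given. */
  static String addKeySuffix(String key, String suffix) {
    return (suffix == null || suffix.isEmpty()) ? key : key + "." + suffix;
  }

  /** Looks up the comma-separated listener node IDs for one OM service. */
  static List<String> listenerNodeIds(Map<String, String> conf,
      String omServiceId) {
    List<String> ids = new ArrayList<>();
    String value = conf.get(
        addKeySuffix(OZONE_OM_LISTENER_NODES_KEY, omServiceId));
    if (value == null) {
      return ids;
    }
    for (String part : value.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        ids.add(trimmed);
      }
    }
    return ids;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("ozone.om.listener.nodes.cluster1", "om4");
    conf.put("ozone.om.listener.nodes.cluster2", "omA, omB");
    System.out.println(listenerNodeIds(conf, "cluster1"));
    System.out.println(listenerNodeIds(conf, "cluster2"));
  }
}
```

Each service ID reads its own key, so listeners configured for one cluster are not visible when resolving another.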

  1. What if the OMs are removed from ozone.om.listener.nodes? Do they become followers and start participating in Ratis votes?

No. A restart with changed config does not by itself override the role already committed in the Ratis configuration.

  1. Adding listener OMs does not mean clients know to route their requests to them. How should applications leverage listener OMs?

We don't have follower/listener reads yet. There are two ongoing projects:
1. https://github.com/apache/ozone/pull/5288 (leverages the Ratis built-in follower read; not sure whether it can apply to listeners)
2. https://github.com/apache/ozone/pull/7988 (no effect, because listeners won't have SnapshotDiff on them)

  1. Does it support fewer than 3 voting OMs? For example, 1 leader + listener OM, or 1 leader + 1 follower + listeners.

If Ratis supports running with only 1 or 2 voters, then the above should also work.
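
As a rough illustration of what small voter counts mean, the arithmetic below assumes the standard Raft majority rule over voting members only, with listeners excluded from the count. Whether Ratis accepts such group sizes in an OM deployment is not verified here. `VoterQuorum` is a hypothetical name.

```java
/** Illustration of Raft majority arithmetic; not an Ozone or Ratis class. */
class VoterQuorum {

  /** Smallest number of voters that forms a majority. */
  static int majority(int votingMembers) {
    return votingMembers / 2 + 1;
  }

  /** Number of voters that can fail while a majority remains. */
  static int tolerableVoterFailures(int votingMembers) {
    return votingMembers - majority(votingMembers);
  }

  public static void main(String[] args) {
    // Listeners are not counted in any of these numbers.
    for (int voters = 1; voters <= 3; voters++) {
      System.out.println(voters + " voter(s): majority=" + majority(voters)
          + ", tolerable failures=" + tolerableVoterFailures(voters));
    }
  }
}
```

Under that rule, 1 or 2 voters can make progress but tolerate no voter failure, regardless of how many listeners are attached.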

peterxcli avatar Aug 21 '25 05:08 peterxcli

Hi @symious, thanks for the initial patch. I've added more tests and done some refactoring in OMRatisServer; please take a look as well. TIA!

peterxcli avatar Aug 31 '25 02:08 peterxcli

Merged. Thanks @symious for the initial PR and @peterxcli for the comprehensive test coverage, and @ivandika3 @sadanand48 @ChenSammi for reviews.

jojochuang avatar Sep 02 '25 16:09 jojochuang