
HDDS-10409. Decommissioning of datanodes - Duplicate ozone nodes display as dead in Ozone Recon.

Open · devmadhuu opened this pull request · 5 comments

What changes were proposed in this pull request?

This PR adds a new API to Recon for explicit removal of DEAD or decommissioned nodes.

Recon has no way to know when to stop tracking a DEAD or decommissioned node, because such a node may come back and rejoin the cluster, so Recon keeps listing it forever. In some cases a DN is formatted, which wipes out all of its cluster metadata, and is then recommissioned and rejoins the cluster with a different UUID, so Ozone/Recon has absolutely no way to tell it is the same node. To keep things simple, this change lets an admin user explicitly remove a DEAD or decommissioned datanode once they acknowledge that it should no longer be tracked.
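For illustration, a client call to such a removal endpoint might look like the sketch below. The path `/api/v1/datanodes/remove`, the PUT method, and the request body shape are assumptions made for this example, not necessarily the API this PR adds (9888 is Recon's default HTTP port):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RemoveDeadNodeExample {
  public static void main(String[] args) throws Exception {
    // JSON list of datanode UUIDs the admin wants Recon to stop tracking.
    String body = "[\"<datanode-uuid>\"]";
    HttpRequest request = HttpRequest.newBuilder()
        // Hypothetical path; 9888 is Recon's default HTTP port.
        .uri(URI.create("http://recon-host:9888/api/v1/datanodes/remove"))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```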

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10409

How was this patch tested?

The patch is tested with a new integration test.

devmadhuu · Mar 10 '24 09:03

@errose28 @sodonnel Kindly review.

devmadhuu · Mar 11 '24 04:03

> thanks @devmadhuu for the patch.
>
> Regarding the integration test, I'm considering whether it's necessary for this specific functionality. However, I believe the endpoint should be thoroughly tested by Recon tests.

Thanks @myskov for the review. Sure, I'll add a test for the endpoint as well.

devmadhuu · Mar 11 '24 09:03

> thanks @devmadhuu for the patch.
>
> Regarding the integration test, I'm considering whether it's necessary for this specific functionality. However, I believe the endpoint should be thoroughly tested by Recon tests.

@myskov Test cases for the endpoint have been added. Kindly re-review.

devmadhuu · Mar 12 '24 10:03
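For readers following along, here is a minimal shape sketch of the kind of endpoint test being discussed; `NodeEndpoint`, `removeDatanodes`, and `listNodes` below are placeholder names, not the actual classes or tests in this patch:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.Test;

public class TestRemoveDatanodeEndpointSketch {

  // Placeholder standing in for the Recon endpoint under test.
  static class NodeEndpoint {
    private final List<String> trackedNodes =
        new ArrayList<>(List.of("dead-node-uuid"));

    int removeDatanodes(List<String> uuids) {
      // 200 if Recon was tracking at least one of the nodes, 404 otherwise.
      return trackedNodes.removeAll(uuids) ? 200 : 404;
    }

    List<String> listNodes() {
      return trackedNodes;
    }
  }

  @Test
  public void removedNodeIsNoLongerListed() {
    NodeEndpoint endpoint = new NodeEndpoint();
    assertEquals(200, endpoint.removeDatanodes(List.of("dead-node-uuid")));
    assertEquals(0, endpoint.listNodes().size());
  }
}
```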

Thanks for taking up this usability issue @devmadhuu. I'm a bit worried about the current approach introducing other bugs in Recon and SCM that will be hard to detect. It is very difficult to tell if the nodes are saved to some other in-memory structure at the time of removal. Even just in the same ReconNodeManager class I see two maps that also contain datanode information that are not being updated. That said, I'm not sure of a better way to do it since Recon needs to persist this removed node information through restarts. Is there anything we can do to make sure that removing nodes does not corrupt any existing data structures?

Also, I think nodes should be automatically re-added if they heartbeat to Recon, especially since we don't have a way to manually re-add a node right now.
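A minimal sketch of that re-add idea, assuming a hypothetical `removedNodes` set and `register` hook (neither name is from Recon's actual code):

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: re-admit a manually removed node if it heartbeats again. */
public class HeartbeatReAddSketch {

  private final Set<UUID> removedNodes = ConcurrentHashMap.newKeySet();

  public void onHeartbeat(UUID datanodeUuid) {
    // If the admin removed this node earlier but it is alive again,
    // forget the removal and let the normal registration path run.
    if (removedNodes.remove(datanodeUuid)) {
      register(datanodeUuid);
    }
  }

  private void register(UUID datanodeUuid) {
    System.out.println("re-registering datanode " + datanodeUuid);
  }
}
```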

@errose28 thanks for reviewing the patch. Here is my analysis of how we could clear the in-memory data structures for a removed node without restarting. Please check and confirm:

1. SCMNodeManager: when a new node is registered, the following data structures get populated:

- [ ] org.apache.hadoop.hdds.scm.node.SCMNodeManager#clusterMap
- [ ] org.apache.hadoop.hdds.scm.node.NodeStateManager#nodeStateMap
- [ ] org.apache.hadoop.hdds.scm.node.SCMNodeManager#dnsToUuidMap

So when we remove a datanode to stop tracking it, should we clear the above data structures for the removed node?
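For illustration, a minimal sketch of clearing those three structures together; the class below is a stand-in for the SCMNodeManager/NodeStateManager internals, not the real code:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only; stands in for SCMNodeManager/NodeStateManager internals. */
public class ClearNodeStructuresSketch {

  // Stand-ins for clusterMap, nodeStateMap and dnsToUuidMap.
  private final Map<UUID, String> clusterMap = new ConcurrentHashMap<>();
  private final Map<UUID, String> nodeStateMap = new ConcurrentHashMap<>();
  private final Map<String, UUID> dnsToUuidMap = new ConcurrentHashMap<>();

  /**
   * Clear every structure that still references the removed node in one
   * critical section, so readers never observe a half-removed node.
   */
  public synchronized void clearRemovedNode(UUID uuid, String hostName) {
    clusterMap.remove(uuid);
    nodeStateMap.remove(uuid);
    dnsToUuidMap.remove(hostName);
  }
}
```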

2. On adding a new node, NewNodeHandler is called, where we do the following:

- [ ] PipelineManager calls org.apache.hadoop.hdds.scm.pipeline.PipelineManager#closeStalePipelines

So on removing a node as well, we should call this method to clean up any stale pipelines associated with the removed DN.
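A sketch of that wiring, with PipelineManager reduced to a stand-in interface (the real method takes SCM's own types, which this example does not reproduce):

```java
import java.util.UUID;

/** Sketch: reuse the stale-pipeline cleanup on explicit node removal. */
public class RemoveNodePipelineCleanupSketch {

  // Stand-in for org.apache.hadoop.hdds.scm.pipeline.PipelineManager;
  // only the one method discussed here is modeled.
  interface PipelineManager {
    void closeStalePipelines(UUID datanodeUuid);
  }

  private final PipelineManager pipelineManager;

  RemoveNodePipelineCleanupSketch(PipelineManager pipelineManager) {
    this.pipelineManager = pipelineManager;
  }

  /** Called from the remove-node path, mirroring what NewNodeHandler does. */
  public void onNodeRemoved(UUID datanodeUuid) {
    pipelineManager.closeStalePipelines(datanodeUuid);
  }
}
```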

3. If a node being removed is in DECOMMISSIONED or IN_MAINTENANCE state but not DEAD, then on the remove API call, should we call org.apache.hadoop.hdds.scm.node.DeadNodeHandler#onMessage? DeadNodeHandler performs the following cleanup:

- [ ] Close containers associated with the datanode.
- [ ] Destroy pipelines associated with the datanode.
- [ ] Remove the container replicas associated with the datanode.
- [ ] Remove commands in the command queue for the datanode.
- [ ] Remove DeleteBlocksCommand entries associated with the datanode.
- [ ] Move the dead datanode out of the ClusterNetworkTopology.
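A sketch of that proposal with stand-in types; the real DeadNodeHandler#onMessage is driven by SCM's event queue rather than called directly like this:

```java
/** Sketch: route non-DEAD removals through the dead-node cleanup path. */
public class RemoveNodeCleanupSketch {

  enum NodeState { HEALTHY, DEAD, DECOMMISSIONED, IN_MAINTENANCE }

  // Stand-in for org.apache.hadoop.hdds.scm.node.DeadNodeHandler.
  interface DeadNodeHandler {
    void onMessage(String datanodeUuid);
  }

  private final DeadNodeHandler deadNodeHandler;

  RemoveNodeCleanupSketch(DeadNodeHandler deadNodeHandler) {
    this.deadNodeHandler = deadNodeHandler;
  }

  /**
   * If the node never went through the DEAD transition (e.g. it is
   * DECOMMISSIONED or IN_MAINTENANCE), fire the dead-node cleanup so that
   * containers, pipelines, replicas, queued commands and the topology entry
   * are all released before the node is forgotten.
   */
  public void removeNode(String uuid, NodeState state) {
    if (state != NodeState.DEAD) {
      deadNodeHandler.onMessage(uuid);
    }
  }
}
```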

devmadhuu · Mar 27 '24 04:03

cc: @sumitagrawl

devmadhuu · Mar 27 '24 05:03

@errose28 pls re-review.

devmadhuu · Mar 28 '24 05:03

@sumitagrawl kindly re-review.

devmadhuu · Apr 03 '24 18:04

@errose28 pls review once.

devmadhuu · Apr 24 '24 12:04

Hi @devmadhuu, I probably won't be able to get to this in a timely manner. If others think it looks good, no need to wait on my review.

errose28 · Apr 24 '24 23:04