HDDS-10341. Implement a task/framework in Recon to sync full/delta SCM Metadata DB updates at regular intervals.
What changes were proposed in this pull request?
This PR adds a task and framework in Recon to perform a full SCM metadata DB sync, or to apply delta updates, at regular intervals.
For Recon to have accurate, up-to-date information about container states, blocks pending deletion, or any other SCM metadata it processes, it needs a full SCM DB snapshot sync when Recon starts. It must also fall back to a full SCM DB snapshot in failure scenarios, such as an error while applying delta updates from the SCM metadata DB.
There are gaps where Recon may not learn of updates to SCM metadata. For example, if Recon is down and some containers are created and then deleted in SCM during that downtime, Recon will never learn about those containers, even after it restarts.
With delta updates in place, after such downtime this sync task fetches the SCM DB delta updates since the last sequence number of Recon's local RocksDB copy and applies them to the various SCM tasks in Recon.
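A minimal sketch of the sync cycle described above, assuming a hypothetical `ScmDbClient` interface (`fetchDeltaSince`, `installFullSnapshot`) and a hypothetical `DeltaUnavailableException`; none of these names come from this PR or from Ozone. It only shows the decision logic: a full snapshot when there is no local copy yet, delta updates from the last applied sequence number otherwise, and a full snapshot whenever fetching or applying a delta fails.

```java
import java.util.List;

/** Hypothetical view of SCM used by the sync task; not an actual Ozone interface. */
interface ScmDbClient {
  /** Sequence numbers of the write batches SCM committed after {@code sequenceNumber}. */
  List<Long> fetchDeltaSince(long sequenceNumber) throws DeltaUnavailableException;

  /** Replaces Recon's local copy of the SCM DB with a full checkpoint and returns its sequence number. */
  long installFullSnapshot();
}

/** Hypothetical signal that SCM cannot serve updates from the requested sequence number. */
class DeltaUnavailableException extends Exception {
  DeltaUnavailableException(String message) {
    super(message);
  }
}

/** One sync cycle: full snapshot on first run, deltas afterwards, full snapshot on any failure. */
class ScmDbSyncCycle {
  private final ScmDbClient scm;
  private long lastSequenceNumber = -1; // -1: no local snapshot yet, e.g. first start of Recon
  private int fullSyncs;
  private int deltasApplied;

  ScmDbSyncCycle(ScmDbClient scm) {
    this.scm = scm;
  }

  void runOnce() {
    if (lastSequenceNumber < 0) {
      fullSync();
      return;
    }
    try {
      for (long batchSequenceNumber : scm.fetchDeltaSince(lastSequenceNumber)) {
        applyBatch(batchSequenceNumber);
      }
    } catch (DeltaUnavailableException | RuntimeException e) {
      // Fall back to a full snapshot so that updates missed during a Recon outage,
      // or lost to a failed delta apply, are never silently skipped.
      fullSync();
    }
  }

  private void fullSync() {
    lastSequenceNumber = scm.installFullSnapshot();
    fullSyncs++;
  }

  private void applyBatch(long batchSequenceNumber) {
    // A real task would replay the batch into the local DB and hand its events to the
    // SCM tasks in Recon; this sketch only advances the checkpoint.
    lastSequenceNumber = batchSequenceNumber;
    deltasApplied++;
  }

  long lastSequenceNumber() { return lastSequenceNumber; }
  int fullSyncs() { return fullSyncs; }
  int deltasApplied() { return deltasApplied; }
}
```

In this sketch the checkpoint lives in a field; per the description above, Recon takes it from the last sequence number of its local RocksDB copy, which is what lets a restarted Recon ask only for the updates it missed.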
This PR also re-initializes SequenceIdGenerator and the in-memory pipeline and container data of ReconPipelineManager and ReconContainerManager on a full SCM DB snapshot, and keeps this in-memory data up to date with incremental (delta) updates of the SCM metadata DB in Recon.
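The difference between re-initializing in-memory state on a full snapshot and patching it with deltas can be illustrated with a hypothetical `InMemoryContainerView`, a stand-in that is not `ReconContainerManager`'s real API. Clearing before reloading is what drops any container SCM deleted while Recon was not listening, which a delta-only path could miss.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory view of container states, standing in for a Recon manager's cache. */
class InMemoryContainerView {
  private final Map<Long, String> containerStates = new HashMap<>();

  /** Full snapshot: discard everything and rebuild from the copied CONTAINERS table. */
  void reinitialize(Map<Long, String> containersTable) {
    containerStates.clear();
    containerStates.putAll(containersTable);
  }

  /** Delta PUT: a container was created or its state changed in SCM. */
  void onPut(long containerId, String state) {
    containerStates.put(containerId, state);
  }

  /** Delta DELETE: the container row was removed from SCM's table. */
  void onDelete(long containerId) {
    containerStates.remove(containerId);
  }

  /** Read-only view for callers such as Recon's APIs. */
  Map<Long, String> snapshot() {
    return Collections.unmodifiableMap(containerStates);
  }
}
```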
Another change separates the NODES table from the SCM snapshot DB. With every new node registration or node state update, ReconNodeManager was changing the sequence number of Recon's SCM snapshot DB, which caused problems when Recon requested delta updates using the increased sequence number of that same snapshot DB.
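The NODES-table problem can be reproduced with a toy model, assuming a store whose single sequence number advances on every write, whatever the table. `ToyStore` and `ToyReconSync` are illustrative names, not Ozone or RocksDB classes. When node writes share the store that mirrors SCM, Recon's next "updates since my sequence number" request starts too far ahead and skips SCM writes; with nodes kept in a separate store, nothing is skipped.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy key-value store with one sequence number for all tables, advanced on every write. */
class ToyStore {
  private final List<String> rows = new ArrayList<>();

  void put(String table, String key) {
    rows.add(table + "/" + key); // any write, whatever the table, advances the sequence number
  }

  long latestSequenceNumber() {
    return rows.size();
  }
}

/** Replays SCM's write log into Recon's copy by asking for "every write after my sequence number". */
class ToyReconSync {
  /** Returns how many SCM writes Recon never applied. */
  static int missedScmWrites(boolean nodesInSeparateStore) {
    List<String> scmLog = new ArrayList<>(); // SCM write i carries sequence number i + 1
    for (int i = 0; i < 6; i++) {
      scmLog.add("container-" + i);
    }
    ToyStore scmSnapshotCopy = new ToyStore();
    ToyStore nodeStore = nodesInSeparateStore ? new ToyStore() : scmSnapshotCopy;

    int applied = catchUp(scmSnapshotCopy, scmLog, 3); // first sync: SCM writes 1..3
    nodeStore.put("nodes", "dn-1");                   // two datanodes register with Recon
    nodeStore.put("nodes", "dn-2");
    applied += catchUp(scmSnapshotCopy, scmLog, scmLog.size()); // next delta sync
    return scmLog.size() - applied;
  }

  private static int catchUp(ToyStore copy, List<String> scmLog, int upTo) {
    int applied = 0;
    while (copy.latestSequenceNumber() < upTo) {
      // Ask SCM for the write right after our latest sequence number and apply it.
      int next = (int) copy.latestSequenceNumber();
      copy.put("containers", scmLog.get(next));
      applied++;
    }
    return applied;
  }
}
```

With a shared store, `missedScmWrites(false)` returns 2 (one skipped SCM write per local node write); with a separate node store, `missedScmWrites(true)` returns 0.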
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10341
How was this patch tested?
Ran existing and new JUnit-based integration tests.
@dombizita @ArafatKhan2198 @sumitagrawl Kindly review.
Thanks for working on this, @devmadhuu. I have a few doubts and comments.
In Recon, we previously executed RPC calls to SCM to fetch the latest information, such as with the pipelineSyncTask. Will these tasks continue to make sync calls, or will they rely on the SCM delta updates?
Yes, for now we continue to run these tasks, because we have often observed mismatches in information even after they run. The new SCM DB sync goes beyond them: it syncs up everything and normalizes any mismatches.
@sadanand48 Kindly review.
Thanks @sadanand48 for reviewing the patch. Kindly re-review.
@devmadhuu We have an integration test for OM-Recon sync, TestReconWithOzoneManager. Should we add one for SCM as well that performs similar tests?
I have already added integration test cases in the TestScmSnapshot class. Please check whether you want me to add any more there.
The SCMDBMetaDataInitializationTask class is responsible for processing delta updates from the SCM database and applying them to the Recon component's in-memory data structures. It focuses specifically on three tables: CONTAINERS_TABLE for container metadata, PIPELINES_TABLE for pipeline metadata, and SEQUENCE_ID_TABLE for sequence IDs used in the Ozone cluster. By keeping these tables up-to-date, the task ensures that Recon maintains accurate information about containers, pipelines, and sequence IDs.
Hence, apart from these, none of the other tables in the SCM snapshot are touched.
@devmadhuu am I correct on this?
Yes. In other words, this PR is a framework-level change, and its objective is to keep doing what we did earlier: updating the in-memory structures for containers, nodes and pipelines that were maintained using FCRs/ICRs (full and incremental container reports). So only these tables are processed. The framework is designed to be flexible, so it can be extended to any other SCM table in the future.
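A minimal sketch of the per-table routing discussed in this thread: delta events are dispatched by table name, only registered tables are processed, and everything else is skipped, so extending the framework to another SCM table means one more registration. `ScmDbEvent` and `ScmDeltaDispatcher` are hypothetical classes, not code from this PR.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical delta event: one PUT or DELETE on a named table of the SCM DB. */
final class ScmDbEvent {
  enum Action { PUT, DELETE }

  final String table;
  final Action action;
  final String key;

  ScmDbEvent(String table, Action action, String key) {
    this.table = table;
    this.action = action;
    this.key = key;
  }
}

/** Routes delta events to per-table handlers; tables without a handler are skipped. */
class ScmDeltaDispatcher {
  private final Map<String, Consumer<ScmDbEvent>> handlers = new HashMap<>();
  private int skipped;

  /** Supporting another SCM table later only needs another registration. */
  void register(String table, Consumer<ScmDbEvent> handler) {
    handlers.put(table, handler);
  }

  void process(List<ScmDbEvent> batch) {
    for (ScmDbEvent event : batch) {
      Consumer<ScmDbEvent> handler = handlers.get(event.table);
      if (handler == null) {
        skipped++; // a table Recon does not keep in memory
        continue;
      }
      handler.accept(event);
    }
  }

  int skipped() {
    return skipped;
  }
}
```

For example, registering handlers under assumed table names such as `containers`, `pipelines` and `sequenceId` would mirror the three tables discussed above, while events for any other table would only increment `skipped()`.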
/pending
Thank you very much for the patch. I am closing this PR temporarily as there was no activity recently and it is waiting for response from its author.
It doesn't mean that this PR is not important or ignored: feel free to reopen the PR at any time.
It only means that attention of committers is not required. We prefer to keep the review queue clean. This ensures PRs in need of review are more visible, which results in faster feedback for all PRs.
If you need ANY help to finish this PR, please contact the community on the mailing list or the slack channel.