[HUDI-7652] Add new `HoodieMergeKey` API to support simple and composite keys
Change Logs
This PR introduces a new class hierarchy for handling merge keys in a more flexible and decoupled manner. It adds the HoodieMergeKey interface, along with two implementations: HoodieSimpleMergeKey and HoodieCompositeMergeKey. This design allows us to extend key-based merge strategies easily.
Motivation
The need for introducing a new merge key handling mechanism was driven by the requirement to support different types of keys (simple and complex) without overloading the existing HoodieKey class, which is central to the write path. By segregating merge key handling into its own hierarchy, we avoid potential conflicts and keep modifications localised, improving the maintainability of the code.
Changes
- `HoodieMergeKey`: New API that handles simple and composite keys consistently. It includes methods for retrieving the key and the partition path.
- `HoodieSimpleMergeKey`: Wraps `HoodieKey` and implements the `HoodieMergeKey` interface for simple scenarios where the key is a string.
- `HoodieCompositeMergeKey`: Implements the `HoodieMergeKey` interface but allows complex types as keys, for scenarios where a simple string key is not sufficient.
- `HoodieMergeKeyBasedRecordMerger`: A new implementation of `HoodieRecordMerger` based on `HoodieMergeKey`. If the merge keys are of type `HoodieCompositeMergeKey`, it returns both the older and the newer records. Otherwise, it calls the merge method of the parent class.
- `HoodieMergedLogRecordScanner`: Changed to merge based on `HoodieMergeKey`.
- Unit tests for the new merger.
These changes do not affect existing functionality that does not rely on merge keys. They only add classes that are used explicitly by new functionality involving different key types in merge operations, so the risk to existing processes is minimal.
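The hierarchy and the merger's dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: every type here (`SketchKey`, `MergeKey`, `SimpleMergeKey`, `CompositeMergeKey`, `MergeKeySketch`) is a hypothetical stand-in, the method names `getKey` and `getPartitionPath` are assumed, records are reduced to plain strings, and "newer record wins" is only a placeholder for the parent class's merge.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for HoodieKey: a string record key plus a partition path.
final class SketchKey implements Serializable {
  final String recordKey;
  final String partitionPath;

  SketchKey(String recordKey, String partitionPath) {
    this.recordKey = recordKey;
    this.partitionPath = partitionPath;
  }
}

// Assumed shape of the merge-key abstraction: a key of any serializable type
// and a partition path.
interface MergeKey extends Serializable {
  Serializable getKey();

  String getPartitionPath();
}

// Simple case: wraps the string-keyed SketchKey.
final class SimpleMergeKey implements MergeKey {
  private final SketchKey key;

  SimpleMergeKey(SketchKey key) {
    this.key = key;
  }

  @Override
  public String getKey() {
    return key.recordKey;
  }

  @Override
  public String getPartitionPath() {
    return key.partitionPath;
  }
}

// Composite case: the key can be any serializable value, not just a string.
final class CompositeMergeKey implements MergeKey {
  private final Serializable key;
  private final String partitionPath;

  CompositeMergeKey(Serializable key, String partitionPath) {
    this.key = key;
    this.partitionPath = partitionPath;
  }

  @Override
  public Serializable getKey() {
    return key;
  }

  @Override
  public String getPartitionPath() {
    return partitionPath;
  }
}

final class MergeKeySketch {
  // Composite keys: keep both the older and the newer record.
  // Otherwise: fall back to a regular merge (placeholder: the newer record wins).
  static List<String> merge(MergeKey olderKey, String older, MergeKey newerKey, String newer) {
    if (olderKey instanceof CompositeMergeKey && newerKey instanceof CompositeMergeKey) {
      return Arrays.asList(older, newer);
    }
    return Collections.singletonList(newer);
  }
}
```

With two `SimpleMergeKey` arguments the sketch returns a single merged record, while two `CompositeMergeKey` arguments yield both records in older-then-newer order.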
Impact
This change makes our key-based merge strategies more flexible and robust. It helps keep the codebase scalable and maintainable, and makes future extensions and modifications easier.
Risk level (write none, low medium or high below)
low
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
- The config description must be updated if new configs are added or the default value of the configs are changed
- Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
@danny0405 Based on our discussion, I have removed the HoodieMergeKey API and created a subclass, HoodieMetadataMergedLogRecordScanner, which works with ExternalSpillableMap<Serializable, HoodieRecord> instead of string keys. It is used only by HoodieMetadataLogRecordReader. I have also introduced a HoodieMetadataRecordMerger. The merger currently mimics the superclass, but it will change for the secondary index in a subsequent PR. Please review this PR again.
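To illustrate the idea of keying the merged-record map by `Serializable` rather than by string, here is a rough sketch. It is not the code in this PR: `SketchMetadataScanner` is a hypothetical class, a plain `HashMap` stands in for `ExternalSpillableMap`, records are reduced to strings, and the merger is passed in as a simple function.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical scanner sketch: merged records are keyed by any Serializable
// value, so both plain string keys and composite keys can share one map.
final class SketchMetadataScanner {
  // Stand-in for ExternalSpillableMap<Serializable, HoodieRecord>.
  private final Map<Serializable, String> records = new HashMap<>();
  // Stand-in for the record merger; it receives (older, newer).
  private final BinaryOperator<String> merger;

  SketchMetadataScanner(BinaryOperator<String> merger) {
    this.merger = merger;
  }

  // First occurrence of a key is stored as-is; later occurrences are merged
  // with the record already held for that key.
  void processNextRecord(Serializable key, String newer) {
    records.merge(key, newer, merger);
  }

  Map<Serializable, String> getRecords() {
    return records;
  }
}
```

Because the map only relies on `equals` and `hashCode` of the key, a composite key such as a list of fields behaves the same way as a string key, provided it implements both consistently.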
CI report:
- 399ffffa849a438a764bd16ffae8d6c525de2afc Azure: SUCCESS