[HUDI-7841] RLI and secondary index should consider only pruned partitions for file skipping
Change Logs
Although the record-level index (RLI) scans only matching files, it obtains those candidate files by iterating over all files in the file index. See - https://github.com/apache/hudi/blob/f4be74c29471fbd6afff472f8db292e6b1f16f05/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/RecordLevelIndexSupport.scala#L47
Instead, it can use prunedPartitionsAndFileSlices to consider only the pruned partitions whenever the query has a partition predicate.
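The difference between the two approaches can be illustrated with a minimal sketch. It is not the Hudi implementation: SliceRef, CandidateFileSketch, candidatesFromAllFiles and candidatesFromPrunedPartitions are hypothetical names, the index lookup result is assumed to be a plain set of file names, and prunedPartitionsAndFileSlices is modelled here as a Map from partition path to file slices, which may not match its real type.

```scala
// Hypothetical, simplified stand-in for a file slice; the real Hudi types differ.
final case class SliceRef(partitionPath: String, fileName: String)

object CandidateFileSketch {
  // Before: walk every file known to the file index and keep the ones
  // that the index lookup (modelled as a set of file names) reported.
  def candidatesFromAllFiles(allFiles: Seq[SliceRef], indexHits: Set[String]): Set[String] =
    allFiles.map(_.fileName).filter(indexHits.contains).toSet

  // After: walk only the file slices of partitions that survived partition
  // pruning. Hits in pruned-away partitions are dropped as well, which is
  // safe because the partition predicate already excludes those partitions.
  def candidatesFromPrunedPartitions(
      prunedPartitionsAndFileSlices: Map[String, Seq[SliceRef]],
      indexHits: Set[String]): Set[String] =
    prunedPartitionsAndFileSlices.values.flatten.map(_.fileName).filter(indexHits.contains).toSet

  def main(args: Array[String]): Unit = {
    val day1 = Seq(SliceRef("dt=2024-01-01", "f1"), SliceRef("dt=2024-01-01", "f2"))
    val day2 = Seq(SliceRef("dt=2024-01-02", "f3"))
    val hits = Set("f2", "f3")
    // Full scan touches all three files.
    println(candidatesFromAllFiles(day1 ++ day2, hits))
    // A predicate such as dt = '2024-01-01' leaves only day1 after pruning.
    println(candidatesFromPrunedPartitions(Map("dt=2024-01-01" -> day1), hits))
  }
}
```

In this toy example the pruned variant returns only f2, because f3 lives in a partition the predicate has already excluded; the main saving is that it never iterates over the files of that partition at all.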
Impact
NA
Risk level (write none, low, medium or high below)
low
Documentation Update
NA
Contributor's checklist
- [ ] Read through contributor's guide
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Please rebase and resolve conflicts.
CI report:
- 1bfdc8258af9758ce439a683d4c29825a526c763 Azure: SUCCESS
Bot commands
@hudi-bot supports the following commands:
@hudi-bot run azure re-run the last Azure build