[HUDI-7841] RLI and secondary index should consider only pruned partitions for file skipping

Open lokeshj1703 opened this issue 1 year ago • 2 comments

Change Logs

Although the record-level index (RLI) scans only the matching files, it currently finds those candidate files by iterating over every file in the file index. See: https://github.com/apache/hudi/blob/f4be74c29471fbd6afff472f8db292e6b1f16f05/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/RecordLevelIndexSupport.scala#L47

Instead, whenever the query has a partition predicate, it can use prunedPartitionsAndFileSlices so that only files in the pruned partitions are considered.
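The difference can be illustrated with a minimal sketch. This is not Hudi code: the file index is modelled as a plain dict from partition path to file names, and `prune_partitions` and `candidate_files` are hypothetical helpers invented for illustration. It only shows why restricting the scan to pruned partitions reduces the number of files visited.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for a file index: partition path -> file names.
FileIndex = Dict[str, List[str]]


def prune_partitions(file_index: FileIndex,
                     predicate: Callable[[str], bool]) -> FileIndex:
    """Keep only the partitions that satisfy the partition predicate."""
    return {p: files for p, files in file_index.items() if predicate(p)}


def candidate_files(partitions: FileIndex,
                    rli_matches: Set[str]) -> Tuple[Set[str], int]:
    """Return the files that the index lookup matched, plus how many
    files had to be visited to find them."""
    visited = 0
    result: Set[str] = set()
    for files in partitions.values():
        for f in files:
            visited += 1
            if f in rli_matches:
                result.add(f)
    return result, visited


# Illustrative data only (assumed, not taken from the PR).
file_index: FileIndex = {
    "dt=2024-06-10": ["f1", "f2"],
    "dt=2024-06-11": ["f3", "f4"],
    "dt=2024-06-12": ["f5"],
}
rli_matches = {"f3", "f5"}  # files the record-key lookup points to

# Iterating over all files in the file index.
all_result = candidate_files(file_index, rli_matches)

# Iterating only over partitions that pass the partition predicate.
pruned = prune_partitions(file_index, lambda p: p == "dt=2024-06-11")
pruned_result = candidate_files(pruned, rli_matches)
```

In this toy setup the unpruned scan visits all five files, while the pruned scan visits only the two files in the matching partition and drops the match that lies outside it.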

Impact

NA

Risk level (write none, low, medium, or high below)

low

Documentation Update

NA

Contributor's checklist

  • [ ] Read through contributor's guide
  • [ ] Change Logs and Impact were stated clearly
  • [ ] Adequate tests were added if applicable
  • [ ] CI passed

lokeshj1703 · Jun 11 '24 17:06

Please rebase and resolve conflicts.

codope · Jul 01 '24 10:07

CI report:

  • 1bfdc8258af9758ce439a683d4c29825a526c763 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build

hudi-bot · Jul 01 '24 15:07