
Enable Consistent Data Push for Standalone Segment Push Job Runners

Open yuanbenson opened this issue 3 years ago • 1 comments

Description:

This PR addresses https://github.com/apache/pinot/issues/9268 for the segment push job runners under the standalone execution framework: SegmentMetadataPushJobRunner, SegmentTarPushJobRunner, and SegmentUriPushJobRunner.

This is done by introducing a new class, ConsistentDataPushUtils, which contains the APIs and helpers that the *PushJobRunner classes call to invoke the consistent push protocol.

Because the code in the *PushJobRunner classes overlaps heavily, this PR also refactors the common logic into a BaseSegmentPushJobRunner abstract class.
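
For readers unfamiliar with this kind of refactor, here is a minimal sketch of the general shape, not the code in this PR. All names in it (PushRunnerRefactorSketch, BaseRunner, TarRunner, run, uploadSegments) are hypothetical, and the "shared" step shown (filtering for .tar.gz files) is an assumed example of common logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: class and method names here are hypothetical, not Pinot's. */
final class PushRunnerRefactorSketch {

  /** Holds the steps every runner shares; subclasses supply only the upload step. */
  abstract static class BaseRunner {
    /** Shared flow: select segment tar files, then delegate the upload. */
    final void run(List<String> outputFiles) {
      List<String> segmentTars = new ArrayList<>();
      for (String file : outputFiles) {
        if (file.endsWith(".tar.gz")) {
          segmentTars.add(file);
        }
      }
      uploadSegments(segmentTars);
    }

    /** The one step that differs between tar, URI, and metadata push. */
    abstract void uploadSegments(List<String> segmentTars);
  }

  /** Example subclass: records what it would upload instead of calling a controller. */
  static final class TarRunner extends BaseRunner {
    final List<String> uploaded = new ArrayList<>();

    @Override
    void uploadSegments(List<String> segmentTars) {
      for (String tar : segmentTars) {
        uploaded.add("tar:" + tar);
      }
    }
  }
}
```

With this shape, behavior that must be identical across runners (such as wrapping the upload in the consistent push protocol) only has to be written once in the base class.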

To enable consistent data push, this PR also introduces a new boolean config in table config under TableConfig->IngestionConfig->BatchIngestionConfig->consistentDataPush.

Users can enable consistent data push by setting the consistentDataPush config to true as below before invoking ingestion jobs,

...
    "batchIngestionConfig": {
      "segmentIngestionType": "REFRESH",
      "segmentIngestionFrequency": "DAILY",
      "consistentDataPush": true
    },
...

which will

  1. In the segment generation phase: inject a timestamp into each segment name (via the segment name postfix), so that in the REFRESH use case new segments do not share names with existing segments and overwrite them directly, and
  2. In the segment push/upload phase: wrap the segment upload in the segment replacement protocol, which switches atomically between the old and new segment data. If the push fails, the swap is reverted and aborted, so the broker continues to route only to the old segments.
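
The two phases above can be sketched roughly as follows. This is an illustration under stated assumptions, not the implementation in this PR: ReplacementClient, Uploader, withTimestampPostfix, and pushConsistently are hypothetical names, and the underscore-plus-milliseconds postfix format is assumed.

```java
import java.util.List;

/** Illustrative sketch only; none of these names are Pinot's actual API. */
final class ConsistentPushSketch {

  /** Hypothetical stand-in for the controller's segment replacement calls. */
  interface ReplacementClient {
    /** Registers the intended swap and returns an id for the lineage entry. */
    String startReplace(List<String> segmentsFrom, List<String> segmentsTo);

    /** Completes the swap so queries are routed to the new segments. */
    void endReplace(String lineageEntryId);

    /** Aborts the swap so queries keep being routed to the old segments. */
    void revertReplace(String lineageEntryId);
  }

  /** Hypothetical stand-in for the runner-specific upload (tar, URI, or metadata). */
  interface Uploader {
    void upload(List<String> segmentNames) throws Exception;
  }

  /** Phase 1: append a timestamp so a new segment never reuses an existing name. */
  static String withTimestampPostfix(String baseName, long timestampMs) {
    return baseName + "_" + timestampMs;
  }

  /** Phase 2: wrap the upload in start/end replace and revert if anything fails. */
  static boolean pushConsistently(ReplacementClient client, Uploader uploader,
      List<String> existingSegments, List<String> newSegments) {
    String lineageEntryId = client.startReplace(existingSegments, newSegments);
    try {
      uploader.upload(newSegments);
      client.endReplace(lineageEntryId);
      return true;
    } catch (Exception e) {
      // Any failure after startReplace aborts the swap; old segments stay live.
      client.revertReplace(lineageEntryId);
      return false;
    }
  }
}
```

The key design point is that the new segments only become visible at endReplace, so a failed or partial upload never exposes a mix of old and new data.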

Testing Done:

Added new test testUploadAndQueryWithConsistentPush in SegmentUploadIntegrationTest, which

  1. Runs SegmentMetadataPushJobRunner with consistent push enabled. [] -> [v1 segments]

  2. Checks that the segment lineage entry is in the expected, completed state.

  3. Checks that COUNT(*) queries return the expected results.

  4. Runs SegmentTarPushJobRunner with consistent push enabled. [v1 segments] -> [v2 segments]

  5. Checks again that the segment lineage entry is in the expected, completed state.

  6. Checks again that COUNT(*) queries return the expected results (that is, the original set of segments has been successfully bulk replaced).

yuanbenson avatar Aug 29 '22 21:08 yuanbenson

Codecov Report

Merging #9295 (deae0ca) into master (0f4bcfc) will decrease coverage by 0.03%. The diff coverage is 26.96%.

:exclamation: Current head deae0ca differs from pull request most recent head f3aa9fb. Consider uploading reports for the commit f3aa9fb to get more accurate results

@@             Coverage Diff              @@
##             master    #9295      +/-   ##
============================================
- Coverage     69.80%   69.76%   -0.04%     
+ Complexity     4777     4703      -74     
============================================
  Files          1875     1878       +3     
  Lines         99860    99930      +70     
  Branches      15194    15192       -2     
============================================
+ Hits          69706    69718      +12     
- Misses        25231    25285      +54     
- Partials       4923     4927       +4     
Flag Coverage Δ
integration1 26.14% <13.31%> (-0.07%) :arrow_down:
integration2 24.84% <1.70%> (-0.15%) :arrow_down:
unittests1 66.99% <18.10%> (-0.10%) :arrow_down:
unittests2 15.28% <1.70%> (+0.02%) :arrow_up:

Impacted Files Coverage Δ
...gestion/batch/common/BaseSegmentPushJobRunner.java 0.00% <0.00%> (ø)
...batch/standalone/SegmentMetadataPushJobRunner.java 0.00% <0.00%> (ø)
...tion/batch/standalone/SegmentTarPushJobRunner.java 0.00% <0.00%> (ø)
...tion/batch/standalone/SegmentUriPushJobRunner.java 0.00% <0.00%> (ø)
...t/segment/local/utils/ConsistentDataPushUtils.java 0.00% <0.00%> (ø)
...ingestion/batch/spec/SegmentNameGeneratorSpec.java 0.00% <0.00%> (ø)
...g/apache/pinot/spi/utils/IngestionConfigUtils.java 69.13% <0.00%> (-5.54%) :arrow_down:
...spi/utils/builder/ControllerRequestURLBuilder.java 0.00% <0.00%> (ø)
...i/config/table/ingestion/BatchIngestionConfig.java 60.00% <71.42%> (+2.85%) :arrow_up:
...e/pinot/common/utils/FileUploadDownloadClient.java 60.27% <74.41%> (+3.45%) :arrow_up:
... and 37 more

codecov-commenter avatar Aug 30 '22 23:08 codecov-commenter

This is a great feature! Could you please also update the Pinot documentation to cover it? https://github.com/pinot-contrib/pinot-docs

Jackie-Jiang avatar Sep 07 '22 21:09 Jackie-Jiang