Enable Consistent Data Push for Standalone Segment Push Job Runners
Description:
This PR addresses https://github.com/apache/pinot/issues/9268 for the segment push job runners under the standalone execution framework: SegmentMetadataPushJobRunner, SegmentTarPushJobRunner, and SegmentUriPushJobRunner.
It does so by introducing a new class, ConsistentDataPushUtils, which contains the APIs and helpers that the *PushJobRunner classes call to invoke the consistent push protocol.
Because the *PushJobRunner classes share a large amount of code, this PR also extracts their common logic into a new abstract class, BaseSegmentPushJobRunner.
To enable consistent data push, this PR also introduces a new boolean config in the table config, under TableConfig->IngestionConfig->BatchIngestionConfig->consistentDataPush.
Users can enable consistent data push by setting consistentDataPush to true, as shown below, before invoking ingestion jobs:
...
"batchIngestionConfig": {
"segmentIngestionType": "REFRESH",
"segmentIngestionFrequency": "DAILY",
"consistentDataPush": true
},
...
With this config enabled, the ingestion job will:
- In the segment generation phase: inject a timestamp into each segment name (via the segment name postfix), so that in the REFRESH use case new segments do not directly overwrite existing segments that would otherwise have the same name.
- In the segment push/upload phase: wrap the segment upload in the segment replacement protocol, which switches atomically between the old and new segment data. If the push fails, the swap is reverted and aborted, so the broker keeps routing only to the old segments.
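The two phases above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: ConsistentPushSketch, ReplacementClient, InMemoryClient, and every method name are hypothetical stand-ins, and the `_<timestamp>` name format is assumed for the example only.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: none of these names are Pinot APIs. */
final class ConsistentPushSketch {

  /** Hypothetical stand-in for the controller calls of the segment replacement protocol. */
  interface ReplacementClient {
    String startReplace(List<String> segmentsFrom, List<String> segmentsTo);
    void upload(String segmentName) throws Exception;
    void endReplace(String lineageEntryId);
    void revertReplace(String lineageEntryId);
  }

  /** Generation phase: append a timestamp postfix so a refreshed segment gets a new name. */
  static String withTimestampPostfix(String baseName, long pushTimestampMs) {
    return baseName + "_" + pushTimestampMs; // assumed format, for illustration
  }

  /** Push phase: wrap the uploads in start/end calls and revert if anything fails. */
  static boolean pushConsistently(ReplacementClient client, List<String> oldSegments,
      List<String> newSegments) {
    String lineageEntryId = client.startReplace(oldSegments, newSegments);
    try {
      for (String segment : newSegments) {
        client.upload(segment);
      }
      // Only now do the new segments become visible, all at once.
      client.endReplace(lineageEntryId);
      return true;
    } catch (Exception e) {
      // Abort the swap; routing stays on the old segments.
      client.revertReplace(lineageEntryId);
      return false;
    }
  }

  /** In-memory fake that tracks which segments a broker would route to. */
  static final class InMemoryClient implements ReplacementClient {
    final List<String> routable = new ArrayList<>();
    final List<String> uploaded = new ArrayList<>();
    private final String failOn; // segment name whose upload should fail, or null
    private List<String> from = List.of();
    private List<String> to = List.of();

    InMemoryClient(List<String> initialSegments, String failOn) {
      routable.addAll(initialSegments);
      this.failOn = failOn;
    }

    @Override
    public String startReplace(List<String> segmentsFrom, List<String> segmentsTo) {
      from = segmentsFrom;
      to = segmentsTo;
      return "lineage-1";
    }

    @Override
    public void upload(String segmentName) throws Exception {
      if (segmentName.equals(failOn)) {
        throw new Exception("upload failed: " + segmentName);
      }
      uploaded.add(segmentName);
    }

    @Override
    public void endReplace(String lineageEntryId) {
      routable.removeAll(from);
      routable.addAll(to);
    }

    @Override
    public void revertReplace(String lineageEntryId) {
      uploaded.clear(); // discard partially uploaded segments; routable is untouched
    }
  }

  public static void main(String[] args) {
    List<String> v1 = List.of(withTimestampPostfix("myTable_0", 1000L));
    List<String> v2 = List.of(withTimestampPostfix("myTable_0", 2000L));
    InMemoryClient client = new InMemoryClient(v1, null);
    System.out.println(pushConsistently(client, v1, v2) + " " + client.routable);
  }
}
```

The point of the sketch is the ordering: the routable set changes only in endReplace, so a failure at any earlier step leaves queries on the old segments.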
Testing Done:
Added a new test, testUploadAndQueryWithConsistentPush, in SegmentUploadIntegrationTest, which:
- Runs SegmentMetadataPushJobRunner with consistent push enabled. [] -> [v1 segments]
- Checks that the segment lineage entry is in the expected, completed state.
- Checks that the count(*) queries return the expected results.
- Runs SegmentTarPushJobRunner with consistent push enabled. [v1 segments] -> [v2 segments]
- Checks again that the segment lineage entry is in the expected, completed state.
- Checks again that the count(*) queries return the expected results (that is, that the original set of segments was successfully bulk replaced).
Codecov Report
Merging #9295 (deae0ca) into master (0f4bcfc) will decrease coverage by 0.03%. The diff coverage is 26.96%.
:exclamation: Current head deae0ca differs from pull request most recent head f3aa9fb. Consider uploading reports for the commit f3aa9fb to get more accurate results
@@ Coverage Diff @@
## master #9295 +/- ##
============================================
- Coverage 69.80% 69.76% -0.04%
+ Complexity 4777 4703 -74
============================================
Files 1875 1878 +3
Lines 99860 99930 +70
Branches 15194 15192 -2
============================================
+ Hits 69706 69718 +12
- Misses 25231 25285 +54
- Partials 4923 4927 +4
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | 26.14% <13.31%> (-0.07%) | :arrow_down: |
| integration2 | 24.84% <1.70%> (-0.15%) | :arrow_down: |
| unittests1 | 66.99% <18.10%> (-0.10%) | :arrow_down: |
| unittests2 | 15.28% <1.70%> (+0.02%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...gestion/batch/common/BaseSegmentPushJobRunner.java | 0.00% <0.00%> (ø) | |
| ...batch/standalone/SegmentMetadataPushJobRunner.java | 0.00% <0.00%> (ø) | |
| ...tion/batch/standalone/SegmentTarPushJobRunner.java | 0.00% <0.00%> (ø) | |
| ...tion/batch/standalone/SegmentUriPushJobRunner.java | 0.00% <0.00%> (ø) | |
| ...t/segment/local/utils/ConsistentDataPushUtils.java | 0.00% <0.00%> (ø) | |
| ...ingestion/batch/spec/SegmentNameGeneratorSpec.java | 0.00% <0.00%> (ø) | |
| ...g/apache/pinot/spi/utils/IngestionConfigUtils.java | 69.13% <0.00%> (-5.54%) | :arrow_down: |
| ...spi/utils/builder/ControllerRequestURLBuilder.java | 0.00% <0.00%> (ø) | |
| ...i/config/table/ingestion/BatchIngestionConfig.java | 60.00% <71.42%> (+2.85%) | :arrow_up: |
| ...e/pinot/common/utils/FileUploadDownloadClient.java | 60.27% <74.41%> (+3.45%) | :arrow_up: |
| ... and 37 more | | |
This is a great feature! Could you please also update the Pinot documentation to cover it? https://github.com/pinot-contrib/pinot-docs