pinot icon indicating copy to clipboard operation
pinot copied to clipboard

Allow batch segments to have invalid start/end time

Open KKcorps opened this issue 3 years ago • 1 comments

Currently, we simply fail the batch ingestion process if segments if start/end time is outside of time-range or values are not parsed correctly. This is not needed now since we have a new API from PR #9413 So users should be able to ingest segments and fix their time interval later on.

This behaviour is disabled by default and can only be switched with segmentTimeValueCheck: false flag for a table.

This is being done in the following way -

  • If there is an error in parsing time values during Segment creation for start/end time, we simply don't store the START/END time in the metadata.
  • The ZkMetadata simply returns -1 as start/end time in this case.
  • All the places which call SegmentZKMetadata.getStartTimeMs or SegmentZKMetadata.getEndTimeMs handle the negative timestamps. Either an exception is thrown that is already handled (e.g. TimeBasedTierSegmentSelector.selectSegmentOR we ignore the segments (e.g.MergeRollupTask`)
  • No change made to OfflineSegmentIntervalChecker because we need logs for segments with invalid time intervals so that user can fix them

KKcorps avatar Sep 30 '22 11:09 KKcorps

Codecov Report

Merging #9505 (ad613ea) into master (b026d32) will decrease coverage by 0.04%. The diff coverage is 70.00%.

@@             Coverage Diff              @@
##             master    #9505      +/-   ##
============================================
- Coverage     70.00%   69.96%   -0.05%     
- Complexity     4792     4856      +64     
============================================
  Files          1921     1943      +22     
  Lines        102349   103924    +1575     
  Branches      15530    15748     +218     
============================================
+ Hits          71651    72707    +1056     
- Misses        25636    26116     +480     
- Partials       5062     5101      +39     
Flag Coverage Δ
integration1 25.87% <22.50%> (-0.04%) :arrow_down:
integration2 24.33% <25.00%> (-0.40%) :arrow_down:
unittests1 67.31% <64.28%> (+<0.01%) :arrow_up:
unittests2 15.70% <17.50%> (+0.10%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...on/tasks/mergerollup/MergeRollupTaskGenerator.java 84.85% <0.00%> (ø)
...ment/creator/impl/SegmentColumnarIndexCreator.java 79.77% <62.96%> (+0.83%) :arrow_up:
...gments/RealtimeToOfflineSegmentsTaskGenerator.java 94.20% <66.66%> (-0.66%) :arrow_down:
...ot/controller/helix/core/util/ZKMetadataUtils.java 92.30% <100.00%> (+0.12%) :arrow_up:
...ot/spi/config/table/ingestion/IngestionConfig.java 100.00% <100.00%> (ø)
...ransformer/TransformFunctionRecordTransformer.java 0.00% <0.00%> (-100.00%) :arrow_down:
...ent/creator/IntermediateSegmentStatsContainer.java 0.00% <0.00%> (-100.00%) :arrow_down:
...rocessing/transformer/RecordTransformerConfig.java 0.00% <0.00%> (-88.89%) :arrow_down:
...pache/pinot/core/query/executor/QueryExecutor.java 12.50% <0.00%> (-87.50%) :arrow_down:
...ntroller/helix/core/minion/TaskMetricsEmitter.java 34.88% <0.00%> (-51.17%) :arrow_down:
... and 321 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Sep 30 '22 22:09 codecov-commenter