pinot icon indicating copy to clipboard operation
pinot copied to clipboard

notify servers that need to move segments to new tiers via SegmentReloadMessage

Open klsince opened this issue 3 years ago • 6 comments

This PR is part of the work to support multi-datadir for Pinot server as tracked by https://github.com/apache/pinot/issues/8843.

Following up https://github.com/apache/pinot/pull/9598 and https://github.com/apache/pinot/pull/9306, this one has extended TableRebalancer to find out which segments need to change their target tier, and reuse SegmentReloadMessage to notify servers to move segment to target tier locally. Segments that need to relocate to new servers are still handled by the existing logic of table rebalancer (i.e. by updating the segment placement in Helix ideal state). The new logic here only kicks in when all segments are already placed on their ideal servers.

Release Note

controller.segmentRelocator.enableLocalTierMigration - to turn this on/off; disabled by default.

klsince avatar Oct 18 '22 21:10 klsince

Codecov Report

Merging #9624 (7609a9c) into master (77fa36d) will decrease coverage by 2.19%. The diff coverage is 16.66%.

@@             Coverage Diff              @@
##             master    #9624      +/-   ##
============================================
- Coverage     28.09%   25.89%   -2.20%     
+ Complexity       53       44       -9     
============================================
  Files          1935     1934       -1     
  Lines        103815   103917     +102     
  Branches      15758    15770      +12     
============================================
- Hits          29162    26913    -2249     
- Misses        71780    74297    +2517     
+ Partials       2873     2707     -166     
Flag Coverage Δ
integration1 25.89% <16.66%> (-0.01%) :arrow_down:
integration2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../local/loader/TierBasedSegmentDirectoryLoader.java 0.00% <0.00%> (ø)
...roller/helix/core/relocation/SegmentRelocator.java 26.42% <10.47%> (-46.55%) :arrow_down:
...server/starter/helix/HelixInstanceDataManager.java 66.25% <20.00%> (-9.09%) :arrow_down:
...ntroller/helix/core/PinotHelixResourceManager.java 39.98% <33.33%> (-3.04%) :arrow_down:
...er/starter/helix/SegmentMessageHandlerFactory.java 59.55% <40.00%> (-6.34%) :arrow_down:
...he/pinot/common/messages/SegmentReloadMessage.java 83.33% <50.00%> (-16.67%) :arrow_down:
...apache/pinot/controller/BaseControllerStarter.java 76.06% <100.00%> (-1.52%) :arrow_down:
...va/org/apache/pinot/controller/ControllerConf.java 49.23% <100.00%> (-2.29%) :arrow_down:
.../pinot/core/data/manager/BaseTableDataManager.java 60.00% <100.00%> (-1.27%) :arrow_down:
.../apache/pinot/server/access/RequesterIdentity.java 0.00% <0.00%> (-100.00%) :arrow_down:
... and 218 more

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

codecov-commenter avatar Oct 18 '22 22:10 codecov-commenter

High level question: why is the logic implemented in the table rebalancer? IMO it should be part of the tier assigner

The design decision was made to extend SegmentRelocator (not Assigner either to separate responsibilities). I extended TableRebalancer (used by SegmentRelocator) to check on a condition i.e.

if (currentAssignment.equals(targetAssignment)) {
      LOGGER.info("Table: {} is already balanced", tableNameWithType);
...

But looks like I can check on RebalanceResult.Status.DONE anyway, to move tier migration logic into SegmentRelocator to be clear.

klsince avatar Oct 18 '22 23:10 klsince

But why do we want to use a separate periodic task to trigger the reload? We should just trigger the reload in SegmentTierAssigner which modify the tier for the segment. Once the SegmentTierAssigner changes the target tier for a segment, the new loaded segment will already be moved, and we want the already loaded segment to have the same behavior to be consistent.

Jackie-Jiang avatar Oct 19 '22 23:10 Jackie-Jiang

A new target tier can lead to two reactions:

  1. segment needs to go to a new server (when the target tier is only defined for a specific pool of servers as identified via serverTag in the tierConfig);
  2. segment is already on the right server, but still needs to go to a new tier, backed by a different datadir.

Action 1 is handled by TableRebalancer; Action 2 is handled by SegmentReloadMessage. Extending SegmentRelocator, it's easy to control the sequence of those two actions, to do Action 2 after Action 1 is completed. So that, we don't send SegmentReloadMessage to a server that's not the ideal server for the segment yet, and it reloads the segment unnecessarily.

klsince avatar Oct 19 '22 23:10 klsince

It is okay to do it within the SegmentRelocator, but we should also move the logic in SegmentTierAssigner into it. We don't want to use 2 periodic tasks to perform the same operation. Basically after assigning the tier, we should trigger the reload immediately, instead of relying on another periodic task to trigger it.

Jackie-Jiang avatar Oct 20 '22 00:10 Jackie-Jiang

... Basically after assigning the tier, we should trigger the reload immediately, instead of relying on another periodic task to trigger it.

I think there may be use cases to separate the two: 1) keep SegmentTierAssigner running to calculate target tier continuously; 2) but not run SegmentRelocator task, instead do table rebalance and tier migration manually as planned maintenance based on prod env, like when cluster does not get queries.

klsince avatar Oct 20 '22 18:10 klsince