notify servers that need to move segments to new tiers via SegmentReloadMessage
This PR is part of the work to support multi-datadir for Pinot server as tracked by https://github.com/apache/pinot/issues/8843.
Following up https://github.com/apache/pinot/pull/9598 and https://github.com/apache/pinot/pull/9306, this one has extended TableRebalancer to find out which segments need to change their target tier, and reuse SegmentReloadMessage to notify servers to move segment to target tier locally. Segments that need to relocate to new servers are still handled by the existing logic of table rebalancer (i.e. by updating the segment placement in Helix ideal state). The new logic here only kicks in when all segments are already placed on their ideal servers.
Release Note
controller.segmentRelocator.enableLocalTierMigration - to turn this on/off; disabled by default.
Codecov Report
Merging #9624 (7609a9c) into master (77fa36d) will decrease coverage by
2.19%. The diff coverage is16.66%.
@@ Coverage Diff @@
## master #9624 +/- ##
============================================
- Coverage 28.09% 25.89% -2.20%
+ Complexity 53 44 -9
============================================
Files 1935 1934 -1
Lines 103815 103917 +102
Branches 15758 15770 +12
============================================
- Hits 29162 26913 -2249
- Misses 71780 74297 +2517
+ Partials 2873 2707 -166
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | 25.89% <16.66%> (-0.01%) |
:arrow_down: |
| integration2 | ? |
Flags with carried forward coverage won't be shown. Click here to find out more.
| Impacted Files | Coverage Δ | |
|---|---|---|
| .../local/loader/TierBasedSegmentDirectoryLoader.java | 0.00% <0.00%> (ø) |
|
| ...roller/helix/core/relocation/SegmentRelocator.java | 26.42% <10.47%> (-46.55%) |
:arrow_down: |
| ...server/starter/helix/HelixInstanceDataManager.java | 66.25% <20.00%> (-9.09%) |
:arrow_down: |
| ...ntroller/helix/core/PinotHelixResourceManager.java | 39.98% <33.33%> (-3.04%) |
:arrow_down: |
| ...er/starter/helix/SegmentMessageHandlerFactory.java | 59.55% <40.00%> (-6.34%) |
:arrow_down: |
| ...he/pinot/common/messages/SegmentReloadMessage.java | 83.33% <50.00%> (-16.67%) |
:arrow_down: |
| ...apache/pinot/controller/BaseControllerStarter.java | 76.06% <100.00%> (-1.52%) |
:arrow_down: |
| ...va/org/apache/pinot/controller/ControllerConf.java | 49.23% <100.00%> (-2.29%) |
:arrow_down: |
| .../pinot/core/data/manager/BaseTableDataManager.java | 60.00% <100.00%> (-1.27%) |
:arrow_down: |
| .../apache/pinot/server/access/RequesterIdentity.java | 0.00% <0.00%> (-100.00%) |
:arrow_down: |
| ... and 218 more |
:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more
High level question: why is the logic implemented in the table rebalancer? IMO it should be part of the tier assigner
The design decision was made to extend SegmentRelocator (not Assigner either to separate responsibilities). I extended TableRebalancer (used by SegmentRelocator) to check on a condition i.e.
if (currentAssignment.equals(targetAssignment)) {
LOGGER.info("Table: {} is already balanced", tableNameWithType);
...
But looks like I can check on RebalanceResult.Status.DONE anyway, to move tier migration logic into SegmentRelocator to be clear.
But why do we want to use a separate periodic task to trigger the reload? We should just trigger the reload in SegmentTierAssigner which modify the tier for the segment. Once the SegmentTierAssigner changes the target tier for a segment, the new loaded segment will already be moved, and we want the already loaded segment to have the same behavior to be consistent.
A new target tier can lead to two reactions:
- segment needs to go to a new server (when the target tier is only defined for a specific pool of servers as identified via
serverTagin the tierConfig); - segment is already on the right server, but still needs to go to a new tier, backed by a different datadir.
Action 1 is handled by TableRebalancer; Action 2 is handled by SegmentReloadMessage. Extending SegmentRelocator, it's easy to control the sequence of those two actions, to do Action 2 after Action 1 is completed. So that, we don't send SegmentReloadMessage to a server that's not the ideal server for the segment yet, and it reloads the segment unnecessarily.
It is okay to do it within the SegmentRelocator, but we should also move the logic in SegmentTierAssigner into it. We don't want to use 2 periodic tasks to perform the same operation. Basically after assigning the tier, we should trigger the reload immediately, instead of relying on another periodic task to trigger it.
... Basically after assigning the tier, we should trigger the reload immediately, instead of relying on another periodic task to trigger it.
I think there may be use cases to separate the two: 1) keep SegmentTierAssigner running to calculate target tier continuously; 2) but not run SegmentRelocator task, instead do table rebalance and tier migration manually as planned maintenance based on prod env, like when cluster does not get queries.