Use force commit to reload the consuming segment
We introduced schema evolution for consuming segment with #4954. It only allows adding extra columns with default values, which has several drawbacks:
- Values for the new added columns are ignored
- Other functionalities related to reload cannot be applied (e.g. index changes)
Since we introduced force commit (#9197), we should use it to reload the consuming segment instead
cc @sajjad-moradi
We can't really use forceCommit, right? We will need to do in 3 steps:
- Pause
- Change schema
- Resume
this will ensure that the new schema is pulled when new segments are started.
New schema is always pulled when a new consuming segment is created. We should be able to simply update the schema and run a force commit. The committed segment should be updated with the new schema (same as reloading sealed segment), and the new consuming segment will pick up the new schema
Ah ok
Yes, force commit helps with creating new consuming segments which pull the latest schema. The challenge, however, is to know which completed segments to reload. Since we don't know when all consuming segments finish committing after receiving force commit message, we don't know when to reload the completed segments. One way is to add a sync endpoint for forceCommit. In this endpoint we can first get the list of consuming segments. Then we send force commit messages to servers. Finally we periodically get the list of consuming segments and check if no old consuming segments is in the list. When the check is successful, we return 200.
@sajjad-moradi We don't need to wait for the consuming segment to seal and then reload it since we will load the sealed segment anyway, at which time we will pull the latest schema. For the reload, we can send reload message to the completed segments, and force commit message to the consuming segments. No further step is needed.
I thought we're using the original schema to load the locally built segment, but you're right. We always fetch the latest schema when a segment is going to be loaded.
I talked to @sajjad-moradi on taking this one, I will sync with him and assign to me after the discussion
The basic plan here is to fire forced commit when we invoke reloadMutableSegment() for mutable segment. We will not touch or change the schema evolution change cited above, but this change will render that code a no-op when configured. We also discussed whether to make the use of forced commit here configurable but decided on making it default behavior.
@jugomezv I'm a little bit confused on how you plan to enable this feature. Are you planning to add a config flag for it and make it enabled by default? Long term wise, I want to always use force commit to reload mutable segment. The current way of reloading can cause a lot of unexpected behavior.
+1 on always using force commit to reload. IMO we don't need to have a config for this.
@Jackie-Jiang the plan is to have this as default code path: no config for it. When we call reload on a mutable segement we always do forceCommit on it. Now the only thing we seem to do on reload on mutable segments now is addExtraColumns, but from the comment above even that code seems not needed now as the new consuming segement should adapt the new schema right?
@jugomezv Correct. We should remove the addExtraColumns handling and switch to the force commit based solution.
OK this is done/tested and awaits approval:here
Checking failures in integration tests.
Found this test timing out:
2022-09-30T02:55:09.2664669Z [ERROR] Failures: 2022-09-30T02:55:09.2687993Z [ERROR] ConvertToRawIndexMinionClusterIntegrationTest>HybridClusterIntegrationTest.testReload:185->BaseClusterIntegrationTestSet.testReload:652 Failed to meet condition in 600000ms, error message: Failed to generate default values for new columns
Reviewing the test to see what breaks