Improved segment build time for Lucene text index during realtime to offline segment conversion
Problem: We typically see long segment build times (7-10 min) when using the Lucene index with 1-1.5 GB segment sizes; 70-80% of this time is spent building the Lucene text index.
Background: In the existing implementation the Lucene index stores Pinot docIds: for the mutable segment these are the mutable segment's docIds, and for the immutable segment each row is stored with its new immutable segment docId. Lucene queries return the matching Lucene docIds, which are translated to Pinot docIds on the fly for the mutable index, or via a mapping file for the immutable index.
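For illustration, here is a minimal sketch of that translation step, assuming the Pinot docId is kept as a stored field on each Lucene document; the class, method, and field names (e.g. `__pinotDocId`) are hypothetical, not Pinot's actual identifiers:

```java
import java.nio.file.Paths;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;

public class DocIdTranslationSketch {
  // Search the text index and translate each matching Lucene docId to a Pinot docId
  // by reading a stored field written alongside the indexed text.
  public static int[] toPinotDocIds(String indexDir, Query query) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(FSDirectory.open(Paths.get(indexDir)))) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Lucene caps the requested hit count at maxDoc internally.
      ScoreDoc[] hits = searcher.search(query, Integer.MAX_VALUE).scoreDocs;
      int[] pinotDocIds = new int[hits.length];
      for (int i = 0; i < hits.length; i++) {
        // "__pinotDocId" is a hypothetical stored-field name for this sketch.
        Document doc = searcher.doc(hits[i].doc);
        pinotDocIds[i] = doc.getField("__pinotDocId").numericValue().intValue();
      }
      return pinotDocIds;
    }
  }
}
```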
Change Summary:
This change copies the mutable Lucene index during realtime segment conversion and reuses it, instead of building a new Lucene index from scratch. To prepare the index for copying, a commit() method is added to the MutableSegment interface. To handle the potential docId changes, sortedDocIds is added to IndexCreationContext so a temporary mapping between mutable docIds and the immutable segment's docIds can be computed. This temporary mapping is used during segment conversion to build the mapping file between Lucene docIds and the new immutable segment's docIds; the mapping file is therefore built during segment conversion rather than during segment load, as in the traditional path.
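As a rough sketch of the docId bookkeeping described above (not the actual Pinot code; the class, method, and array names are illustrative, and it assumes sortedDocIds[newDocId] = mutableDocId), the conversion can invert sortedDocIds and compose it with the Lucene docId to mutable docId mapping read from the copied index:

```java
public class LuceneMappingSketch {
  public static int[] buildLuceneToImmutableMapping(int[] sortedDocIds, int[] luceneToMutableDocId) {
    // Temporary mapping: mutable docId -> new immutable docId,
    // obtained by inverting sortedDocIds.
    int[] mutableToImmutable = new int[sortedDocIds.length];
    for (int newDocId = 0; newDocId < sortedDocIds.length; newDocId++) {
      mutableToImmutable[sortedDocIds[newDocId]] = newDocId;
    }
    // Final mapping persisted as the mapping file: Lucene docId -> immutable docId.
    int[] luceneToImmutable = new int[luceneToMutableDocId.length];
    for (int luceneDocId = 0; luceneDocId < luceneToMutableDocId.length; luceneDocId++) {
      luceneToImmutable[luceneDocId] = mutableToImmutable[luceneToMutableDocId[luceneDocId]];
    }
    return luceneToImmutable;
  }
}
```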
Internally we've seen roughly a 40-60% improvement in overall segment build time. In the ingestion delay comparison, the lower peaks are from a table/tenant with this change, and the higher peaks are from an identical table in a tenant without it.
Testing: deployed internally, tested locally, and validated basic pause/restart/reload operations on a table to ensure no regression in the TextIndexHandler index build.
tags: ingestion performance
Codecov Report
Attention: Patch coverage is 86.20690% with 16 lines in your changes missing coverage. Please review.
Project coverage is 62.24%. Comparing base (59551e4) to head (2bddff1). Report is 298 commits behind head on master.
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12744      +/-   ##
============================================
+ Coverage     61.75%   62.24%   +0.49%
+ Complexity      207      198       -9
============================================
  Files          2436     2503      +67
  Lines        133233   136497    +3264
  Branches      20636    21117     +481
============================================
+ Hits          82274    84964    +2690
- Misses        44911    45248     +337
- Partials      6048     6285      +237
| Flag | Coverage Δ | |
|---|---|---|
| custom-integration1 | <0.01% <0.00%> (-0.01%) | :arrow_down: |
| integration | <0.01% <0.00%> (-0.01%) | :arrow_down: |
| integration1 | <0.01% <0.00%> (-0.01%) | :arrow_down: |
| integration2 | 0.00% <0.00%> (ø) | |
| java-11 | 62.18% <86.20%> (+0.47%) | :arrow_up: |
| java-21 | 62.10% <86.20%> (+0.48%) | :arrow_up: |
| skip-bytebuffers-false | 62.22% <86.20%> (+0.47%) | :arrow_up: |
| skip-bytebuffers-true | 62.08% <86.20%> (+34.35%) | :arrow_up: |
| temurin | 62.24% <86.20%> (+0.49%) | :arrow_up: |
| unittests | 62.24% <86.20%> (+0.49%) | :arrow_up: |
| unittests1 | 46.70% <10.34%> (-0.19%) | :arrow_down: |
| unittests2 | 28.02% <75.86%> (+0.29%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
Hi @itschrispeck, is there a feature flag to disable this optimization?
We found that heap usage was higher than before after upgrading to recent code. From the server logs we saw "Reusing the realtime lucene index for segment", which led me here. And from a heap dump, we found that lucene.index.SegmentCommitInfo was a top consumer of heap space, mainly referenced by RealtimeLuceneTextIndex.
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:      5481198      686979272  [B (java.base@…)
   2:      9944219      477322512  java.util.HashMap (java.base@…)
   3:      4496967      467684568  org.apache.lucene.index.SegmentCommitInfo
   4:      8873625      283956000  java.util.HashMap$Node (java.base@…)
 ...
We didn't change the TextIndex config for those tables across the upgrades, so I assume this optimization was enabled by default. I'd like to check whether there is a feature flag to disable it, so we can validate whether it caused the higher heap usage.
If there is no such feature flag, I can try to add one for your review. I'm thinking of adding one around here:
// Optimization for realtime segment conversion
if (dataSource instanceof RealtimeSegmentSegmentCreationDataSource) {
  // <-- looks like this optimization is hard-coded on when committing a mutable segment
  _config.setRealtimeConversion(true);
  _config.setConsumerDir(((RealtimeSegmentSegmentCreationDataSource) dataSource).getConsumerDir());
}
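For example, the guard could consult a config flag before enabling reuse. This is only a sketch of the shape such a flag might take; the isRealtimeLuceneReuseEnabled() getter is hypothetical, not existing Pinot configuration:

```java
// Sketch only: isRealtimeLuceneReuseEnabled() is a hypothetical config getter,
// defaulting to true to preserve the current behavior.
boolean reuseEnabled = _config.isRealtimeLuceneReuseEnabled();
if (reuseEnabled && dataSource instanceof RealtimeSegmentSegmentCreationDataSource) {
  _config.setRealtimeConversion(true);
  _config.setConsumerDir(((RealtimeSegmentSegmentCreationDataSource) dataSource).getConsumerDir());
}
```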
cc @Jackie-Jiang