pinot Improved segment build time for Lucene text index realtime to offline conversion

Problem: We typically see long (7-10min) segment build times when using Lucene index with 1-1.5GB segment sizes. 70-80% of this time is spent building the Lucene text index.

Background: In the existing implementation the Lucene index stores Pinot docIds: for the mutable segment these are the mutable segment's docIds, for the immutable segment we store each row with its new immutable segment docId. Lucene queries return the matching Lucene DocIds, and we compute these on the fly for the mutable index, or from a mapping file for the immutable index.

Change Summary: This change copies the mutable Lucene index during realtime segment conversion to reuse, instead of building a new Lucene index. To prepare for copying, a commit() method is added to the MutableSegment interface. To handle the potential docId change, sortedDocIds is added to IndexCreationContext to compute a temporary mapping between the mutable docId and the immutable segment's docId. This temporary mapping is used during segment conversion to build the mapping file between the Lucene docId and the new immutable segment's docId. This mapping file is built during segment conversion, instead of during segment load in the traditional path.

Internally we've seen roughly 40-60% improvement in overall segment build time. The lower peaks are from a table/tenant with this change, the higher ingestion delay peaks are from an identical table in a tenant without this change:

Testing: deployed internally, local testing, validated basic pause/restart/reload operations on a table to ensure no regression in TextIndexHandler index build.

tags: ingestion performance

Mar 28 '24 22:03 itschrispeck

Codecov Report

Attention: Patch coverage is 86.20690% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 62.24%. Comparing base (59551e4) to head (2bddff1). Report is 298 commits behind head on master.

Files	Patch %	Lines
...ment/creator/impl/text/LuceneTextIndexCreator.java	86.95%	7 Missing and 2 partials :warning:
...me/impl/invertedindex/RealtimeLuceneTextIndex.java	42.85%	4 Missing :warning:
...ot/segment/spi/creator/SegmentGeneratorConfig.java	50.00%	2 Missing :warning:
.../pinot/segment/spi/index/mutable/MutableIndex.java	0.00%	1 Missing :warning:

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #12744      +/-   ##
============================================
+ Coverage     61.75%   62.24%   +0.49%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2503      +67     
  Lines        133233   136497    +3264     
  Branches      20636    21117     +481     
============================================
+ Hits          82274    84964    +2690     
- Misses        44911    45248     +337     
- Partials       6048     6285     +237

Flag	Coverage Δ
custom-integration1	`<0.01% <0.00%> (-0.01%)`	:arrow_down:
integration	`<0.01% <0.00%> (-0.01%)`	:arrow_down:
integration1	`<0.01% <0.00%> (-0.01%)`	:arrow_down:
integration2	`0.00% <0.00%> (ø)`
java-11	`62.18% <86.20%> (+0.47%)`	:arrow_up:
java-21	`62.10% <86.20%> (+0.48%)`	:arrow_up:
skip-bytebuffers-false	`62.22% <86.20%> (+0.47%)`	:arrow_up:
skip-bytebuffers-true	`62.08% <86.20%> (+34.35%)`	:arrow_up:
temurin	`62.24% <86.20%> (+0.49%)`	:arrow_up:
unittests	`62.24% <86.20%> (+0.49%)`	:arrow_up:
unittests1	`46.70% <10.34%> (-0.19%)`	:arrow_down:
unittests2	`28.02% <75.86%> (+0.29%)`	:arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Mar 28 '24 23:03 codecov-commenter

hi @itschrispeck is there a feature flag to disable this optimization?

We found the heap usage got higher than before after upgrading to recent code. From server logs, we found "Reusing the realtime lucene index for segment" which led me here. And from heap dump, we found lucene.index.SegmentCommitInfo was a top consumer of heap space, and they were mainly referenced by RealtimeLuceneTextIndex.

num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       5481198      686979272  [B ([email protected])
   2:       9944219      477322512  java.util.HashMap ([email protected])
   3:       4496967      467684568  org.apache.lucene.index.SegmentCommitInfo
   4:       8873625      283956000  java.util.HashMap$Node ([email protected])
...

We didn't config TextIndex for those tables across the upgrades, so I'd assume this optimization was enabled by default. So I'd like to check if there is a feature flag to disable this, so we can validate if this had caused the higher heap usage.

If there is no such feature flag, I can try to add one for your review. I'm thinking to add one around here

// Optimization for realtime segment conversion
    if (dataSource instanceof RealtimeSegmentSegmentCreationDataSource) { <--- looks like hard coded to enable this optimization when committing mutable segment
      _config.setRealtimeConversion(true);
      _config.setConsumerDir(((RealtimeSegmentSegmentCreationDataSource) dataSource).getConsumerDir());
    }

cc @Jackie-Jiang

Jul 09 '24 22:07 klsince