Implement TTL support for Pinot upsert
Apache Pinot has provided native upsert support since v0.6.0 (#4261). It allows users to modify existing records and has successfully onboarded many use cases. We have observed that Pinot upsert clusters usually have high heap memory usage. This is because the upsert metadata (primaryKeyIndexes and validDocIndexes) is stored on the heap of Pinot hosts. For use cases with high primary-key cardinality, the heap usage of these upsert tables usually becomes the hardware bottleneck.
For some use cases, records that share a primary key are updated frequently during a time window, and after that window they are no longer updated. In these use cases, each primary key has a lifecycle and becomes inactive after the time window. Currently these primary keys do not expire until the table's retention period is reached, and until then they are kept in primaryKeyIndexes. We propose introducing a TTL (time-to-live) for Pinot primary keys. Primary keys will expire after the TTL, and we can remove inactive keys from the upsert metadata to save heap space.
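To make the TTL idea concrete, here is a minimal sketch, not Pinot's actual upsert metadata code. The class `TtlPrimaryKeyIndex`, its methods, and the choice to measure the TTL against the largest comparison value seen so far (a watermark) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for primaryKeyIndexes; names and semantics are assumptions.
final class TtlPrimaryKeyIndex {
    // primary key -> largest comparison value (e.g. event time) seen for that key
    private final Map<String, Long> lastSeen = new HashMap<>();
    private final long ttl;
    private long watermark = Long.MIN_VALUE;

    TtlPrimaryKeyIndex(long ttl) {
        this.ttl = ttl;
    }

    // Record an upsert and advance the watermark if this value is the newest seen.
    synchronized void upsert(String primaryKey, long comparisonValue) {
        lastSeen.merge(primaryKey, comparisonValue, Math::max);
        watermark = Math.max(watermark, comparisonValue);
    }

    // Drop keys whose latest value is older than (watermark - ttl); returns how many were removed.
    synchronized int removeExpired() {
        if (lastSeen.isEmpty()) {
            return 0;
        }
        long threshold = watermark - ttl;
        int before = lastSeen.size();
        lastSeen.values().removeIf(value -> value < threshold);
        return before - lastSeen.size();
    }

    synchronized boolean contains(String primaryKey) {
        return lastSeen.containsKey(primaryKey);
    }

    synchronized int size() {
        return lastSeen.size();
    }
}
```

In this sketch a key only leaves the map when `removeExpired()` is called, so heap is reclaimed at whatever point the caller chooses to run the cleanup.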
A few challenges that we want to solve:
- Snapshot management for validDocIndexes
- Implementing TTL for primary keys in primaryKeyIndexes
- Snapshot backup in the deep store
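As a rough illustration of the snapshot challenge, the sketch below persists a validDocIndexes bitmap to a local file and reads it back. It is only a sketch under stated assumptions: `java.util.BitSet` stands in for whatever bitmap type the implementation uses, and the class `ValidDocIdsSnapshot`, its methods, and the temp-file naming are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.BitSet;

// Hypothetical helper; BitSet is a stand-in for the real valid-doc bitmap type.
final class ValidDocIdsSnapshot {
    // Write to a temp file first, then rename, so readers never see a half-written snapshot.
    static void write(Path file, BitSet validDocIds) {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try {
            Files.write(tmp, validDocIds.toByteArray());
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Load a snapshot previously produced by write().
    static BitSet read(Path file) {
        try {
            return BitSet.valueOf(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```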
We summarized the challenges and thoughts for partial upsert in this design
Please review. cc @Jackie-Jiang @chenboat @yupeng9
After discussion with @Jackie-Jiang, @yupeng9, and @chenboat, we can break down the feature into the following parts:
- Design doc updates
- Part 1. When committing a segment, update replaceSegment to clean up keys
  - Part 1.1 Clean up keys in the primary key indexes
  - Part 1.2 Generate the snapshot locally
  - Part 1.3 [Deepstore] Upload the snapshot to Deepstore
- Part 2. Periodic job in the Pinot controller (upload the snapshot if not persisted)
- Part 3. Add a download snapshot API on the server side
- Part 4. When loading segments, get the snapshot to avoid recomputing it
  - Part 4.1 Get the snapshot from a peer server
  - Part 4.2 [Deepstore] Get the snapshot from Deepstore
Thanks for summarizing it. Part 1.3 is not required. The controller will ask the server for the snapshot, and the controller is then responsible for uploading it.
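Part 4 of the breakdown can be read as a fallback chain: try a peer server, then Deepstore, and only recompute when no snapshot is available. The sketch below illustrates that ordering under stated assumptions; `SnapshotLoader` and `loadValidDocIds` are hypothetical names, the sources are plain suppliers rather than real Pinot clients, and `java.util.BitSet` stands in for the actual bitmap type.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of the load-time fallback; not Pinot's segment loading code.
final class SnapshotLoader {
    // Try each snapshot source in order; fall back to recomputing if none yields a snapshot.
    static BitSet loadValidDocIds(List<Supplier<Optional<BitSet>>> sources, Supplier<BitSet> recompute) {
        for (Supplier<Optional<BitSet>> source : sources) {
            Optional<BitSet> snapshot;
            try {
                snapshot = source.get();
            } catch (RuntimeException e) {
                // A failing source (e.g. an unreachable peer) is skipped, not fatal.
                continue;
            }
            if (snapshot.isPresent()) {
                return snapshot.get();
            }
        }
        return recompute.get();
    }
}
```

A caller would pass the peer lookup first and the Deepstore lookup second, so the more expensive recomputation runs only when both miss.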
The POC was done in #10047; however, it left some corner cases unhandled. These corner cases were addressed in #10915.