
bumping max se versions and age

Open cbb330 opened this issue 8 months ago • 0 comments

Summary

Problem: The current CDC (Change Data Capture) benchmark process requires streaming data into OpenHouse for periods up to 24 hours. After the stream is manually stopped (sometime after the 24-hour mark), operators need to perform two key actions:

  1. Roll back the table to specific time-window checkpoints (e.g., 24hr, 12hr, 6hr).
  2. Perform further rollbacks to these checkpoints after executing additional operations, such as compaction, on the windowed dataset.

The primary challenge is that routine snapshot expiration can delete the historical data versions required for these rollbacks. If snapshots expire too soon, rolling back becomes impossible. This necessitates a complete re-ingestion of the data, which can delay testing by up to 24 hours.

Solution: To ensure that necessary snapshots are retained for the duration of the CDC benchmark and subsequent testing, we are increasing a configuration related to snapshot retention (referred to as the "ceiling") to 900. This new value is derived as follows:

  • The CDC benchmark streams commits at a frequency of one commit every 5 minutes.
  • We need to ensure data is available for up to 3 days to allow for the manual stopping of the stream and the completion of all rollback and testing procedures.

Calculation: (24 hours/day * 60 minutes/hour / 5 minutes/commit) * 3 days = 288 commits/day * 3 days = 864 commits

Setting the ceiling to 900 provides a sufficient buffer above the calculated 864 commits. This prevents premature snapshot expiration, allowing operators to reliably perform rollbacks as required by the benchmark, thus avoiding lengthy data re-ingestion delays.
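The arithmetic above can be double-checked with a short sketch. The function and constant names here are hypothetical and chosen only for illustration; they are not the identifiers used in the OpenHouse code.

```python
import math

# Hypothetical names for illustration only; not the actual OpenHouse constants.
COMMIT_INTERVAL_MINUTES = 5   # the benchmark commits once every 5 minutes
RETENTION_DAYS = 3            # window needed to stop the stream and finish testing
CEILING = 900                 # proposed max number of retained versions


def commits_in_window(window_days: float, commit_interval_minutes: float) -> int:
    """Number of commits produced over the window at a fixed commit interval."""
    window_minutes = window_days * 24 * 60
    return math.ceil(window_minutes / commit_interval_minutes)


required = commits_in_window(RETENTION_DAYS, COMMIT_INTERVAL_MINUTES)
headroom = CEILING - required

print(f"required commits: {required}, headroom under ceiling: {headroom}")
```

With the values from this description, the required count is 864 and the ceiling of 900 leaves 36 commits of headroom, which is 3 hours of extra stream time at one commit every 5 minutes.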

In addition, max_age takes precedence over the max version count, so even with 900 versions allowed, retention would still be limited to 3 days. We must therefore also bump max_age to 15 days.
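The interaction between the two limits can be illustrated with a simplified model. This sketch assumes a snapshot survives only while it satisfies both the version limit and the age limit, and that commits arrive at a fixed interval; the real expiration logic may differ, and all names below are hypothetical.

```python
# Simplified model (an assumption, not the actual expiration logic):
# a snapshot is kept only if it is within the newest `max_versions`
# snapshots AND younger than `max_age_days`.

def effective_retention_hours(
    max_versions: int,
    max_age_days: float,
    commit_interval_minutes: float,
) -> float:
    """Hours of history that remain reachable under both limits."""
    version_window_minutes = max_versions * commit_interval_minutes
    age_window_minutes = max_age_days * 24 * 60
    # The tighter of the two limits decides how far back a rollback can go.
    return min(version_window_minutes, age_window_minutes) / 60


# Version ceiling raised to 900, age limit left at 3 days: age is the binding limit.
age_bound = effective_retention_hours(900, 3, 5)

# Age limit also raised to 15 days: the version ceiling becomes the binding limit.
version_bound = effective_retention_hours(900, 15, 5)

print(age_bound, version_bound)
```

Under this model, raising only the version ceiling leaves 72 hours of history, while raising max_age to 15 days as well lets the 900-version ceiling govern, giving 75 hours.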

As an aside, this change also makes these parameters static constants for clarity.

Changes

  • [ ] Client-facing API Changes
  • [X] Internal API Changes
  • [ ] Bug Fixes
  • [ ] New Features
  • [ ] Performance Improvements
  • [ ] Code Style
  • [ ] Refactoring
  • [ ] Documentation
  • [ ] Tests

Testing Done

  • [ ] Manually Tested on local docker setup. Please include commands ran, and their output.
  • [ ] Added new tests for the changes made.
  • [ ] Updated existing tests to reflect the changes made.
  • [X] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • [ ] Some other form of testing like staging or soak time in production. Please explain.

Relying on existing unit tests.

Additional Information

  • [ ] Breaking Changes
  • [ ] Deprecations
  • [ ] Large PR broken into smaller PRs, and PR plan linked in the description.

cbb330 · May 14 '25 00:05