openhouse
[PR1 - refactor] Introduce branching and support for spark.wap.branch
Summary
This PR extracts snapshot handling logic from OpenHouseInternalTableOperations into a dedicated SnapshotDiffApplier class. The refactoring improves code organization and maintainability while preserving all existing behavior.
Changes
- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [ ] Bug Fixes
- [ ] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [x] Refactoring
- [ ] Documentation
- [ ] Tests
Refactoring:
- Created new SnapshotDiffApplier service class responsible for applying snapshot changes to Iceberg table metadata
- Migrated snapshot logic from OpenHouseInternalTableOperations to SnapshotDiffApplier
- Migrated validation logic from SnapshotInspector.validateSnapshotsUpdate()
- Implemented state object pattern (SnapshotDiff) that computes and caches all snapshot analysis upfront
- Made the processing flow explicit: parse input → compute diff → validate → apply → record metrics
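For reviewers, the state object pattern and the five-step flow can be pictured roughly as below. This is a minimal sketch and not the code in this PR: `SnapshotFlowSketch`, `Diff`, `parse`, and `run` are hypothetical names, snapshots are reduced to bare IDs, the input is a comma-separated string standing in for serialized snapshots, and the metric strings are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for the flow; not the OpenHouse class. */
final class SnapshotFlowSketch {

  /** State object: every derived set is computed once, in the constructor. */
  static final class Diff {
    final Set<Long> added;
    final Set<Long> deleted;

    Diff(Set<Long> existing, Set<Long> provided) {
      Set<Long> a = new LinkedHashSet<>(provided);
      a.removeAll(existing); // provided but not yet in the table
      Set<Long> d = new LinkedHashSet<>(existing);
      d.removeAll(provided); // in the table but no longer provided
      this.added = Collections.unmodifiableSet(a);
      this.deleted = Collections.unmodifiableSet(d);
    }
  }

  /** Step 1: parse input (assumed format: comma-separated snapshot IDs). */
  static Set<Long> parse(String csv) {
    Set<Long> ids = new LinkedHashSet<>();
    for (String token : csv.split(",")) {
      String t = token.trim();
      if (!t.isEmpty()) {
        ids.add(Long.parseLong(t));
      }
    }
    return ids;
  }

  /** Runs parse, compute diff, validate, apply, record metrics, in that order. */
  static Set<Long> run(
      Set<Long> existing, String input, Consumer<Diff> validator, List<String> metrics) {
    Set<Long> provided = parse(input);            // parse input
    Diff diff = new Diff(existing, provided);     // compute diff
    validator.accept(diff);                       // validate (expected to throw on failure)
    Set<Long> result = new LinkedHashSet<>(existing);
    result.removeAll(diff.deleted);               // apply deletions
    result.addAll(diff.added);                    // apply additions
    metrics.add("added=" + diff.added.size());    // record metrics
    metrics.add("deleted=" + diff.deleted.size());
    return result;
  }
}
```

Because `Diff` is immutable and fully computed up front, the validate, apply, and metrics steps all read the same cached sets instead of re-deriving them.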
Snapshot categorization:
- Staged (WAP) snapshots: Carry STAGED_WAP_ID_PROP in their summary and are added without a branch reference
- Cherry-picked snapshots: Existing snapshots whose ID appears as the SOURCE_SNAPSHOT_ID_PROP value in another snapshot's summary, or WAP snapshots transitioning from staged to published
- Regular snapshots: All new snapshots that are not staged WAP snapshots
- Only EXISTING snapshots are considered as cherry-picked sources
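The categorization rules above can be illustrated with a small sketch. It is an approximation under stated assumptions, not the PR's implementation: `SnapshotCategorySketch` and `Snap` are hypothetical, the two property keys are written as string literals assumed to match Iceberg's `SnapshotSummary` constants, and branch references (including the staged-to-published WAP transition) are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the categorization rules; branch refs are not modeled. */
final class SnapshotCategorySketch {
  // Assumed to mirror Iceberg's SnapshotSummary constants.
  static final String STAGED_WAP_ID_PROP = "wap.id";
  static final String SOURCE_SNAPSHOT_ID_PROP = "source-snapshot-id";

  /** Minimal snapshot stand-in: an ID plus its summary map. */
  static final class Snap {
    final long id;
    final Map<String, String> summary;

    Snap(long id, Map<String, String> summary) {
      this.id = id;
      this.summary = summary;
    }

    /** Convenience factory taking alternating key/value strings. */
    static Snap of(long id, String... keyValues) {
      if (keyValues.length % 2 != 0) {
        throw new IllegalArgumentException("keyValues must come in pairs");
      }
      Map<String, String> summary = new HashMap<>();
      for (int i = 0; i < keyValues.length; i += 2) {
        summary.put(keyValues[i], keyValues[i + 1]);
      }
      return new Snap(id, summary);
    }
  }

  static Map<String, List<Long>> categorize(List<Snap> existing, List<Snap> provided) {
    Set<Long> existingIds = new HashSet<>();
    for (Snap s : existing) {
      existingIds.add(s.id);
    }

    List<Long> staged = new ArrayList<>();
    List<Long> regular = new ArrayList<>();
    Set<Long> sourceIds = new HashSet<>();
    for (Snap s : provided) {
      String source = s.summary.get(SOURCE_SNAPSHOT_ID_PROP);
      if (source != null) {
        sourceIds.add(Long.parseLong(source)); // remember every referenced source
      }
      if (existingIds.contains(s.id)) {
        continue; // not new, so neither staged nor regular
      }
      if (s.summary.containsKey(STAGED_WAP_ID_PROP)) {
        staged.add(s.id);
      } else {
        regular.add(s.id);
      }
    }

    // Only snapshots already in the table count as cherry-pick sources.
    List<Long> cherryPicked = new ArrayList<>();
    for (Snap s : existing) {
      if (sourceIds.contains(s.id)) {
        cherryPicked.add(s.id);
      }
    }

    Map<String, List<Long>> out = new HashMap<>();
    out.put("staged", staged);
    out.put("regular", regular);
    out.put("cherryPicked", cherryPicked);
    return out;
  }
}
```

Note that a `source-snapshot-id` pointing at an ID that is not already in the table is ignored here, which reflects the "only existing snapshots" rule.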
Validation:
- Prevents deleting the current snapshot without providing replacement snapshots
- Validates that only the MAIN branch is referenced, since other branches are not supported (existing OpenHouse behavior)
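A rough sketch of the two validation rules follows. It is illustrative only: `SnapshotValidationSketch`, its method signatures, and the exception types are hypothetical, `"main"` is assumed to match Iceberg's `SnapshotRef.MAIN_BRANCH`, and "without providing replacement snapshots" is interpreted here as "no snapshots were added in the same update".

```java
import java.util.Set;

/** Hypothetical sketch of the two validation rules; not the OpenHouse code. */
final class SnapshotValidationSketch {
  // Assumed to mirror Iceberg's SnapshotRef.MAIN_BRANCH.
  static final String MAIN_BRANCH = "main";

  static void validate(
      Long currentSnapshotId, Set<Long> deletedIds, Set<Long> addedIds, Set<String> branches) {
    // Rule 1: the current snapshot may only be deleted if replacements are provided.
    if (currentSnapshotId != null
        && deletedIds.contains(currentSnapshotId)
        && addedIds.isEmpty()) {
      throw new IllegalStateException(
          "Cannot delete current snapshot " + currentSnapshotId + " without a replacement");
    }
    // Rule 2: any referenced branch other than MAIN is rejected.
    for (String branch : branches) {
      if (!MAIN_BRANCH.equals(branch)) {
        throw new IllegalArgumentException("Unsupported branch: " + branch);
      }
    }
  }

  /** Convenience wrapper that reports the outcome as a boolean. */
  static boolean isValid(
      Long currentSnapshotId, Set<Long> deletedIds, Set<Long> addedIds, Set<String> branches) {
    try {
      validate(currentSnapshotId, deletedIds, addedIds, branches);
      return true;
    } catch (IllegalStateException | IllegalArgumentException e) {
      return false;
    }
  }
}
```

Running validation before the apply step means a rejected update leaves the table metadata untouched.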
Testing Done
- [ ] Manually Tested on local docker setup. Please include commands ran, and their output.
- [ ] Added new tests for the changes made.
- [ ] Updated existing tests to reflect the changes made.
- [x] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in production. Please explain.
This is a pure refactoring that extracts existing logic into a new class without changing behavior. All existing tests should continue to pass as-is since the functionality remains identical. The extracted logic maintains the same validation rules, snapshot categorization, and metrics recording as before.
Additional Information
- [ ] Breaking Changes
- [ ] Deprecations
- [ ] Large PR broken into smaller PRs, and PR plan linked in the description.