
[PR1 - refactor] Introduce branching and support for spark.wap.branch

Open cbb330 opened this issue 3 months ago • 0 comments

Summary

This PR extracts snapshot handling logic from OpenHouseInternalTableOperations into a dedicated SnapshotDiffApplier class. The refactoring improves code organization and maintainability while preserving all existing behavior.

Changes

  • [ ] Client-facing API Changes
  • [ ] Internal API Changes
  • [ ] Bug Fixes
  • [ ] New Features
  • [ ] Performance Improvements
  • [ ] Code Style
  • [x] Refactoring
  • [ ] Documentation
  • [ ] Tests

Refactoring:

  • Created new SnapshotDiffApplier service class responsible for applying snapshot changes to Iceberg table metadata
  • Migrated snapshot logic from OpenHouseInternalTableOperations to SnapshotDiffApplier
  • Migrated validation logic from SnapshotInspector.validateSnapshotsUpdate()
  • Implemented state object pattern (SnapshotDiff) that computes and caches all snapshot analysis upfront
  • Clear flow: parse input → compute diff → validate → apply → record metrics
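
The flow in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the class `SnapshotDiffFlowSketch`, the nested `Diff` state object, the comma-separated input format, and the metric names are all hypothetical stand-ins, and snapshots are reduced to bare IDs. It only shows the idea of computing the diff once up front and then passing that immutable state through validation, application, and metrics.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Consumer;

class SnapshotDiffFlowSketch {

    /** Hypothetical state object: every derived set is computed once, then only read. */
    static final class Diff {
        final Set<Long> existing;
        final Set<Long> added;
        final Set<Long> deleted;

        private Diff(Set<Long> existing, Set<Long> provided) {
            Set<Long> a = new TreeSet<>(provided);
            a.removeAll(existing);                 // provided but not yet known
            Set<Long> d = new TreeSet<>(existing);
            d.removeAll(provided);                 // known but no longer provided
            this.existing = Collections.unmodifiableSet(new TreeSet<>(existing));
            this.added = Collections.unmodifiableSet(a);
            this.deleted = Collections.unmodifiableSet(d);
        }

        static Diff compute(Set<Long> existing, Set<Long> provided) {
            return new Diff(existing, provided);
        }
    }

    // Step 1: parse. A comma-separated ID list stands in for the real serialized input (assumption).
    static Set<Long> parse(String csv) {
        Set<Long> ids = new TreeSet<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                ids.add(Long.parseLong(trimmed));
            }
        }
        return ids;
    }

    // Step 4: apply the precomputed diff to the existing snapshot IDs.
    static Set<Long> apply(Diff diff) {
        Set<Long> result = new TreeSet<>(diff.existing);
        result.removeAll(diff.deleted);
        result.addAll(diff.added);
        return result;
    }

    // Step 5: derive metrics from the same diff (metric names are illustrative).
    static Map<String, Integer> metrics(Diff diff) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("added", diff.added.size());
        m.put("deleted", diff.deleted.size());
        return m;
    }

    static Set<Long> run(Set<Long> existing, String input,
                         Consumer<Diff> validator, Map<String, Integer> metricsSink) {
        Set<Long> provided = parse(input);             // parse input
        Diff diff = Diff.compute(existing, provided);  // compute diff
        validator.accept(diff);                        // validate (throws to abort)
        Set<Long> result = apply(diff);                // apply
        metricsSink.putAll(metrics(diff));             // record metrics
        return result;
    }
}
```

Because validation runs before `apply`, a validator that throws leaves both the result and the metrics sink untouched.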

Snapshot categorization:

  • Staged (WAP) snapshots: Contain STAGED_WAP_ID_PROP in summary, added without branch reference
  • Cherry-picked snapshots: Existing snapshots that another snapshot references through SOURCE_SNAPSHOT_ID_PROP, or WAP snapshots transitioning from staged to published
  • Regular snapshots: All new snapshots that are not staged WAP snapshots
  • Only EXISTING snapshots are considered as cherry-picked sources
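
As a rough illustration of these rules, the sketch below categorizes new snapshots by their summary properties. It is not the PR's implementation: `SnapshotCategorizationSketch`, `Snap`, and `Categories` are hypothetical names, the two property constants are defined locally and assumed to mirror Iceberg's `SnapshotSummary` constants, and the sketch deliberately omits both the branch-reference check for staged snapshots and the staged-to-published transition case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SnapshotCategorizationSketch {
    // Local copies for self-containment; assumed to match Iceberg's SnapshotSummary constants.
    static final String STAGED_WAP_ID_PROP = "wap.id";
    static final String SOURCE_SNAPSHOT_ID_PROP = "source-snapshot-id";

    /** Hypothetical stand-in for a snapshot: an ID plus its summary map. */
    static final class Snap {
        final long id;
        final Map<String, String> summary;

        Snap(long id, Map<String, String> summary) {
            this.id = id;
            this.summary = summary;
        }
    }

    /** Hypothetical result holder for the three categories. */
    static final class Categories {
        final List<Long> staged = new ArrayList<>();
        final List<Long> regular = new ArrayList<>();
        final Set<Long> cherryPickedSources = new TreeSet<>();
    }

    static Categories categorize(Set<Long> existingIds, List<Snap> newSnapshots) {
        Categories out = new Categories();
        for (Snap s : newSnapshots) {
            // Staged (WAP) if the summary carries a WAP ID; every other new snapshot is regular.
            if (s.summary.containsKey(STAGED_WAP_ID_PROP)) {
                out.staged.add(s.id);
            } else {
                out.regular.add(s.id);
            }
            // A cherry-pick source counts only if it is an EXISTING snapshot.
            String source = s.summary.get(SOURCE_SNAPSHOT_ID_PROP);
            if (source != null) {
                long sourceId = Long.parseLong(source);
                if (existingIds.contains(sourceId)) {
                    out.cherryPickedSources.add(sourceId);
                }
            }
        }
        return out;
    }
}
```

In this sketch a new snapshot whose source ID points at an unknown snapshot is still categorized as regular, but contributes nothing to the cherry-picked set.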

Validation:

  • Prevents deleting the current snapshot without providing replacement snapshots
  • Validates that only the MAIN branch is referenced, since no other branch is supported (existing OpenHouse behavior)
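
A minimal sketch of these two checks is shown below. The class name, method signature, and exception types are hypothetical, and "replacement snapshots" is interpreted here as "at least one new snapshot is being added", which is an assumption about the rule rather than something stated above. The `MAIN_BRANCH` constant is defined locally and assumed to match Iceberg's main branch name.

```java
import java.util.Set;

class SnapshotValidationSketch {
    // Local constant; assumed to match the name Iceberg uses for the main branch.
    static final String MAIN_BRANCH = "main";

    static void validate(long currentSnapshotId,
                         Set<Long> deletedIds,
                         Set<Long> addedIds,
                         Set<String> requestedBranches) {
        // Rule 1: the current snapshot may be deleted only if new snapshots replace it.
        if (deletedIds.contains(currentSnapshotId) && addedIds.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot delete current snapshot " + currentSnapshotId + " without a replacement");
        }
        // Rule 2: any branch reference other than MAIN is rejected.
        for (String branch : requestedBranches) {
            if (!MAIN_BRANCH.equals(branch)) {
                throw new UnsupportedOperationException("Only the main branch is supported: " + branch);
            }
        }
    }
}
```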

Testing Done

  • [ ] Manually Tested on local docker setup. Please include commands ran, and their output.
  • [ ] Added new tests for the changes made.
  • [ ] Updated existing tests to reflect the changes made.
  • [x] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • [ ] Some other form of testing like staging or soak time in production. Please explain.

This is a pure refactoring that extracts existing logic into a new class without changing behavior. All existing tests should continue to pass as-is since the functionality remains identical. The extracted logic maintains the same validation rules, snapshot categorization, and metrics recording as before.

Additional Information

  • [ ] Breaking Changes
  • [ ] Deprecations
  • [ ] Large PR broken into smaller PRs, and PR plan linked in the description.

cbb330, Nov 04 '25 02:11