openhouse
PR2 (nullability bug): adding new OH SparkCatalog which enables preserving non-nullable schemas
Summary
Problem: the OpenHouse Spark catalog does not preserve the non-nullable fields requested by user DataFrames, so tables are saved with the wrong schema (columns that should be non-nullable end up nullable). This problem only affects CTAS (CREATE TABLE AS SELECT).
Solution: this PR adds a new SparkCatalog, with configuration to enable preserving non-nullable fields ✅. As a follow-up, all Spark clients will be switched to this SparkCatalog 🕐.
Old Spark client config:
```properties
spark.sql.defaultExtensions=liopenhouse.relocated.org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,com.linkedin.openhouse.spark.extensions.OpenhouseSparkSessionExtensions
# Changed line: relocated Iceberg SparkCatalog
spark.sql.catalog.openhouse=liopenhouse.relocated.org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.openhouse.catalog-impl=com.linkedin.openhouse.spark.LiOpenHouseCatalog
```
New Spark client config (only `spark.sql.catalog.openhouse` changes):
```properties
spark.sql.defaultExtensions=liopenhouse.relocated.org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,com.linkedin.openhouse.spark.extensions.OpenhouseSparkSessionExtensions
# Changed line: new OpenHouse SparkCatalog
spark.sql.catalog.openhouse=com.linkedin.openhouse.spark.SparkCatalog
spark.sql.catalog.openhouse.catalog-impl=com.linkedin.openhouse.spark.LiOpenHouseCatalog
```
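To make the CTAS nullability bug concrete, here is a minimal, hypothetical Python model of it. It does not call Spark or OpenHouse: `ColumnSpec`, `as_nullable` and `lost_not_null_columns` are illustrative names, and the sketch assumes the old catalog's CTAS path marks every column nullable, while the new catalog keeps the DataFrame's schema as requested.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical stand-in for a Spark StructField: name, type, nullability."""
    name: str
    data_type: str
    nullable: bool


def as_nullable(schema: List[ColumnSpec]) -> List[ColumnSpec]:
    """Models the buggy CTAS path (assumption): every column is forced to nullable."""
    return [replace(col, nullable=True) for col in schema]


def lost_not_null_columns(requested: List[ColumnSpec], saved: List[ColumnSpec]) -> List[str]:
    """Names of columns that are non-nullable in the DataFrame but nullable in the saved table."""
    saved_by_name = {col.name: col for col in saved}
    return [
        col.name
        for col in requested
        if not col.nullable
        and col.name in saved_by_name
        and saved_by_name[col.name].nullable
    ]


# Schema requested by a user DataFrame: "id" must not be null.
requested = [
    ColumnSpec("id", "bigint", nullable=False),
    ColumnSpec("name", "string", nullable=True),
]

# Old catalog (modeled bug): the non-null constraint on "id" is dropped.
print(lost_not_null_columns(requested, as_nullable(requested)))  # ['id']

# New catalog (intended behavior): the schema is saved as requested.
print(lost_not_null_columns(requested, requested))  # []
```

Comparing columns by name rather than by position keeps the check independent of column order in the saved table.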
Changes
- [ ] Client-facing API Changes
- [ ] Internal API Changes
- [X] Bug Fixes
- [X] New Features
- [ ] Performance Improvements
- [ ] Code Style
- [ ] Refactoring
- [ ] Documentation
- [ ] Tests
Testing Done
- [ ] Manually Tested on local docker setup. Please include commands run, and their output.
- [X] Added new tests for the changes made.
- [X] Updated existing tests to reflect the changes made.
- [ ] No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
- [ ] Some other form of testing like staging or soak time in production. Please explain.
Additional Information
- [ ] Breaking Changes
- [ ] Deprecations
- [ ] Large PR broken into smaller PRs, and PR plan linked in the description.
For all the boxes checked, include additional details of the changes made in this pull request.