Verify whether Delta Kernel can be used for Delta conversion sources and targets
Feature Request / Improvement
https://github.com/delta-io/delta/tree/master/kernel
Avoid using Delta Standalone and Spark in xtable-core, and verify that Delta Kernel works as a replacement.
Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionSourceProvider.java#L32
Hey, can you please assign this to me?
Hi @vinishjail97,
I was in the process of doing the changes required for integrating Delta Kernel, and it looks like they cascade across multiple files. To eliminate the dependency on SparkSession, we’ll also need to remove usage of Delta Lake (Spark-based) APIs and replace them with Delta Kernel equivalents. Specifically:
- `DeltaConversionSource.java` – Just looking at the `getCurrentTable` method, since it uses the Spark session: we need to replace usage of `DeltaLog` and `Snapshot` with the corresponding Delta Kernel objects. We may also need to refactor the code, because `Snapshot` exists in both Delta Kernel and the Spark-based Delta code, so that the references are resolved correctly.
- `DeltaTableExtractor.java` – The `Snapshot` class in Delta Kernel exposes different methods from the Spark-based version, which means this file will require significant changes. It needs some discovery and more insight into the existing code. For example, `snapshot.getMetadata(engine).getPartitionColumns()` doesn't exist in Delta Kernel, so we need to figure out replacements for such pieces of code.
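One possible way to keep the two `Snapshot` types from colliding across files is a small adapter interface, so that only one class per backend imports a concrete `Snapshot`. The sketch below is only an illustration under assumptions: `TableSnapshotView`, `SparkSnapshotView`, and `KernelSnapshotView` are hypothetical names, and `SparkStyleSnapshot` and `KernelStyleSnapshot` are simplified stand-ins with invented methods, not the real `org.apache.spark.sql.delta.Snapshot` or `io.delta.kernel.Snapshot` APIs.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the Spark-based snapshot (the real class is
// org.apache.spark.sql.delta.Snapshot). The methods here are invented.
final class SparkStyleSnapshot {
    private final long version;
    private final List<String> partitionColumns;

    SparkStyleSnapshot(long version, List<String> partitionColumns) {
        this.version = version;
        this.partitionColumns = partitionColumns;
    }

    long version() { return version; }
    List<String> partitionColumns() { return partitionColumns; }
}

// Stand-in for the Delta Kernel snapshot (the real interface is
// io.delta.kernel.Snapshot). The methods here are invented too.
final class KernelStyleSnapshot {
    private final long version;
    private final List<String> partitionColumnNames;

    KernelStyleSnapshot(long version, List<String> partitionColumnNames) {
        this.version = version;
        this.partitionColumnNames = partitionColumnNames;
    }

    long getVersion() { return version; }
    List<String> getPartitionColumnNames() { return partitionColumnNames; }
}

// Hypothetical interface: the only snapshot type the rest of the code sees.
interface TableSnapshotView {
    long version();
    List<String> partitionColumns();
}

// Adapter over the Spark-style stand-in.
final class SparkSnapshotView implements TableSnapshotView {
    private final SparkStyleSnapshot snapshot;

    SparkSnapshotView(SparkStyleSnapshot snapshot) { this.snapshot = snapshot; }

    @Override public long version() { return snapshot.version(); }
    @Override public List<String> partitionColumns() { return snapshot.partitionColumns(); }
}

// Adapter over the Kernel-style stand-in.
final class KernelSnapshotView implements TableSnapshotView {
    private final KernelStyleSnapshot snapshot;

    KernelSnapshotView(KernelStyleSnapshot snapshot) { this.snapshot = snapshot; }

    @Override public long version() { return snapshot.getVersion(); }
    @Override public List<String> partitionColumns() { return snapshot.getPartitionColumnNames(); }
}

final class SnapshotAdapterDemo {
    // Caller code depends only on the interface, never on a concrete Snapshot.
    static String describe(TableSnapshotView view) {
        return "v" + view.version() + " partitioned by " + view.partitionColumns();
    }

    public static void main(String[] args) {
        TableSnapshotView spark =
            new SparkSnapshotView(new SparkStyleSnapshot(7L, Arrays.asList("date")));
        TableSnapshotView kernel =
            new KernelSnapshotView(new KernelStyleSnapshot(7L, Arrays.asList("date")));
        System.out.println(describe(spark));
        System.out.println(describe(kernel));
    }
}
```

With something like this, the extractor code could be migrated one adapter at a time, and the missing partition-column accessor would only need to be resolved inside the Kernel adapter.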
Let me know if you'd like to discuss the refactoring plan further, and please share your thoughts on it.