Verify if delta kernel can be used for delta conversion source and targets

Open vinishjail97 opened this issue 8 months ago • 3 comments

Feature Request / Improvement

https://github.com/delta-io/delta/tree/master/kernel

Avoid using Delta Standalone and Spark in xtable-core, and verify whether Delta Kernel works as a replacement.

Are you willing to submit PR?

  • [x] Yes I am willing to submit a PR!

Code of Conduct

vinishjail97 avatar May 10 '25 01:05 vinishjail97

https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionSourceProvider.java#L32

vinishjail97 avatar May 10 '25 01:05 vinishjail97

Hey, can you please assign the PR to me?

vaibhavk1992 avatar May 10 '25 01:05 vaibhavk1992

Hi @vinishjail97, I have been working on the changes required to integrate Delta Kernel, and they cascade across multiple files. To eliminate the dependency on SparkSession, we'll also need to remove usage of the Delta Lake (Spark-based) APIs and replace them with Delta Kernel equivalents. Specifically:

  1. DeltaConversionSource.java – Looking only at the getCurrentTable method for now, since it uses the Spark session: we need to replace DeltaLog and Snapshot with the corresponding Delta Kernel objects. We may also need to refactor the code so that references resolve to the right class, because Snapshot exists in both the Kernel and the Spark-based Delta code.
  2. DeltaTableExtractor.java – The Snapshot class in Delta Kernel exposes different methods from the Spark-based version, so this file will require significant changes. It needs some discovery and more insight into the existing code. For example, snapshot.getMetadata(engine).getPartitionColumns() doesn't exist in Delta Kernel, so we need to figure out how to handle such pieces of the code.
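
One possible way to keep the two Snapshot types from colliding is to hide them behind a small interface, so that only one adapter class imports the Kernel type and the extractor code never names either Snapshot directly. The following is a rough sketch of that idea, not a proposal for the final design: DeltaSnapshotView, KernelSnapshotView and TableSummary are hypothetical names that do not exist in XTable, and the adapter takes plain values instead of wrapping a real Kernel Snapshot so that the example runs without any Delta dependency.

```java
import java.util.List;

// Hypothetical abstraction: the only view of a Delta snapshot that the
// rest of the conversion code would depend on.
interface DeltaSnapshotView {
  long version();

  List<String> partitionColumns();
}

// Stand-in for a Kernel-backed adapter. A real one would wrap the Kernel
// Snapshot and be the only file that imports it; here the values are passed
// in directly (an assumption made to keep the sketch self-contained).
final class KernelSnapshotView implements DeltaSnapshotView {
  private final long version;
  private final List<String> partitionColumns;

  KernelSnapshotView(long version, List<String> partitionColumns) {
    this.version = version;
    this.partitionColumns = List.copyOf(partitionColumns);
  }

  @Override
  public long version() {
    return version;
  }

  @Override
  public List<String> partitionColumns() {
    return partitionColumns;
  }
}

// Caller code compiles against the interface only, so it does not care
// whether the snapshot came from the Spark-based code or from Kernel.
final class TableSummary {
  static String describe(DeltaSnapshotView snapshot) {
    return "version=" + snapshot.version()
        + ", partitionColumns=" + snapshot.partitionColumns();
  }
}
```

With something like this in place, a Spark-backed adapter could implement the same interface during the migration, and pieces such as the partition column lookup would only need to be worked out once, inside the Kernel adapter.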

Let me know your thoughts, and whether you'd like to discuss the refactoring plan further.

vaibhavk1992 avatar Jun 05 '25 17:06 vaibhavk1992