
Support for deletion vector translation

Open ashvina opened this issue 1 year ago • 2 comments

Deletion vectors are an optimization feature that can be enabled on Delta Lake and Iceberg tables. They allow DELETE and UPDATE operations to mark existing rows as removed or changed without rewriting the Parquet file. Hudi may soon support a similar representation for deletion vectors.
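To illustrate the idea, here is a minimal sketch in plain Python. It is a toy model, not the Delta Lake or Iceberg on-disk format: `DataFile`, `delete_where`, and `scan` are hypothetical names, and a Python `set` of row positions stands in for the deletion vector.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy stand-in for an immutable Parquet file; rows are addressed by position."""
    rows: list
    # Hypothetical "deletion vector": positions of rows that are logically deleted.
    deleted_positions: set = field(default_factory=set)

    def delete_where(self, predicate):
        """Mark matching rows as deleted; the stored rows are never rewritten."""
        for pos, row in enumerate(self.rows):
            if predicate(row):
                self.deleted_positions.add(pos)

    def scan(self):
        """Return only live rows by filtering out marked positions at read time."""
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.deleted_positions]


data_file = DataFile(rows=[{"id": 1}, {"id": 2}, {"id": 3}])
data_file.delete_where(lambda row: row["id"] == 2)
live_rows = data_file.scan()
```

After the delete, `data_file.rows` still holds all three rows; only the set of deleted positions changed, and `scan()` hides the marked row from readers.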

Currently, XTable does not support handling or translating deletion files between formats. As a result, XTable cannot preserve deletion vectors when converting a table from one format to another, which leads to incomplete translation and/or incorrect results. This feature request is to add support for deletion vector translation in XTable.
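The incorrect results can be shown with a small, hypothetical model. It assumes the translation reuses the same data files and only rewrites table metadata; `read_table`, `translate`, and the file names are made up for illustration and are not XTable APIs.

```python
def read_table(data_files, deletes):
    """Return live rows: every row whose (file, position) is not marked deleted."""
    live = []
    for path, rows in data_files.items():
        dropped = deletes.get(path, set())
        live.extend(row for pos, row in enumerate(rows) if pos not in dropped)
    return live


def translate(data_files, deletes, carry_deletes):
    """Toy metadata translation: data files are reused, deletes are optional."""
    if carry_deletes:
        target_deletes = {path: set(positions) for path, positions in deletes.items()}
    else:
        target_deletes = {}  # models a translation that ignores deletion vectors
    return data_files, target_deletes


# Hypothetical source table: two data files plus per-file deleted positions.
source_files = {"part-0.parquet": ["a", "b", "c"], "part-1.parquet": ["d", "e"]}
source_deletes = {"part-0.parquet": {1}, "part-1.parquet": {0}}

expected = read_table(source_files, source_deletes)
lossy = read_table(*translate(source_files, source_deletes, carry_deletes=False))
faithful = read_table(*translate(source_files, source_deletes, carry_deletes=True))
```

In this model the lossy translation returns rows `b` and `d` again even though the source table deleted them, while the translation that carries the deleted positions matches the source.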

The proposed steps to implement the first phase of this feature are:

  • [x] #340
  • [ ] #341
  • [ ] #342
  • [ ] #343
  • [ ] #344
  • [ ] #346
  • [ ] #345
  • [ ] #347
  • [ ] #348

ashvina avatar Feb 27 '24 06:02 ashvina

@ashvina Lack of deletion vector support is a major limitation in XTable, as it can't support MOR upsert tables. Adding support for deletion vectors / delete files would be extremely useful. Are you working on this currently, and are you looking for collaboration from the community?

shabeebrp avatar Jun 21 '24 07:06 shabeebrp

Hi @ashvina - I am new to the project and I want to start contributing to this feature. At our org, we are trying to implement our own code but want to use XTable.

Reactor11 avatar Oct 09 '24 07:10 Reactor11