HIVE-29287: Iceberg: [V3] Variant Shredding support

Open deniskuzZ opened this issue 3 months ago • 7 comments

What changes were proposed in this pull request?

Support for variant shredding, enabling Hive to write shredded variant data into Iceberg tables.

Ideally, this should follow the approach described in the reader/writer API proposal for Iceberg V4, where an execution engine provides the shredded writer schema.

As an interim solution, this PR introduces a writer that infers the shredded schema from the sample record captured before the Parquet writer is initialized.
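To illustrate the idea of inferring a shredded schema from a sample record, here is a minimal, self-contained sketch. It is not the code from this PR and does not use the Iceberg or Parquet APIs: the class name `ShreddedSchemaSketch`, the helper names, and the type labels are all hypothetical, and a plain `Map` stands in for a variant object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the writer introduced by this PR.
final class ShreddedSchemaSketch {

    // Maps a sample value to a primitive type label (labels are assumptions).
    // Returns null for nulls, nested values, and anything else we do not shred.
    static String typeOf(Object value) {
        if (value instanceof Integer || value instanceof Long) {
            return "int64";
        }
        if (value instanceof Float || value instanceof Double) {
            return "double";
        }
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof String) {
            return "string";
        }
        return null;
    }

    // Builds the set of fields that would get a typed_value column,
    // based solely on the first (sample) record.
    static Map<String, String> inferTypedFields(Map<String, Object> sample) {
        Map<String, String> typed = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : sample.entrySet()) {
            String type = typeOf(entry.getValue());
            if (type != null) {
                typed.put(entry.getKey(), type);
            }
        }
        return typed;
    }
}
```

In this sketch the sample record alone decides the layout: a field that is missing from the sample, or whose sample value is null or nested, gets no typed column and would stay in the generic binary `value` column of the variant.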

Why are the changes needed?

Enables data skipping (predicate pushdown)

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • TestHiveIcebergSelects#testVariantSelectProjection
  • variant_type_shredding.q

deniskuzZ avatar Oct 23 '25 14:10 deniskuzZ

same thing as https://github.com/apache/iceberg/pull/14297

deniskuzZ avatar Oct 31 '25 13:10 deniskuzZ

I tested variant_type_shredding.q after removing 'variant.shredding.enabled'='true' from the table properties, and the qtest still passes. So the test verifies that basic INSERT/SELECT operations on VARIANT columns succeed, but not that the data is actually shredded in the file. That cannot be checked in a qtest.

So maybe we can add a JUnit test (e.g., TestVariantShredding) that:

  • Writes VARIANT data with variant.shredding.enabled=true and false
  • Opens the resulting Parquet files via ParquetFileReader
  • Asserts that the typed_value field is present/absent accordingly

kokila-19 avatar Nov 18 '25 11:11 kokila-19

That test was added in Iceberg ("Expose variantShreddingFunc() in Parquet.DataWriteBuilder"), and one was added here as well: TestHiveIcebergSelects#testVariantSelectProjection

plan was to fully cover the functionality with explain plan once PPD support is added.

deniskuzZ avatar Nov 26 '25 09:11 deniskuzZ

minor comment. LGTM +1

kokila-19 avatar Dec 09 '25 12:12 kokila-19

As all of the comments are addressed, can we merge?

need to get a green build, it's flaky atm

deniskuzZ avatar Dec 12 '25 05:12 deniskuzZ