iceberg icon indicating copy to clipboard operation
iceberg copied to clipboard

Flink: Adding Variant Support to Flink 2.1

Open talatuyarer opened this issue 3 months ago • 6 comments

Added full variant read/write support for Flink, including type conversion, schema integration, and binary serialization. The implementation uses Flink's BinaryVariant class with getValue() and getMetadata() methods.

Supported Types:

  • All primitive types: BOOLEAN, INT8/16/32/64, FLOAT, DOUBLE, STRING, BINARY, UUID
  • Date/time types: DATE, TIME, TIMESTAMPTZ, TIMESTAMPNTZ
  • Decimal types: DECIMAL4/8/16 with proper precision handling
  • Complex types: Arrays, objects, nested structures

Future Work

  • Filter pushdown on variant field contents
  • Variant field metrics and statistics
  • Variant field aggregations

talatuyarer avatar Oct 05 '25 12:10 talatuyarer

Thanks @talatuyarer for the PR. I will need some time to understand the Variants in depth in both Iceberg and in Flink. Maybe @mxm, @stevenzwu or @Guosmilesmile can help too.

pvary avatar Oct 06 '25 05:10 pvary

Thank you @pvary

This was useful for me: https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing

talatuyarer avatar Oct 06 '25 05:10 talatuyarer

@aihuaxu When you have chance can you also review conceptually. Am I covering all cases of Iceberg's Variant ?

talatuyarer avatar Oct 06 '25 06:10 talatuyarer

@aihuaxu When you have chance can you also review conceptually. Am I covering all cases of Iceberg's Variant ?

Sorry that I missed this message. I will take a look. Thanks for working on the support.

aihuaxu avatar Nov 05 '25 17:11 aihuaxu

@talatuyarer Overall the logic makes sense to me in which we are relying on the existing Parquet variant reader/writer to handle most of the work including shredding and what we need is mostly to convert between Flink Variant and Iceberg Variant.

As I read, shredding should work automatically. Can we add some basic validation for shredding coverage to make sure the Parquet schema is as expected?

aihuaxu avatar Nov 05 '25 17:11 aihuaxu

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Dec 11 '25 00:12 github-actions[bot]