Flink: Adding Variant Support to Flink 2.1
Added full Variant read/write support for Flink, including type conversion, schema integration, and binary serialization. The implementation uses Flink's `BinaryVariant` class and its `getValue()` and `getMetadata()` accessors.
Supported Types:
- All primitive types: BOOLEAN, INT8/16/32/64, FLOAT, DOUBLE, STRING, BINARY, UUID
- Date/time types: DATE, TIME, TIMESTAMPTZ, TIMESTAMPNTZ
- Decimal types: DECIMAL4/8/16 with proper precision handling
- Complex types: Arrays, objects, nested structures
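To make the "binary serialization" part concrete, here is a small, self-contained sketch of the Variant binary layout for one of the simplest cases, a short string, following the Parquet Variant encoding spec (a value byte array plus a metadata byte array). This is illustrative only: the class and method names are hypothetical, and it is not the actual Flink `BinaryVariant` or Iceberg implementation.

```java
import java.nio.charset.StandardCharsets;

// Hand-rolled sketch of the Variant binary layout for a short string,
// per the Parquet Variant spec. Not the real Flink/Iceberg code.
public class VariantSketch {
    // Empty metadata: header byte (version=1, offset_size=1),
    // dictionary_size=0, and a single offset entry of 0.
    static byte[] emptyMetadata() {
        return new byte[] {0x01, 0x00, 0x00};
    }

    // Short string value: header = (length << 2) | basic_type 1 (short string),
    // followed by the UTF-8 bytes. Only valid for strings under 64 bytes.
    static byte[] encodeShortString(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[1 + utf8.length];
        out[0] = (byte) ((utf8.length << 2) | 0b01);
        System.arraycopy(utf8, 0, out, 1, utf8.length);
        return out;
    }

    static String decodeShortString(byte[] value) {
        int basicType = value[0] & 0b11;          // low 2 bits: basic type
        if (basicType != 1) {
            throw new IllegalArgumentException("not a short string");
        }
        int len = (value[0] >> 2) & 0x3F;         // upper 6 bits: length
        return new String(value, 1, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] v = encodeShortString("iceberg");
        System.out.println(decodeShortString(v)); // iceberg
    }
}
```

Longer strings, the other primitive types listed above, and objects/arrays use different header encodings, but the overall shape is the same: every Variant is a (metadata, value) pair of byte arrays, which is what crosses the Flink/Iceberg boundary.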
Future Work
- Filter pushdown on variant field contents
- Variant field metrics and statistics
- Variant field aggregations
Thanks @talatuyarer for the PR. I will need some time to understand Variants in depth in both Iceberg and Flink. Maybe @mxm, @stevenzwu or @Guosmilesmile can help too.
Thank you @pvary
This was useful for me: https://cwiki.apache.org/confluence/display/FLINK/FLIP-521%3A+Integrating+Variant+Type+into+Flink%3A+Enabling+Efficient+Semi-Structured+Data+Processing
@aihuaxu When you have a chance, could you also review this conceptually? Am I covering all cases of Iceberg's Variant?
Sorry that I missed this message. I will take a look. Thanks for working on the support.
@talatuyarer Overall the logic makes sense to me: we rely on the existing Parquet variant reader/writer to handle most of the work, including shredding, and what remains is mostly converting between the Flink Variant and the Iceberg Variant.
From my reading, shredding should work automatically. Can we add some basic validation of shredding coverage to make sure the Parquet schema is shaped as expected?
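The validation suggested above could look roughly like the following sketch. Per the Parquet variant shredding spec, a shredded variant column is a group with a required binary `metadata` field plus optional `value` and/or `typed_value` fields. The real check would inspect the Parquet `GroupType`; here, purely for illustration, a column's children are modelled as a simple name-to-type map, and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of a basic shredding-coverage check.
// A shredded variant group must carry a binary "metadata" field and
// at least one of "value" / "typed_value".
public class ShreddingCheck {
    static void validateShreddedVariant(Map<String, String> fields) {
        if (!"binary".equals(fields.get("metadata"))) {
            throw new IllegalStateException(
                "shredded variant must have a binary 'metadata' field");
        }
        if (!fields.containsKey("value") && !fields.containsKey("typed_value")) {
            throw new IllegalStateException(
                "shredded variant needs 'value' and/or 'typed_value'");
        }
    }

    public static void main(String[] args) {
        // A variant shredded to int64, with metadata retained.
        validateShreddedVariant(Map.of("metadata", "binary", "typed_value", "int64"));
        System.out.println("schema ok");
    }
}
```

A check along these lines could run when the Parquet schema is bound to the Iceberg/Flink schema, failing fast if a writer produced an unexpected layout.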