Support `DataType::Dictionary` for arrow schema conversion

Open jdockerty opened this issue 9 months ago • 0 comments

The current implementation of `arrow_schema_to_schema` does not support the Arrow `Dictionary` type: the schema visitor cannot visit it, so the conversion returns an error.

A small reproducer for this is as follows:

use arrow::datatypes::{DataType, Field, Schema};
use iceberg::arrow::arrow_schema_to_schema;

fn main() {
    let dict_field = Field::new(
        "my_dict",
        DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8)),
        true,
    );
    let schema = Schema::new(vec![dict_field]);
    let _iceberg_schema = arrow_schema_to_schema(&schema).expect("Conversion works");
}

This produces the following error:

thread 'main' panicked at src/main.rs:11:59:
Conversion works: DataInvalid => Cannot visit Arrow data type: Dictionary(Int32, Utf8)

It looks like a `Dictionary` type does not exist as a native type in iceberg-rust, nor in the Java implementation.

Does this require an expansion to the Java implementation first?
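
One possible direction, offered only as a sketch: because dictionary encoding is a physical encoding of values that logically have the value type, the conversion could treat `Dictionary(key, value)` as `value`. The example below illustrates that idea on a simplified stand-in enum. The `DataType` enum here is not arrow's, and `strip_dictionary` is a hypothetical helper, not an existing function in iceberg-rust or arrow.

```rust
// Simplified stand-in for arrow's `DataType` (assumption: NOT the real arrow enum).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    // (key type, value type), mirroring the shape of arrow's Dictionary variant.
    Dictionary(Box<DataType>, Box<DataType>),
    List(Box<DataType>),
}

// Hypothetical helper: recursively replace every dictionary type with its value type.
fn strip_dictionary(dt: &DataType) -> DataType {
    match dt {
        // Drop the key type and keep (the stripped form of) the value type.
        DataType::Dictionary(_key, value) => strip_dictionary(value),
        // Recurse into nested types so inner dictionaries are handled too.
        DataType::List(inner) => DataType::List(Box::new(strip_dictionary(inner))),
        // Primitive types are returned unchanged.
        other => other.clone(),
    }
}

fn main() {
    let dict = DataType::Dictionary(Box::new(DataType::Int32), Box::new(DataType::Utf8));
    let nested = DataType::List(Box::new(dict.clone()));

    println!("{:?}", strip_dictionary(&dict));
    println!("{:?}", strip_dictionary(&nested));
}
```

Whether silently dropping the dictionary encoding is acceptable, or whether it should be preserved somewhere for round-tripping, is a design question this sketch does not answer.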

jdockerty, Apr 30 '25 13:04