Binary type not supported by tensorflow_io.arrow
It would be good if tensorflow_io.arrow supported a broader variety of data types. In particular, the Arrow Binary type does not appear to be supported yet.
Reproducible example (pyarrow 2.0.0, tensorflow_io 0.17.0):
import tensorflow_io.arrow as arrow_io
import pyarrow
import tensorflow as tf
arr = pyarrow.array([b'a', b'bb', b'ccc'])
table = pyarrow.Table.from_arrays([arr], ['arr'])
print(table.schema)
ads = arrow_io.ArrowDataset.from_record_batches(
    table.to_batches(),
    output_types=(tf.string,),
    output_shapes=(tf.TensorShape(None),),
    batch_size=1,
    batch_mode='drop_remainder')
dd = next(iter(ads))
Results in:
arr: binary
tensorflow.python.framework.errors_impl.InternalError: Invalid: Invalid argument: arrow data type 0x7ff7a8457388 is not supported: Type error: Arrow data type is not supported [Op:IteratorGetNext]
This should be pretty straightforward to add. String types are already supported, and in Arrow a string array has the same memory layout as a binary array.
@BryanCutler is there currently a way to work around this error?
Is this the place where binary type support should be added? Can you provide some pointers, if possible?
https://github.com/tensorflow/io/blob/f31422e0eeb08e6336411009d316ff9d0d36edf1/tensorflow_io/core/kernels/arrow/arrow_kernels.cc#L620-L626