LargeList type not supported by tensorflow_io.arrow
It would be good if tensorflow_io.arrow could support a broader variety of data types. The LargeList Arrow type currently appears to be unsupported.
Reproducible example (pyarrow 2.0.0, tensorflow_io 0.17.0):
import tensorflow_io.arrow as arrow_io
import pyarrow as pa
import tensorflow as tf
arr = pa.array([['a'], ['bb'], ['ccc']], pa.large_list(pa.string()))
table = pa.Table.from_arrays([arr], ['arr'])
print(table.schema)
ads = arrow_io.ArrowDataset.from_record_batches(
    table.to_batches(),
    output_types=(tf.string,),
    output_shapes=(tf.TensorShape([None]),),
    batch_size=1,
    batch_mode='drop_remainder')
for x in ads:
    print(x)
Results in:
arr: large_list<item: string>
  child 0, item: string
tensorflow.python.framework.errors_impl.InternalError: Invalid: Invalid argument: arrow data type 0x7f9b429c28e8 is not supported: Type error: Arrow data type is not supported [Op:IteratorGetNext]
Note that it works if I replace pa.large_list above with pa.list_.
Please add large_list support; it is very popular nowadays.