LargeList type not supported by tensorflow_io.arrow

Open: cyc opened this issue 4 years ago • 1 comment

It would be good if tensorflow_io.arrow supported a broader variety of data types. There currently seems to be no support for the Arrow LargeList type.

Reproducible example (pyarrow 2.0.0, tensorflow_io 0.17.0):

import tensorflow_io.arrow as arrow_io
import pyarrow as pa
import tensorflow as tf

arr = pa.array([['a'], ['bb'], ['ccc']], pa.large_list(pa.string()))
table = pa.Table.from_arrays([arr], ['arr'])
print(table.schema)
ads = arrow_io.ArrowDataset.from_record_batches(
    table.to_batches(),
    output_types=(tf.string,),
    output_shapes=(tf.TensorShape([None]),),
    batch_size=1,
    batch_mode='drop_remainder')
for x in ads:
    print(x)

Results in:

arr: large_list<item: string>
  child 0, item: string

tensorflow.python.framework.errors_impl.InternalError: Invalid: Invalid argument: arrow data type 0x7f9b429c28e8 is not supported: Type error: Arrow data type is not supported [Op:IteratorGetNext]

Note that it works if I replace pa.large_list above with pa.list_.

cyc avatar Apr 13 '21 20:04 cyc

Please add large_list support; it is very popular nowadays.

schistyakov avatar Jan 27 '23 10:01 schistyakov