
Incorrect statistics read for struct array in parquet

Open NGA-TRAN opened this issue 1 year ago • 1 comments

Describe the bug

I found this while adding tests in https://github.com/apache/datafusion/pull/10608. Reading the statistics of a struct array returns nothing.

To Reproduce

See test_struct in https://github.com/apache/datafusion/pull/10608

Expected behavior

Statistics values should be returned for struct arrays.

Additional context

No response

NGA-TRAN avatar May 21 '24 21:05 NGA-TRAN

take

Lordworms avatar May 22 '24 23:05 Lordworms

Related to #8334. Statistics for structs currently return null.

xinlifoobar avatar May 29 '24 06:05 xinlifoobar

The problem here is how to deal effectively with nested structs. I don't know whether all the columns belonging to one struct are stored together in one row group, or whether they can be split across different row groups.

Lordworms avatar May 30 '24 16:05 Lordworms
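
On the row-group question above: as far as I understand the Parquet format, a struct is flattened into its primitive leaf columns, every row group holds one column chunk per leaf, and min/max statistics are kept per leaf column chunk rather than for the struct as a whole. So for a given range of rows, all leaves of a struct live in the same row group. Below is a minimal model of that layout in plain Python. It is only a sketch, not the parquet or DataFusion API: the schema, the field names (`s`, `a`, `b`, `c`), and the helpers `leaf_paths`, `get_leaf`, and `row_group_stats` are made up for illustration, and nulls are ignored.

```python
def leaf_paths(schema, prefix=""):
    """Yield dotted paths of the primitive leaves under a nested schema.

    The schema is modelled as a dict; a nested dict stands for a struct.
    """
    for name, field in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field, dict):
            # Struct field: it has no values of its own, only children.
            yield from leaf_paths(field, path)
        else:
            # Primitive field: this is what gets stored as a column.
            yield path


def get_leaf(row, path):
    """Follow a dotted path into a nested row and return the leaf value."""
    value = row
    for part in path.split("."):
        value = value[part]
    return value


def row_group_stats(rows, schema, rows_per_group):
    """Split rows into row groups and compute (min, max) per leaf column.

    Every row group gets an entry for every leaf, mirroring one column
    chunk per leaf column per row group.
    """
    leaves = list(leaf_paths(schema))
    groups = []
    for start in range(0, len(rows), rows_per_group):
        chunk = rows[start:start + rows_per_group]
        stats = {}
        for leaf in leaves:
            values = [get_leaf(row, leaf) for row in chunk]
            stats[leaf] = (min(values), max(values))
        groups.append(stats)
    return groups


# Hypothetical schema: struct `s` with a leaf `a` and a nested struct `b`.
schema = {"s": {"a": "int32", "b": {"c": "int32"}}}
rows = [
    {"s": {"a": 1, "b": {"c": 10}}},
    {"s": {"a": 5, "b": {"c": 40}}},
    {"s": {"a": 3, "b": {"c": 20}}},
    {"s": {"a": 2, "b": {"c": 30}}},
]

groups = row_group_stats(rows, schema, rows_per_group=2)
for index, stats in enumerate(groups):
    print(index, stats)
```

In this model there is no statistics entry for `s` itself, only for `s.a` and `s.b.c`, which is one plausible reason a lookup keyed on the struct column comes back empty. Whether struct-level statistics should be assembled from the leaf statistics is the open design question here.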