parquet-testing

FIXED_LEN_BYTE_ARRAY + DELTA_BYTE_ARRAY for Decimals

Open 4ertus2 opened this issue 7 months ago • 5 comments

The Arrow C++ reader had trouble with FLBA + DELTA_BYTE_ARRAY. ClickHouse, DuckDB, and Velox currently have issues reading Decimals with this encoding. It would be great if you could add a sample file with decimal columns of different precisions for this encoding combination.

4ertus2 avatar Jun 26 '25 11:06 4ertus2

Could you please elaborate on the trouble with FLBA + DELTA_BYTE_ARRAY? How can we reproduce it?

wgtmac avatar Jun 27 '25 06:06 wgtmac

Just make a Parquet file with a Decimal(19-38, x) column using FLBA + DELTA_BYTE_ARRAY encoding, then try to read it with a non-Arrow C++ reader (ClickHouse, DuckDB, Velox).

4ertus2 avatar Jun 28 '25 21:06 4ertus2

Thanks for the information! Have you tried reading it with https://github.com/apache/parquet-java/tree/master/parquet-cli? Unfortunately, I don't have a configured setup for these non-Arrow C++ readers (ClickHouse, DuckDB, Velox). It would be helpful if you could provide the error message or even a stack trace here to help us understand the issue.

wgtmac avatar Jun 30 '25 01:06 wgtmac

ClickHouse: https://github.com/ClickHouse/ClickHouse/issues/62141#issuecomment-2457905015

DuckDB:

Invalid Error:
Delta Byte Array encoding is only supported for string/blob data

Velox:

Encoding not supported yet: DELTA_BYTE_ARRAY

4ertus2 avatar Jun 30 '25 08:06 4ertus2

Thanks. I thought it was a bug produced by the Parquet writer, but now it is clear that it is a missing feature in those engines.

wgtmac avatar Jul 04 '25 04:07 wgtmac
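
For anyone landing on this thread, here is a rough, standard-library-only Python sketch of what the combination discussed above looks like at the byte level. It assumes the layout described in the Parquet format documentation: a FIXED_LEN_BYTE_ARRAY decimal holds the unscaled value as big-endian two's complement, and DELTA_BYTE_ARRAY stores, for each value, the length of the prefix shared with the previous value plus the remaining suffix. All function names are made up for illustration and do not come from any reader or writer. The sketch stops at the prefix/suffix split; the real encoding additionally packs the prefix lengths with DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY, which is omitted here.

```python
from typing import List, Tuple


def flba_width(precision: int) -> int:
    """Smallest byte width whose signed range holds every value of this precision."""
    max_unscaled = 10 ** precision - 1
    width = 1
    while (1 << (8 * width - 1)) - 1 < max_unscaled:
        width += 1
    return width


def decimal_to_flba(unscaled: int, precision: int) -> bytes:
    """Unscaled decimal value as fixed-width big-endian two's complement bytes."""
    return unscaled.to_bytes(flba_width(precision), "big", signed=True)


def delta_byte_array_split(values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Split each value into (shared prefix length with previous value, suffix)."""
    prefix_lengths: List[int] = []
    suffixes: List[bytes] = []
    previous = b""
    for value in values:
        n = 0
        limit = min(len(previous), len(value))
        while n < limit and previous[n] == value[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(value[n:])
        previous = value
    return prefix_lengths, suffixes


def delta_byte_array_join(prefix_lengths: List[int], suffixes: List[bytes]) -> List[bytes]:
    """Inverse of delta_byte_array_split: rebuild the full fixed-length values."""
    values: List[bytes] = []
    previous = b""
    for n, suffix in zip(prefix_lengths, suffixes):
        value = previous[:n] + suffix
        values.append(value)
        previous = value
    return values


# Hypothetical Decimal(20, 2) column: 123.45, 123.99, -0.01 as unscaled integers.
PRECISION = 20
column = [decimal_to_flba(v, PRECISION) for v in (12345, 12399, -1)]
prefix_lengths, suffixes = delta_byte_array_split(column)
print(flba_width(PRECISION), prefix_lengths, suffixes)
```

Note that the suffixes are variable-length even though every decoded value has the same width. A reader that assumes DELTA_BYTE_ARRAY only ever yields string/blob data therefore has to rebuild the fixed-width values before interpreting them as decimals, which is consistent with the "missing feature" conclusion above.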