Ivan

Results 16 issues of Ivan

This is derived from #116. Currently we do not limit row group size in any way, even though `WriterProperties` has options for it, like max row group size....

enhancement

This is derived from #116. It would be good to add statistics support for the write path, since currently we do not write statistics. I think it only needs to be...

enhancement

This is a follow-up of #156. CLI tools `parquet-schema` and `parquet-read` could be improved with a better help message and parameter support, since the current version of both tools has...

enhancement

### What changes were proposed in this pull request? This PR updates schema inference in DSv1 FileFormat to remove overlapping columns from the data schema and keep them in the...

SQL

Currently we build statistics without accounting for dictionary pages. We should either have dictionary page statistics without column filters, or, if there is a fallback, have a split of statistics....

Currently we cache filter statistics and table metadata for each queried table. This issue is about caching the query plan, so that when we hit the same plan, we can yield result...

question

Currently, when building dictionary filters, we have to keep them in memory. They should spill to disk after a certain threshold.

Currently we are using the Spark Parquet reader. This issue is about investigating whether we can extract data pages and index them, including each page's statistics. During scan we would select...

question

Add performance tests to compare with the Parquet implementation or to compare performance against releases. These should run as part of CI to determine whether there is a performance regression.

enhancement