parquet-java
Apache Parquet
This is me looking at what minimal changes could be made to boost IO performance when working with cloud stores. Compiles against hadoop 3.3.3; will need hadoop 3.3.5 for some...
Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET-2160) issues and references them in the PR title. For example,...
Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET-2169) issues and references them in the PR title. For example, "PARQUET-2169:...
Bumps hadoop-common from 3.2.3 to 3.2.4. Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a...
### Jira

This PR addresses the following [PARQUET-2149](https://issues.apache.org/jira/browse/PARQUET-2149): Implement async IO for Parquet file reader

### Tests

This PR adds the following unit tests:

- AsyncMultiBufferInputStream.*
- TestMultipleWriteRead.testReadWriteAsync
- TestColumnChunkPageWriteStore.testAsync

The PR is...
I broke up https://github.com/apache/parquet-mr/pull/953 into more digestible pieces. This new PR is the lowest level set of changes. By themselves, these additions to ByteBufferInputStream don't yield much improvement, so future...
CodecFactory cached instances of compressors and decompressors across threads, which was not thread-safe. This change makes the caches thread-local.
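The thread-local caching idea described above can be illustrated with a minimal sketch. This is not the actual CodecFactory change: the class name `ThreadLocalCodecCache`, its methods, and the factory function are hypothetical and exist only to show the pattern of giving each thread its own private cache, so that cached instances are never shared across threads.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical illustration of a per-thread cache.
 * Assumption: this is NOT the real CodecFactory code, only the general pattern.
 */
final class ThreadLocalCodecCache<K, V> {
    // Each thread lazily receives its own private map, so no locking is needed
    // and no cached instance is ever visible to another thread.
    private final ThreadLocal<Map<K, V>> cache = ThreadLocal.withInitial(HashMap::new);

    // Creates a new instance for a key the calling thread has not seen yet.
    private final Function<K, V> factory;

    ThreadLocalCodecCache(Function<K, V> factory) {
        this.factory = factory;
    }

    /** Returns the calling thread's instance for the key, creating it on first use. */
    V get(K key) {
        return cache.get().computeIfAbsent(key, factory);
    }

    /** Drops all instances cached by the calling thread. */
    void release() {
        cache.remove();
    }
}
```

With this shape, two calls to `get("gzip")` on the same thread return the same object, while a call from another thread creates and caches a separate one. The trade-off is more instances overall (one per key per thread), and long-lived pool threads should call `release()` when finished so cached objects do not linger.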
This PR addresses the following JIRA entry: https://issues.apache.org/jira/browse/PARQUET-2069 ParquetMR breaks compatibility with itself by including a JSON representation of a schema that names a record "list", when it should be...
… logical Timestamps, Date, TimeOfDay Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the...
Remove the deprecated classes PathGlobPattern and DeprecatedFieldProjectionFilter so that Parquet will compile against hadoop 3.x. If a thrift reader is configured to use the now-deleted filter, by setting the filter...