PARQUET-1886 CompressionCodec Provider-aware Compression Codec Lookup for parquet-mr
Make sure you have checked all steps below.
Jira
- [PARQUET-1886] CompressionCodec Provider-aware Compression Codec Lookup for parquet-mr
- https://issues.apache.org/jira/browse/PARQUET-1886
Tests
- [ ] My PR adds the following unit tests OR does not need testing for this extremely good reason:
Commits
- [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
- Subject is separated from body by a blank line
- Subject is limited to 50 characters (not including Jira issue reference)
- Subject does not end with a period
- Subject uses the imperative mood ("add", not "adding")
- Body wraps at 72 characters
- Body explains "what" and "why", not "how"
Documentation
- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
- All the public functions and classes in the PR contain Javadoc that explains what they do
Has there been a spec discussion about this on the mailing list?
Just to clarify: This enables overloading of the compression implementation but not adding new codecs?
@xhochy It overloads the built-in compression implementation, and it retrieves data from the footer without introducing any new spec information. Do you think it needs to be added to the spec?
Connected to https://github.com/apache/arrow/pull/8229
Thanks for working on it, @XinDongIntel! Do you want to come to next week's Parquet Sync meeting to discuss it? I know it was discussed earlier, but now that we have started working on the new release, I think we can revisit it.