Samarth Jain
Benchmark results with the patch applied:
```
Benchmark                                         Mode  Cnt  Score   Error  Units
ReadBenchmarks.read1MRowsBS256MPS4MUncompressed  thrpt   25  0.947 ± 0.011  ops/s
ReadBenchmarks.read1MRowsBS256MPS8MUncompressed  thrpt   25  0.952 ± 0.010  ops/s
ReadBenchmarks.read1MRowsBS512MPS4MUncompressed  thrpt   25...
```
Benchmark results on master branch:
```
Benchmark                                         Mode  Cnt  Score   Error  Units
ReadBenchmarks.read1MRowsBS256MPS4MUncompressed  thrpt   25  0.952 ± 0.008  ops/s
ReadBenchmarks.read1MRowsBS256MPS8MUncompressed  thrpt   25  0.947 ± 0.008  ops/s
ReadBenchmarks.read1MRowsBS512MPS4MUncompressed  thrpt   25  0.957...
```
Benchmark Name | Master (ops/s) | Airlift Codecs (ops/s)
-- | -- | --
ReadBenchmarks.read1MRowsBS256MPS4MUncompressed | 0.952 | 0.947
ReadBenchmarks.read1MRowsBS256MPS8MUncompressed | 0.947 | 0.952
ReadBenchmarks.read1MRowsBS512MPS4MUncompressed | 0.957 | 0.938
ReadBenchmarks.read1MRowsBS512MPS8MUncompressed | 0.956...
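
To read the comparison above as a relative change rather than as raw scores, here is a minimal sketch. The class and method names (`ThroughputDelta`, `percentChange`) are illustrative and not part of the PR, and only the rows whose Master and Airlift scores are both fully visible above are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThroughputDelta {

    // Percentage change of candidate relative to baseline.
    // For a throughput score (ops/s), a positive value means the candidate is faster.
    static double percentChange(double baseline, double candidate) {
        if (baseline == 0.0) {
            throw new IllegalArgumentException("baseline must be non-zero");
        }
        return (candidate - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Each value is {master, airlift} in ops/s, copied from the table.
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("read1MRowsBS256MPS4MUncompressed", new double[] {0.952, 0.947});
        rows.put("read1MRowsBS256MPS8MUncompressed", new double[] {0.947, 0.952});
        rows.put("read1MRowsBS512MPS4MUncompressed", new double[] {0.957, 0.938});

        for (Map.Entry<String, double[]> row : rows.entrySet()) {
            double master = row.getValue()[0];
            double airlift = row.getValue()[1];
            System.out.printf("%s: %+.2f%%%n", row.getKey(), percentChange(master, airlift));
        }
    }
}
```

When interpreting the output, a difference smaller than the reported JMH error margin (about ± 0.01 ops/s in the runs above) is hard to distinguish from noise.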
> @samarthjain why did you remove Snappy support?

@nandorKollar - it looks like Parquet has its own implementation for Snappy which from what I can tell doesn't depend on native....
@nandorKollar - I just pushed a commit to address changes you requested. Sorry for the delay. I had to punt working on this for various reasons.
@nandorKollar - I am not exactly sure where to add this configuration, which I was thinking of naming `parquet.airlift.compressors.enable`. We want both `ParquetReadOptions` (with the config defined in...
Force-pushed a new commit that makes it configurable whether or not to use Airlift-based compressors. Also added tests and GZIP benchmarks for the Airlift compressors. Benchmark results reveal that...
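
To illustrate the kind of toggle discussed above, here is a minimal sketch of reading a boolean flag under the proposed key `parquet.airlift.compressors.enable`. It is not the PR's code: the class `AirliftToggle`, its methods, and the returned labels are hypothetical. A plain `Map<String, String>` stands in for whichever configuration object actually carries the setting, and the default of `false` is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class AirliftToggle {

    // Key name as proposed in the thread.
    static final String AIRLIFT_COMPRESSORS_ENABLE = "parquet.airlift.compressors.enable";

    // Assumption: the feature is opt-in, so an unset key means "disabled".
    static final boolean DEFAULT_ENABLED = false;

    // Returns true only when the key is present and its value is "true" (case-insensitive).
    static boolean useAirlift(Map<String, String> conf) {
        String raw = conf.get(AIRLIFT_COMPRESSORS_ENABLE);
        return raw == null ? DEFAULT_ENABLED : Boolean.parseBoolean(raw.trim());
    }

    // Hypothetical selection step: picks a label for the codec implementation to use.
    static String chooseImplementation(Map<String, String> conf) {
        return useAirlift(conf) ? "airlift" : "default";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(chooseImplementation(conf)); // key unset

        conf.put(AIRLIFT_COMPRESSORS_ENABLE, "true");
        System.out.println(chooseImplementation(conf)); // key set to true
    }
}
```

Keeping the lookup in one helper means the read and write paths can share the same key and default instead of each parsing the setting on its own.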
@dbtsai
> Since airlift is pure Java implementation, what's the performance implications for zstd? I saw there is a benchmark for GZIP, but I don't see benchmark for other codecs....
@nandorKollar, @rdblue, @danielcweeks - if you have cycles, could you please take a look at this PR?
@JulianJaffePinterest - thanks for the updates. This looks good to me now. Could you rebase the PR?