Ying
Running PrestoServer or HiveQueryRunner on macOS with an M1 chip gives the following errors:

```
-12T20:14:49.742-0500 ERROR main Bootstrap Uncaught exception in thread main
java.lang.RuntimeException: failed to load Hadoop native library...
```
This PR implements fast bit unpacking for continuous bits as described in the [Design doc](https://github.com/facebookincubator/velox/discussions/2353). The following are the benchmark results of BitUnpackingBenchmark; the time unit is us. The "IntDecoder" implementation was...
The Parquet spec requires each ColumnChunk's file_offset to be set correctly. This is the Parquet C++ code generated from the Thrift definition:

```
uint32_t ColumnChunk::read(::apache::thrift::protocol::TProtocol* iprot) {
  ::apache::thrift::protocol::TInputRecursionTracker tracker(*iprot);
  ...
  bool isset_file_offset =...
```
`SelectiveStructColumnReader::filterRowGroups()` goes over the stats column by column and merges the row-group IDs to be skipped from each column into the final `strideToSkip_`. This could produce many duplicates in the...
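One straightforward way to keep the merged skip list duplicate-free (a sketch with hypothetical names, not the actual reader code) is to gather the per-column skip lists first and sort/deduplicate once at the end:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative only: merge the row groups each column wants to skip into
// one sorted, duplicate-free list instead of appending blindly per column.
std::vector<uint32_t> mergeRowGroupsToSkip(
    const std::vector<std::vector<uint32_t>>& perColumnSkips) {
  std::vector<uint32_t> merged;
  for (const auto& columnSkips : perColumnSkips) {
    merged.insert(merged.end(), columnSkips.begin(), columnSkips.end());
  }
  // Sort, then drop adjacent duplicates so each row-group ID appears once.
  std::sort(merged.begin(), merged.end());
  merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
  return merged;
}
```

Sorting once after collecting all columns avoids the quadratic cost of deduplicating on every per-column merge.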
On the latest main, shortDecimalDirect in E2EFilterTest fails with this:

```
/Users/yingsu/repo/velox2/velox/cmake-build-debug/velox/dwio/parquet/tests/reader/velox_parquet_e2e_filter_test --gtest_filter=E2EFilterTest.shortDecimalDirect:E2EFilterTest/*.shortDecimalDirect:*/E2EFilterTest.shortDecimalDirect/*:*/E2EFilterTest/*.shortDecimalDirect --gtest_color=no
Testing started at 8:53 PM ...
/Users/yingsu/repo/velox2/velox/velox/dwio/common/tests/E2EFilterTestBase.cpp:204: Failure
Value of: batch->equalValueAt(batches[batchIndex].get(), i, rowIndex)
  Actual: false
Expected: true...
```
The following code throws an exception when the batch size is large, e.g. `batches[0].size() == 1000000`:

```
auto filters = filterGenerator->makeSubfieldFilters(filterSpecs, batches, hitRows);
```

The call stack is

```
long std::__1::__libcpp_atomic_refcount_increment(long&) memory:3102...
```
The following code would fail:

```
void writeToFile() {
  auto sink = std::make_unique("testfile");
  auto writer = std::make_unique(
      std::move(sink), *pool_, 10000, ::parquet::WriterProperties::Builder().build());
  for (auto& batch : batches) {
    writer->write(batch);
  }
  writer->close();...
```