Chao Sun

Results: 91 comments by Chao Sun

Sorry for the late reply. Have you resolved the issue? If not, can you share the code which does the writing? You should write multiple rows in each row group.

Yes, it seems you are calling `close_column` and `close_row_group` for every row, which is not optimal. The latter will write the Parquet row group metadata to the file. Instead, you...
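As a rough illustration of the cost difference, the sketch below counts how many row groups get closed when the same rows are written one per group versus in batches. `MockWriter`, `write_batch`, `close_row_group` and `write_in_groups` are hypothetical stand-ins written for this example. They are not the parquet crate's API, and the batch size of 1,000 is an arbitrary assumption.

```rust
/// Hypothetical stand-in for a Parquet file writer (NOT the parquet crate's API).
/// It only tracks how many row groups were closed, since each close is where
/// row group metadata would be written to the file.
#[derive(Default)]
struct MockWriter {
    buffered: Vec<i64>,
    row_groups_closed: usize,
    rows_written: usize,
}

impl MockWriter {
    /// Append values to the currently open row group.
    fn write_batch(&mut self, values: &[i64]) {
        self.buffered.extend_from_slice(values);
    }

    /// Close the current row group and count one metadata write.
    fn close_row_group(&mut self) {
        self.rows_written += self.buffered.len();
        self.buffered.clear();
        self.row_groups_closed += 1;
    }
}

/// Write `rows` using row groups of at most `rows_per_group` rows each.
/// `rows_per_group` must be greater than zero.
fn write_in_groups(rows: &[i64], rows_per_group: usize) -> MockWriter {
    assert!(rows_per_group > 0, "rows_per_group must be positive");
    let mut writer = MockWriter::default();
    for chunk in rows.chunks(rows_per_group) {
        writer.write_batch(chunk);
        writer.close_row_group();
    }
    writer
}

fn main() {
    let rows: Vec<i64> = (0..10_000).collect();
    let per_row = write_in_groups(&rows, 1);
    let batched = write_in_groups(&rows, 1_000);
    println!("one row per group:   {} row groups", per_row.row_groups_closed);
    println!("1000 rows per group: {} row groups", batched.row_groups_closed);
}
```

With a real writer the same shape applies: keep one row group open, write many rows into each column, and close the row group only once per batch.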

Thanks @alecmocatta! The performance improvement looks very impressive 👍! Looking forward to a PR on this 😄.

Thanks. Yes, filing a JIRA against arrow is the right thing to do. Looking forward to it!

Thanks @alecmocatta! Could you open a pull request in arrow? It's a pretty big change, and I'll take some time to look at it.

> We have similar thing in record reader.

Hmm... you mean `record/reader.rs`? I couldn't find anything related. This is on the encoding level though - so we'll need to add...

The interface will be similar to [here](https://github.com/apache/arrow/blob/master/cpp/src/parquet/encoding.h#L51). The `valid_bits` will be computed from def/rep levels, and passed to the call. See [here](https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/record_reader.cc#L419) for an example.
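To make the `valid_bits` idea concrete, here is a minimal sketch for the simplest case only: a flat, non-repeated optional column, where a slot holds a value exactly when its definition level equals the maximum definition level. The function name `valid_bits_from_def_levels` and the least-significant-bit-first layout are assumptions for illustration. Repeated or nested columns also need the repetition levels, which this sketch does not handle.

```rust
/// Build a validity bitmap from the definition levels of a flat,
/// non-repeated column (hypothetical helper, not an existing API).
/// Bit `i` (least-significant bit first within each byte) is set when
/// slot `i` holds a value. Returns the bitmap and the number of nulls.
fn valid_bits_from_def_levels(def_levels: &[i16], max_def_level: i16) -> (Vec<u8>, usize) {
    let mut bits = vec![0u8; (def_levels.len() + 7) / 8];
    let mut null_count = 0;
    for (i, &level) in def_levels.iter().enumerate() {
        if level == max_def_level {
            // Value present: mark the slot as valid.
            bits[i / 8] |= 1u8 << (i % 8);
        } else {
            // Definition level below the maximum: the slot is null.
            null_count += 1;
        }
    }
    (bits, null_count)
}

fn main() {
    // Optional column (max definition level 1) holding [7, null, 9, null, null].
    let def_levels: [i16; 5] = [1, 0, 1, 0, 0];
    let (valid_bits, null_count) = valid_bits_from_def_levels(&def_levels, 1);
    println!("valid_bits = {:08b}, nulls = {}", valid_bits[0], null_count);
}
```

A decoder taking such a bitmap could then place decoded values only into the valid slots and leave the null slots untouched.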

Thanks @liurenjie1024. Updated the description with some potential tasks.

@sadikovi Thanks - added. @andygrove cool - will take a look.