Performance comparison between arkouda's parquet implementation and native C++

Open stress-tess opened this issue 2 years ago • 2 comments

This issue tracks the requirement to compare the performance of arkouda's parquet code to native C++. This will give us an idea of whether arkouda's wrapping of these libraries adds any overhead.

This will likely be a joint effort between me and @egelberg

@bmcdonald3 and @ronawho, any suggestions on good ways to go about this, or any other help, would be much appreciated.

stress-tess avatar Oct 11 '23 20:10 stress-tess

This is related to https://github.com/Bears-R-Us/arkouda/issues/2736 and an overall push for faster and more consistent read/write times from our parquet code.

stress-tess avatar Oct 11 '23 20:10 stress-tess

If you are hoping to run a comparison of Chapel vs C++ for Parquet, you should be able to just pull out the code in ArrowFunctions.cpp and add a C++ main function that sets things up and calls into it.
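For the timing side of such a standalone driver, here is a minimal sketch using only the standard library. The names `TimingStats`, `time_trials`, and `mib_per_second` are hypothetical helpers made up for illustration, not anything in arkouda or Arrow, and `work` is a stand-in for whichever function you lift out of ArrowFunctions.cpp. It repeats the call several times and reports min/median/max, since a single run is easily skewed by file-system caching.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Summary of repeated wall-clock timings, in seconds.
struct TimingStats {
    double min_s;
    double median_s;
    double max_s;
};

// Run `work` `trials` times and time each run with a monotonic clock.
// `work` is a placeholder for the code under test, e.g. a lambda wrapping
// a read or write function pulled out of ArrowFunctions.cpp (not shown).
inline TimingStats time_trials(const std::function<void()>& work,
                               std::size_t trials) {
    assert(work && "work must be callable");
    if (trials == 0) {
        return TimingStats{0.0, 0.0, 0.0};
    }
    std::vector<double> samples;
    samples.reserve(trials);
    for (std::size_t i = 0; i < trials; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    // For an even number of trials this picks the upper of the two middle samples.
    return TimingStats{samples.front(), samples[samples.size() / 2],
                       samples.back()};
}

// Convert a byte count and elapsed time into MiB/s; returns 0 for a
// non-positive elapsed time rather than dividing by zero.
inline double mib_per_second(std::size_t bytes, double seconds) {
    if (seconds <= 0.0) {
        return 0.0;
    }
    return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / seconds;
}
```

In a `main`, you would wrap the lifted read or write call in a lambda, pass it to `time_trials`, and feed the file size plus `median_s` to `mib_per_second`. Running the same file and trial count through the Arkouda server then gives numbers that can be compared side by side.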

One thing I'd note is that you aren't going to be able to get the C++ code to do distributed reads/writes (unless you put in a lot of effort), so you are going to want to compare this against a CHPL_COMM=none build of Arkouda.

With that setup, I would be quite surprised if calling the C++ code directly performed any differently from calling it through Chapel, so it might not be the most interesting comparison, but it could still be worth doing if you are observing some odd behavior.

The code there uses the undocumented low_level_api for Parquet, since we observed much better performance with it (you can't write batch chunks with the regular API). You can see some examples of it in the arrow repo here: https://github.com/apache/arrow/blob/ef02417d40af7b970c5eafe809f256380cda2fab/cpp/examples/parquet/low_level_api/reader_writer.cc.

Happy to talk more about this and lend a hand once we get a call set up.

bmcdonald3 avatar Oct 11 '23 20:10 bmcdonald3