reuster986
@ronawho Thanks for looking into this. In our applications, we care much more about reading than writing, unfortunately. Most of our datasets come in as hundreds or even thousands of...
@ronawho excellent sleuthing, and thank you for the update! I have run into a lot of gotchas with HDF5, but I never considered that the setup/metadata operations might be bottlenecking.
Here's another discussion of HDF5 performance issues that focuses on file open/close: https://www.hdfgroup.org/wp-content/uploads/2018/04/avoid_truncate_white_paper_180219.pdf Section 2 ("Slow File Open") looks promising, and they have a fix for it, but frustratingly they...
Thanks @ronawho . What raises the importance of this edge case is that `-(2**63)` is the value of `pandas.NaT` (Not a Time), which is like `nan` for Datetime and Timedelta...
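To illustrate why `-(2**63)` is an awkward value, here is a minimal stdlib-only sketch. The edge case under discussion is not spelled out in the visible text, so this only assumes it relates to `-(2**63)` being the smallest signed 64-bit integer, which has no positive counterpart. The helpers `fits_int64` and `wrap_int64` are hypothetical names for illustration, not part of pandas or Arkouda.

```python
# Bounds of a signed 64-bit integer.
INT64_MIN = -(2**63)  # per the comment above, also the integer value behind pandas.NaT
INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def wrap_int64(value: int) -> int:
    """Emulate two's-complement wraparound of an arbitrary int into int64."""
    return ((value + 2**63) % 2**64) - 2**63


# The minimum fits, but its negation does not.
print(fits_int64(INT64_MIN))    # True
print(fits_int64(-INT64_MIN))   # False

# With wraparound arithmetic, negating the minimum yields the minimum again.
print(wrap_int64(-INT64_MIN) == INT64_MIN)  # True
```

Under this assumption, any operation that negates or takes the absolute value of int64 data could silently map the sentinel back onto itself.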
Thanks for the input, guys. I'm waiting to make a decision on this until we resolve https://github.com/Bears-R-Us/arkouda/issues/1013#issuecomment-1017591744 , since that will determine what the time-related classes look like.
I think I agree with @21771 that we should deviate from numpy in this specific case, in order to preserve the precision of uint64 values like hashes.
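The precision concern can be sketched in plain Python. This assumes the case in question is one where NumPy's promotion rules would produce `float64` (as they do when mixing `uint64` and `int64`); the visible text does not name the exact operation. A `float64` has a 53-bit significand, so most 64-bit values do not survive the conversion. `roundtrip_through_float64` and `hash_like` are hypothetical names used only for illustration.

```python
UINT64_MAX = 2**64 - 1


def roundtrip_through_float64(value: int) -> int:
    """Convert an integer to a Python float (IEEE 754 double) and back."""
    return int(float(value))


# An arbitrary 64-bit value standing in for a hash (illustrative only).
hash_like = 0x9E3779B97F4A7C15

# Values up to 2**53 survive the round trip unchanged.
print(roundtrip_through_float64(2**53) == 2**53)          # True

# Larger values are rounded to the nearest representable double.
print(roundtrip_through_float64(hash_like) == hash_like)  # False
print(roundtrip_through_float64(UINT64_MAX) == 2**64)     # True
```

For identifiers such as hashes, a changed low-order bit produces a different identifier, which is the argument for keeping the result in `uint64`.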
I propose we restructure the test directories as
```
test/
    server/
    modules/
    diagnostic/
```
The `server/` and `modules/` dirs would be meant for CI-compatible tests that return a success/fail code...
I forgot to explicitly state that the `test/server` dir would be for python tests that exercise the server, while the `test/modules` dir would be for Chapel unit tests that do...
@hokiegeek2 I think ideally we should support overwriting an individual dataset, but if it is too hard to implement, we don't need to make it a high priority.
This reminds me, I should make a benchmark for concatenate, which would definitely probe the effect of this option.