reuster986 comments


Reading hdf5 files with ak.load does not scale

@ronawho Thanks for looking into this. In our applications, we care much more about reading than writing, unfortunately. Most of our datasets come in as hundreds or even thousands of...

Reading hdf5 files with ak.load does not scale

@ronawho excellent sleuthing, and thank you for the update! I have run into a lot of gotchas with HDF5, but I never considered that the setup/metadata operations might be bottlenecking.

Reading hdf5 files with ak.load does not scale

Here's another discussion of HDF5 performance issues that focuses on file open/close: https://www.hdfgroup.org/wp-content/uploads/2018/04/avoid_truncate_white_paper_180219.pdf Section 2 ("Slow File Open") looks promising, and they have a fix for it, but frustratingly they...

`ak.abs(x)` fails on value -(2**63)

Thanks @ronawho. What raises the importance of this edge case is that `-(2**63)` is the value of `pandas.NaT` (Not a Time), which is like `nan` for Datetime and Timedelta...
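For context, a minimal sketch of why `-(2**63)` is awkward for an absolute-value routine: it is the smallest int64, and its magnitude `2**63` does not fit in int64, so a fixed-width negation wraps back to the same value. This is not Arkouda's implementation; `wrapped_abs_int64` is a hypothetical helper that only imitates two's-complement wraparound using the standard library.

```python
import ctypes

INT64_MIN = -(2**63)      # also the integer sentinel behind pandas.NaT, per the comment above
INT64_MAX = 2**63 - 1


def wrapped_abs_int64(x: int) -> int:
    """Absolute value as a fixed-width int64 would compute it.

    Hypothetical helper: ctypes.c_int64 does no overflow checking,
    so an out-of-range magnitude wraps around like two's complement.
    """
    magnitude = -x if x < 0 else x
    return ctypes.c_int64(magnitude).value


if __name__ == "__main__":
    print(wrapped_abs_int64(-5))         # ordinary case
    print(wrapped_abs_int64(INT64_MIN))  # magnitude 2**63 does not fit in int64
```

Any code that treats the result of `abs` as non-negative would be surprised by this one input, which is why a NaT-like sentinel makes the edge case matter.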

`ak.abs(x)` fails on value -(2**63)

Thanks for the input, guys. I'm waiting to make a decision on this until we resolve https://github.com/Bears-R-Us/arkouda/issues/1013#issuecomment-1017591744 , since that will determine what the time-related classes look like.

Python positive Integer list becomes Float if values both GTE and LT 2^63

I think I agree with @21771 that we should deviate from numpy in this specific case, in order to preserve the precision of uint64 values like hashes.
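To illustrate the precision concern, here is a small sketch in plain Python (no Arkouda or numpy calls; `survives_float64` is a hypothetical helper). A float64 has a 53-bit significand, so most integers above `2**53` cannot round-trip through it, and a 64-bit hash converted to float would silently change.

```python
def survives_float64(value: int) -> bool:
    """Return True if the integer is unchanged by a round trip through float64."""
    return int(float(value)) == value


if __name__ == "__main__":
    hash_like = 2**64 - 1  # largest uint64, standing in for a hash value
    print(survives_float64(2**63))      # a power of two is exactly representable
    print(survives_float64(hash_like))  # rounds to a neighbouring float
    print(int(float(hash_like)) - hash_like)  # size of the rounding error
```

Keeping such a list as uint64 rather than promoting it to float avoids that rounding.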

restructure testing/checking directories

I propose we restructure the test directories as
```
test/
  server/
  modules/
  diagnostic/
```
The `server/` and `modules/` dirs would be meant for CI-compatible tests that return a success/fail code...

restructure testing/checking directories

I forgot to explicitly state that the `test/server` dir would be for python tests that exercise the server, while the `test/modules` dir would be for Chapel unit tests that do...

Arkouda does not support overwriting datasets in hdf5 -> should it?

@hokiegeek2 I think ideally we should support overwriting an individual dataset, but if it is too hard to implement, we don't need to make it a high priority.

Should we use chpl_serializeSlices?

This reminds me, I should make a benchmark for concatenate, which would definitely probe the effect of this option.