Update npy_append_array.py
- Support for both write and append, for smooth switching between the two
- I did this by splitting `self.write()` out of the `__init__()` code
- If the original file does not support appending, we try to re-save it in the background so that it does.
- If the original array is not contiguous, we try to make it contiguous in the background.
Thank you for the pull request, always makes me happy to see them!
However, I don't want to add more features at this point, since I want NpyAppendArray to become part of Numpy (compare https://github.com/numpy/numpy/pull/20321). For now, I want NpyAppendArray to develop as closely as possible to the Numpy pull request. I also see the following points:
- write/append: I'd prefer adding a flag `overwrite_existing` to `__init__` over falling back to the "r/w/a" schemes of `File.write`, as "a" would work differently from `File.write`: NpyAppendArray does not just append, it also updates the header (at least for now; this could change with the Numpy pull request I've posted, see above).
- Automatically re-saving on append is potentially dangerous, in particular for large files: very large files can block the machine or a process for a (relatively) long time. If the machine needs a reboot, or the command is run with a timeout (e.g. the Python process is terminated after 1 second), the file may be left in an unfinished state and thus be damaged. Such actions should be triggered by an explicit function call by the user. If this library were focused on comfort and features, I'd rather add a method `make_file_appendable` or so.
- Making arrays contiguous automatically can also be problematic due to RAM use and should be done with an explicit function call by the user (e.g. by creating a contiguous copy of the array). It would be a nice feature for Numpy, though, to support efficient writing of contiguous `.npy` files from non-contiguous arrays (if this has not already been implemented, I don't know).
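The RAM concern in the last point can be illustrated without NumPy. The following is a hypothetical, pure-Python sketch (the names `iter_c_order` and `write_contiguous` are made up, strides are counted in elements rather than bytes, and only 2-D float64 data is handled), not how NumPy or NpyAppendArray implement this. Instead of materializing a full contiguous copy, it walks a strided view in C (row-major) order and writes it out in fixed-size chunks, so the extra memory held at any time is bounded by `chunk` rather than by the array size.

```python
import io
import struct


def iter_c_order(buffer, shape, strides, offset=0):
    """Yield the elements of a strided 2-D view in C (row-major) order.

    Strides are counted in elements rather than bytes to keep the sketch short.
    """
    rows, cols = shape
    for i in range(rows):
        for j in range(cols):
            yield buffer[offset + i * strides[0] + j * strides[1]]


def write_contiguous(stream, buffer, shape, strides, chunk=1024):
    """Write a possibly non-contiguous view as contiguous little-endian float64
    bytes, holding at most `chunk` gathered elements in memory at a time."""
    pending = []
    for value in iter_c_order(buffer, shape, strides):
        pending.append(value)
        if len(pending) == chunk:
            stream.write(struct.pack("<%dd" % len(pending), *pending))
            pending.clear()
    if pending:
        stream.write(struct.pack("<%dd" % len(pending), *pending))


# Usage: a 2x3 row-major buffer viewed as its 3x2 transpose (strides (1, 3)).
base = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
out = io.BytesIO()
write_contiguous(out, base, shape=(3, 2), strides=(1, 3), chunk=4)
print(struct.unpack("<6d", out.getvalue()))  # (0.0, 3.0, 1.0, 4.0, 2.0, 5.0)
```

The trade-off is speed: gathering element by element is slow in pure Python, which is why an efficient version of this would have to live in compiled code.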
I will leave the pull request open and maybe some variant of it might be merged at some point in the future.
All of that made sense. I removed the write/append switching entirely. The `write` method is enough: if the user wants to write, let her use `npa.write` explicitly. I also removed the contiguity check; the user can handle this before appending. As suggested, I added a `make_file_appendable` method.
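A `make_file_appendable`-style function might look roughly like the following. This is a hypothetical sketch, not the method added in this pull request: it assumes a `.npy` format version 1.0 file, reserves an arbitrary 21 spare header bytes (`SLACK`), and keeps the header 64-byte aligned. Because it is an explicit call that streams the data into a temporary file and swaps it in with `os.replace`, a process killed half-way through (the timeout scenario raised above) leaves the original file untouched, at worst alongside a stray temporary file.

```python
import ast
import os
import shutil
import struct
import tempfile

MAGIC = b"\x93NUMPY\x01\x00"  # .npy magic string followed by format version 1.0
SLACK = 21  # hypothetical amount of spare header space reserved for growth


def make_file_appendable(path):
    """Hypothetical sketch: rewrite a v1.0 .npy file so its header has spare room.

    The new file is built next to the original and swapped in with a single
    os.replace(), so a process killed mid-way leaves the original untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        if src.read(len(MAGIC)) != MAGIC:
            raise ValueError("only .npy format version 1.0 is handled in this sketch")
        (header_len,) = struct.unpack("<H", src.read(2))
        meta = ast.literal_eval(src.read(header_len).decode("ascii"))
        text = "{%s}" % "".join("%r: %r, " % (k, v) for k, v in sorted(meta.items()))
        base = len(MAGIC) + 2 + len(text) + SLACK + 1  # +1 for the trailing newline
        new_len = len(text) + SLACK + (-base % 64) + 1  # keep 64-byte alignment
        header = (text.ljust(new_len - 1) + "\n").encode("ascii")

        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "wb") as dst:
                dst.write(MAGIC + struct.pack("<H", len(header)) + header)
                shutil.copyfileobj(src, dst)  # stream the array data in chunks
                dst.flush()
                os.fsync(dst.fileno())
            shutil.copymode(path, tmp_path)  # mkstemp creates files as 0600
        except BaseException:
            os.remove(tmp_path)
            raise
    os.replace(tmp_path, path)  # swap in the rewritten file in one step
```

The rename happens only after the source file is closed, since replacing a file that is still open fails on some platforms.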
btw, I am using it in https://github.com/gityoav/pyg-npy It is insanely fast for writing pd.DataFrame :-)
After a lot of work and discussion here and there, the pull request went through, and I got some feedback from the Numpy team: they wanted to have the AppendArray functionality in an external module. However, `.npy` files are now appendable by default (provided the Numpy version is recent enough). I did get lots of inspiration, though, and reworked the entire module. It probably now supports all the features you suggested in this pull request.
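What "appendable" means for a `.npy` file can be sketched as a header check: the shape stored in the header can only be rewritten in place if the longer shape string still fits into the existing, space-padded header. The function names, the `max_rows` bound of `10**20` and the restriction to format version 1.0 are assumptions of this sketch, not the NumPy or NpyAppendArray API; it also assumes the header dict uses the sorted-key `{'descr': ..., 'fortran_order': ..., 'shape': ..., }` layout.

```python
import ast
import struct

MAGIC = b"\x93NUMPY\x01\x00"  # .npy magic string followed by format version 1.0


def header_fits(meta, header_len, max_rows=10**20):
    """Would a header describing `max_rows` rows still fit in header_len bytes?"""
    grown = dict(meta, shape=(max_rows,) + tuple(meta["shape"][1:]))
    text = "{%s}" % "".join("%r: %r, " % (k, v) for k, v in sorted(grown.items()))
    return len(text) + 1 <= header_len  # +1 for the terminating newline


def is_appendable(path, max_rows=10**20):
    """Sketch: a v1.0 file counts as appendable if its shape can grow in place."""
    with open(path, "rb") as f:
        if f.read(len(MAGIC)) != MAGIC:
            raise ValueError("only .npy format version 1.0 is handled in this sketch")
        (header_len,) = struct.unpack("<H", f.read(2))
        meta = ast.literal_eval(f.read(header_len).decode("ascii"))
    return header_fits(meta, header_len, max_rows)


# Usage: header of a 3-element float64 array, with and without spare room.
meta = {"descr": "<f8", "fortran_order": False, "shape": (3,)}
print(header_fits(meta, 118))  # True
print(header_fits(meta, 70))  # False
```

A 118-byte header (magic, length field and header together padded to 128 bytes) leaves room for a 21-digit row count, while a 70-byte header padded only to an 80-byte boundary does not.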
That's good to hear. I now have a listener in production that appends to npy files continuously throughout the day. It's production code, so it will take some time to migrate, but I will have a look. Thanks for keeping me updated, and Happy New Year.
If there is something missing for your use case, just write here and we can add it. Happy New Year to you, too!
I have implemented `ensure_appendable` to make non-appendable files appendable. Non-contiguous arrays are automatically made contiguous on write anyway. This should resolve the issues raised in this pull request.
And sorry for the late answer, took me a while to get it into the state I wanted to have it.