npy-append-array

[enhancement] Can this library support appending other data types, such as strings?

Open zhuwenxing opened this issue 2 years ago • 1 comments

Appending works well when the item is an array. But if the item is an object such as a string, it raises an error:

[2023-07-27T07:48:55.983Z] 
[2023-07-27T07:48:55.983Z] array = 'FchBBdlm2uMLLr1mVjpwtFsnnSmkEvLbWo1DXvOAHbx9JAvWu9vH45eTBRCUqAZjU0Bmz34loB84MYKjekcWTFZaGh1fbAU61G9Tu6TJbIVLkb7t0jIzs...hRIJ8LTLVOCbYqEprSz55WXhvJ8NX1LoNVFJ3kA3YUp8NNNPQjJeZLQdhrb4GYvrxOVjt1Tx7QGgPO00kYa8cOjagDwlNSKLw3Dr6XRzeu3HXyDFpTanZQ'
[2023-07-27T07:48:55.983Z] 
[2023-07-27T07:48:55.983Z]     def header_data_from_array_1_0(array):
[2023-07-27T07:48:55.983Z]         """ Get the dictionary of header metadata from a numpy.ndarray.
[2023-07-27T07:48:55.983Z]     
[2023-07-27T07:48:55.983Z]         Parameters
[2023-07-27T07:48:55.983Z]         ----------
[2023-07-27T07:48:55.983Z]         array : numpy.ndarray
[2023-07-27T07:48:55.983Z]     
[2023-07-27T07:48:55.983Z]         Returns
[2023-07-27T07:48:55.983Z]         -------
[2023-07-27T07:48:55.983Z]         d : dict
[2023-07-27T07:48:55.983Z]             This has the appropriate entries for writing its string representation
[2023-07-27T07:48:55.983Z]             to the header of the file.
[2023-07-27T07:48:55.983Z]         """
[2023-07-27T07:48:55.983Z] >       d = {'shape': array.shape}
[2023-07-27T07:48:55.983Z] E       AttributeError: 'str' object has no attribute 'shape'
[2023-07-27T07:48:55.983Z] 
[2023-07-27T07:48:55.983Z] /usr/local/lib/python3.8/dist-packages/numpy/lib/format.py:357: AttributeError

zhuwenxing, Jul 27 '23 08:07

Sorry for the late reply!

As soon as you have arrays of dtype "object", NumPy falls back to the general Python pickling mechanism. Numpy Append Array relies on native NumPy arrays with fixed-size dtypes, and objects are not fixed-size. Without that, appending would no longer be efficient and everything would have to be done in memory anyway. Also note that "object" arrays cannot be memory-mapped for the same reason.
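To make the distinction concrete, here is a minimal sketch using only NumPy. The string value and the width of 16 bytes are made up for illustration. It shows why a bare `str` fails with the `.shape` error from the traceback, and how an "object" array differs from a fixed-width bytes array:

```python
import numpy as np

text = "FchBBdlm2uML"  # hypothetical short string, for illustration only

# A bare str is not an ndarray, so it has no .shape attribute.
print(hasattr(text, "shape"))

# An object array stores references to Python objects, not fixed-size items.
obj_arr = np.array([text], dtype=object)
print(obj_arr.dtype.hasobject)

# A fixed-width bytes dtype reserves the same number of bytes per element.
# The width 16 is an assumption; longer strings would be truncated.
fixed_arr = np.array([text.encode("utf-8")], dtype="S16")
print(fixed_arr.dtype.hasobject, fixed_arr.dtype.itemsize)
```

If all strings fit within a known maximum width, a fixed-width array like `fixed_arr` has a fixed-size dtype and so should in principle be appendable, at the cost of padding every entry to that width.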

The problem is that variable-length string arrays are not yet well supported in NumPy. There are some new developments though:

https://numpy.org/neps/nep-0055-string_dtype.html

One possible solution is to use two arrays: one for the actual text, stored as bytes, and one for offsets (e.g. int64) and (redundant) lengths into that array. This is a pretty common approach and can be handled with NpyAppendArray.
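A rough sketch of that two-array layout, using only NumPy, might look like this. The helper names `pack_strings` and `unpack_string` and the `base_offset` parameter are hypothetical, not part of any library. The batches are concatenated in memory here to keep the example self-contained. With NpyAppendArray, each batch's `data` and `index` arrays would instead be appended to two separate `.npy` files.

```python
import numpy as np

def pack_strings(strings, base_offset=0):
    """Pack strings into a flat uint8 array plus an (n, 2) int64 index.

    Each index row is (offset, length) into the byte array. base_offset is
    the number of bytes already stored by earlier batches.
    """
    encoded = [s.encode("utf-8") for s in strings]
    lengths = np.array([len(b) for b in encoded], dtype=np.int64)
    # Start of each string: running total of the lengths before it.
    offsets = np.cumsum(lengths) - lengths + base_offset
    data = np.frombuffer(b"".join(encoded), dtype=np.uint8)
    index = np.stack([offsets, lengths], axis=1)
    return data, index

def unpack_string(data, index, i):
    """Recover the i-th string from the byte array and the index."""
    offset, length = index[i]
    return data[offset:offset + length].tobytes().decode("utf-8")

# Two batches, as they would arrive in two successive append calls.
data1, index1 = pack_strings(["hello", "wörld"])
data2, index2 = pack_strings(["npy"], base_offset=len(data1))

# Stand-in for the on-disk result after both appends.
data = np.concatenate([data1, data2])
index = np.concatenate([index1, index2])

print([unpack_string(data, index, i) for i in range(len(index))])
```

Both arrays have fixed-size dtypes (uint8 and int64), so they avoid the "object" fallback, and both could be memory-mapped when reading. Offsets and lengths count bytes rather than characters, which matters for non-ASCII text.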

Does this answer help you?

xor2k, Nov 20 '23 12:11