hdf-compass
Error when opening an HDF5 Dataset with np.ndarray
When I try to open an HDF5 Dataset with np.ndarray, the application quits unexpectedly with no errors. I have Python 2.7 installed.
See DataFile and screenshots.


I tried the same HDF5 dataset with HDF Compass v0.6.0 and it worked. Since you mentioned np.ndarray, I also tried with Python 3.6.9. Below is the IPython output:
In [1]: import h5py
In [2]: h5py.version.version
Out[2]: '2.10.0'
In [3]: h5py.version.hdf5_version
Out[3]: '1.10.4'
In [4]: f = h5py.File('Canada_Population.h5', 'r')
In [5]: labels = f['/Record/Labels/Values']
In [6]: labels.shape
Out[6]: (1,)
In [7]: labels.dtype
Out[7]: dtype([('Country', 'O', (1,)), ('Continent', 'O', (1,)), ('Abbreviation', 'O', (1,)), ('Language', 'O', (2,)), ('DataSource', 'O', (1,))])
In [8]: labels[0]
Segmentation fault: 11
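The crash happens in the read path, so the datatype itself can still be examined safely through h5py's low-level API, which never converts any data. Below is a minimal, hedged sketch of that inspection; since the issue's Canada_Population.h5 isn't at hand, it first creates a small stand-in file (the group path and field names mirror the issue, the file name and two-field layout are made up for the example):

```python
import h5py
import numpy as np

# Stand-in file with a compound of variable-length ASCII strings, so the
# inspection below is runnable; the real file wraps each field in an
# H5T_ARRAY as well.
str_t = h5py.string_dtype(encoding='ascii')
dt = np.dtype([('Country', str_t), ('Continent', str_t)])

with h5py.File('labels_demo.h5', 'w') as f:
    f.create_dataset('/Record/Labels/Values', shape=(1,), dtype=dt)

with h5py.File('labels_demo.h5', 'r') as f:
    # Low-level TypeCompoundID: walks the on-disk HDF5 type directly,
    # without triggering the H5Tconvert path that segfaults above.
    tid = f['/Record/Labels/Values'].id.get_type()
    members = [tid.get_member_name(i).decode()
               for i in range(tid.get_nmembers())]
    print(members)
```

Running the same three low-level calls against the real file would confirm the field names and classes without reading a single element.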
The reported stack trace indicates the segmentation fault happened while h5py was converting the dataset's data into NumPy memory structures:
0 _conv.cpython-36m-darwin.so 0x000000010efbde9a __pyx_f_4h5py_5_conv_conv_vlen2str + 186
1 _conv.cpython-36m-darwin.so 0x000000010efbdd58 __pyx_f_4h5py_5_conv_generic_converter + 680
2 _conv.cpython-36m-darwin.so 0x000000010efbc76e __pyx_f_4h5py_5_conv_vlen2str + 62
3 libhdf5.103.dylib 0x000000010bfadede H5T_convert + 478
4 libhdf5.103.dylib 0x000000010bfc9f31 H5T__conv_array + 2481
5 libhdf5.103.dylib 0x000000010bfadfc1 H5T_convert + 705
6 libhdf5.103.dylib 0x000000010bfc5314 H5T__conv_struct_opt + 2788
7 libhdf5.103.dylib 0x000000010bfadfc1 H5T_convert + 705
8 libhdf5.103.dylib 0x000000010bfadc28 H5Tconvert + 1272
9 defs.cpython-36m-darwin.so 0x000000010c1f433c __pyx_f_4h5py_4defs_H5Tconvert + 76
10 _proxy.cpython-36m-darwin.so 0x000000010f1aaa18 __pyx_f_4h5py_6_proxy_dset_rw + 2824
11 h5d.cpython-36m-darwin.so 0x000000010f1bbf2e __pyx_pw_4h5py_3h5d_9DatasetID_1read + 542
The h5dump description of the dataset's datatype is:
DATATYPE H5T_COMPOUND {
H5T_ARRAY { [1] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} } "Country";
H5T_ARRAY { [1] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} } "Continent";
H5T_ARRAY { [1] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} } "Abbreviation";
H5T_ARRAY { [2] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} } "Language";
H5T_ARRAY { [1] H5T_STRING {
STRSIZE H5T_VARIABLE;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_ASCII;
CTYPE H5T_C_S1;
} } "DataSource";
}
which in my opinion is a bit unconventional: every field is a size-1 (or, for "Language", size-2) H5T_ARRAY of variable-length strings. If interoperability of this file format is important, I'd suggest simplifying those compound fields.
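One possible simplification, sketched below under assumptions (file name, sample values, and the semicolon-delimited "Language" flattening are all made up; only the field names come from the issue): store each field as a plain variable-length string rather than an H5T_ARRAY wrapper, which keeps the data out of the array-of-vlen conversion path entirely:

```python
import h5py
import numpy as np

# Plain variable-length ASCII strings, no H5T_ARRAY wrappers. The
# two-element "Language" array is flattened to one delimited string here;
# two scalar fields would work equally well.
str_t = h5py.string_dtype(encoding='ascii')
dt = np.dtype([
    ('Country', str_t),
    ('Continent', str_t),
    ('Abbreviation', str_t),
    ('Language', str_t),      # e.g. 'English;French'
    ('DataSource', str_t),
])

data = np.array(
    [('Canada', 'North America', 'CA', 'English;French', 'example source')],
    dtype=dt)

with h5py.File('Canada_Population_simplified.h5', 'w') as f:
    f.create_dataset('/Record/Labels/Values', data=data)

with h5py.File('Canada_Population_simplified.h5', 'r') as f:
    # Reads back through the ordinary vlen-string converter, which is the
    # well-trodden path in h5py.
    row = f['/Record/Labels/Values'][0]
    print(row['Country'], row['Language'])
```

A reader of the simplified file sees the same field names, so downstream code that indexes by name should need little or no change.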