`hsload` fails on empty data sets (with a dimension of length 0)

Open jonrkarr opened this issue 4 years ago • 12 comments

Below is an error we encountered: hsload fails on datasets that have a dimension of length 0.

(While I'd expect HSDS to be able to handle this, incidentally this error was actually helpful to us! This alerted us to a case where simulation data unexpectedly wasn't produced due to an error in our code.)

Traceback (most recent call last):
  File "h5py/h5o.pyx", line 302, in h5py.h5o.cb_obj_simple
  File "/home/FCAM/crbmapi/.local/lib/python3.6/site-packages/h5py/_hl/group.py", line 591, in proxy
    return func(name, self[name])
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/utillib.py", line 674, in object_create_helper
    create_dataset(obj, ctx)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/utillib.py", line 459, in create_dataset
    fillvalue=fillvalue, scaleoffset=scaleoffset)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_hl/group.py", line 337, in create_dataset
    dsid = dataset.make_new_dset(self, shape=shape, dtype=dtype, **kwds)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_hl/dataset.py", line 129, in make_new_dset
    raise ValueError(errmsg)
ValueError: Chunk shape must not be greater than data shape in any dimension. (6, 256) is not compatible with (24, 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/hsload", line 33, in <module>
    sys.exit(load_entry_point('h5pyd==0.8.4', 'console_scripts', 'hsload')())
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/hsload.py", line 314, in main
    load_file(fin, fout, verbose=verbose, dataload=dataload, s3path=s3path, compression=compression, compression_opts=compression_opts)
  File "/usr/local/lib/python3.6/site-packages/h5pyd/_apps/utillib.py", line 714, in load_file
    fin.visititems(object_create_helper)
  File "/home/FCAM/crbmapi/.local/lib/python3.6/site-packages/h5py/_hl/group.py", line 592, in visititems
    return h5o.visit(self.id, proxy)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
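
For readers following along, the check behind the ValueError above can be sketched in plain Python. This is only an illustration under assumptions: `check_chunks` and `chunks_or_none` are hypothetical names, not h5pyd functions, and dropping the chunk request for zero-length dimensions is one possible workaround, not necessarily what the eventual fix does.

```python
def check_chunks(chunks, shape):
    """Raise ValueError if any chunk extent exceeds the matching data extent.

    Hypothetical helper that mirrors the wording of the error in the traceback.
    """
    if any(c > s for c, s in zip(chunks, shape)):
        raise ValueError(
            "Chunk shape must not be greater than data shape in any dimension. "
            f"{chunks} is not compatible with {shape}"
        )


def chunks_or_none(chunks, shape):
    """Hypothetical workaround: request no chunk layout if a dimension is 0."""
    if any(extent == 0 for extent in shape):
        return None
    return chunks
```

With the shapes from the traceback, `check_chunks((6, 256), (24, 0))` raises, because no positive chunk extent can fit a dimension of length 0, while `chunks_or_none((6, 256), (24, 0))` returns `None`.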

jonrkarr avatar Feb 27 '22 19:02 jonrkarr

Could be linked to https://github.com/HDFGroup/h5pyd/pull/114

loichuder avatar Feb 28 '22 10:02 loichuder

Yes, #114 sounds very similar.

jonrkarr avatar Feb 28 '22 13:02 jonrkarr

This should be fixed in this commit: https://github.com/HDFGroup/h5pyd/commit/5a9193af6ae99a204a7d277d9983431e712f7417.

I'll update the issue when this gets merged with master.

jreadey avatar Nov 24 '22 00:11 jreadey

Fix is in master now.

jreadey avatar Nov 24 '22 01:11 jreadey

Closing - fix is in the 0.12.0 release on PyPI.

jreadey avatar Dec 01 '22 17:12 jreadey

Sorry, I was a bit late to check.

I still get an error when running hsload --link on a file containing an empty dataset (h5py.Empty):

  File ".../h5pyd/_apps/utillib.py", line 737, in create_dataset
    tgt_shape.extend(dobj.shape)
TypeError: 'NoneType' object is not iterable

loichuder avatar Dec 05 '22 08:12 loichuder

Ah, I see - reopening.

jreadey avatar Dec 05 '22 22:12 jreadey

This should fix it: https://github.com/HDFGroup/h5pyd/commit/866c0be4063a1d744df596a8296b95a2b505ee15.

jreadey avatar Dec 05 '22 22:12 jreadey

Nope, still the same error.

Anyway, this is no big deal: hsload now works for scalar datasets and I think h5py.Empty is really uncommon.

loichuder avatar Dec 06 '22 07:12 loichuder

@loichuder - were you testing from master? The commit above was in the aggregate branch. Anyway, I've merged the changes into master and pushed out a new release as 0.12.1.

jreadey avatar Dec 08 '22 05:12 jreadey

Yes, I tried with the aggregate branch at the time and now with master. It is still the same issue as in https://github.com/HDFGroup/h5pyd/issues/116#issuecomment-1336931275, since dobj.shape is None for h5py.Empty.

No big deal as I said but for the sake of it, here is what I have done to encounter the issue:

  • Creation of the file containing an empty dataset
import h5py

with h5py.File('empty.h5', "w") as h5file:
    # h5py.Empty needs a dtype; "f" (float32) is used here as an example
    h5file.create_dataset("empty", data=h5py.Empty("f"))
  • Loading with hsload --link:
hsload --link [...] files/empty.h5 [...]
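
The TypeError comes from calling `list.extend` with `None`, since the shape of an `h5py.Empty` dataset is `None`. A minimal sketch of a guard, assuming a hypothetical helper `build_target_shape` with a hypothetical `leading` parameter (neither is taken from utillib.py):

```python
def build_target_shape(src_shape, leading=()):
    """Return the target shape as a tuple, or None for an empty (null) dataset.

    Hypothetical helper: `leading` stands in for any extents placed in front
    of the source shape.
    """
    if src_shape is None:
        # Shape is None for h5py.Empty datasets, so there is nothing to extend.
        return None
    tgt_shape = list(leading)
    tgt_shape.extend(src_shape)
    return tuple(tgt_shape)
```

Without the `None` check, `tgt_shape.extend(None)` raises the same `'NoneType' object is not iterable` error as in the traceback above.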

loichuder avatar Dec 08 '22 08:12 loichuder

@loichuder - ok I see. This latest checkin should really fix it now! It's on master and in PyPI as version 0.12.2.

jreadey avatar Dec 08 '22 18:12 jreadey

Closing this issue as it should be fixed in 0.12.2 and later.
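
For anyone checking whether an installed release is new enough, here is a rough sketch of a version comparison. It assumes only what this thread states (the fix is in 0.12.2 and later); `has_empty_dataset_fix` is a hypothetical helper, and it ignores pre-release suffixes.

```python
import re


def has_empty_dataset_fix(version: str) -> bool:
    """Return True if `version` is at least 0.12.2, the release named above."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each component (e.g. "2rc1" -> 2).
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (0, 12, 2)
```

The installed version string can be obtained with `importlib.metadata.version("h5pyd")` on Python 3.8 or later.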

jreadey avatar Jan 03 '23 17:01 jreadey