add_unit after write and close raises "Only chunked datasets can be resized" TypeError
Hi,
We are having some difficulty adding units to our NWB file. We would like to add units from separate parts of our code, but it seems that after writing and closing the file, reopening it and adding another unit produces the following error:
TypeError: Only chunked datasets can be resized
I attach some simplified code that replicates the problem, based on the example code provided with the package:
from datetime import datetime
from dateutil.tz import tzlocal
from pynwb import NWBFile, NWBHDF5IO
import numpy as np

start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
create_date = datetime(2017, 4, 15, 12, tzinfo=tzlocal())

nwbfile = NWBFile(session_description='demonstrate NWBFile basics',  # required
                  identifier='NWB123',  # required
                  session_start_time=start_time,  # required
                  file_create_date=create_date)  # optional

nwbfile.add_unit_column('location', 'the anatomical location of this unit')
nwbfile.add_unit_column('quality', 'the quality for the inference of this unit')

nwbfile.add_unit(id=1, spike_times=[2.2, 3.0, 4.5],
                 obs_intervals=[[1, 10]], location='CA1', quality=0.95)
nwbfile.add_unit(id=2, spike_times=[2.2, 3.0, 25.0, 26.0],
                 obs_intervals=[[1, 10], [20, 30]], location='CA3', quality=0.85)
nwbfile.add_unit(id=3, spike_times=[1.2, 2.3, 3.3, 4.5],
                 obs_intervals=[[1, 10], [20, 30]], location='CA1', quality=0.90)

print("writing to file...")
with NWBHDF5IO('example_file_path.nwb', 'w') as io:
    io.write(nwbfile)

print("reading back in file...")
io = NWBHDF5IO('example_file_path.nwb', 'r')
nwbfile_in = io.read()

print("Let's add another unit...")
nwbfile_in.add_unit(id=4, spike_times=[2.2, 3.0, 4.5],
                    obs_intervals=[[1, 10]], location='CA1', quality=0.95)
The full traceback is below; hopefully the script above replicates the problem:
Traceback (most recent call last):
  File "<input>", line 64, in <module>
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\hdmf\utils.py", line 438, in func_call
    return func(self, **parsed['args'])
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\pynwb\file.py", line 516, in add_unit
    call_docval_func(self.units.add_unit, kwargs)
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\hdmf\utils.py", line 327, in call_docval_func
    return func(*fargs, **fkwargs)
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\hdmf\utils.py", line 438, in func_call
    return func(self, **parsed['args'])
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\pynwb\misc.py", line 182, in add_unit
    super(Units, self).add_row(**kwargs)
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\hdmf\utils.py", line 438, in func_call
    return func(self, **parsed['args'])
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\hdmf\common\table.py", line 335, in add_row
    self.id.append(row_id)
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\hdmf\container.py", line 421, in append
    self.__data.resize(shape)
  File "C:\Users\user\OneDrive\Documents\Python Scripts\NWB-conversion\env\lib\site-packages\h5py\_hl\dataset.py", line 425, in resize
    raise TypeError("Only chunked datasets can be resized")
TypeError: Only chunked datasets can be resized
We are using Python 3.7. There is no easy way for us right now to add all of our units in one go, so if there is a workaround for this issue it would be greatly appreciated. Thank you, Pavlos
@pkollias Thanks for the good bug report! As the error suggests, HDF5 datasets can only be resized (i.e., appended to after write) if they were stored as "chunked." Users commonly want to do this for rows of a DynamicTable, but it is not the default behavior, and enabling it isn't 100% straightforward, especially for indexed columns like spike times (which would actually require both spike_times and spike_times_index to be chunked, since both of those datasets grow when a row is added). This has been on our to-do list, and it helps to know that this is a feature you would like to see.
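To illustrate what "chunked" means at the HDF5 level, here is a minimal h5py sketch (not pynwb-specific; the file and dataset names are just placeholders). A dataset written with a fixed shape cannot grow, while one created with maxshape=(None,) is stored chunked and can be resized:

# Minimal h5py sketch (placeholder file/dataset names, not pynwb-specific):
# a fixed-shape dataset cannot grow, a chunked one with maxshape=(None,) can.
import numpy as np
import h5py

with h5py.File('chunking_demo.h5', 'w') as f:
    f.create_dataset('fixed', data=np.arange(3))             # contiguous, fixed shape
    f.create_dataset('growable', data=np.arange(3),
                     maxshape=(None,), chunks=True)          # chunked, resizable

with h5py.File('chunking_demo.h5', 'a') as f:
    f['growable'].resize((4,))   # works: the dataset is chunked
    f['growable'][3] = 99
    try:
        f['fixed'].resize((4,))  # raises the same TypeError as in your traceback
    except TypeError as err:
        print(err)               # "Only chunked datasets can be resized"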
I'm curious what your workflow is that has you writing spike times at one point and then reloading the file to add new units. Understanding your use case may help us develop this feature in a way that better suits your needs.
Thanks Ben, that makes sense. To answer your question: we have structured our code modules and pipeline so that for large datasets we do the processing by electrodes or electrode groups. For now we have an easy workaround, so this should not be a problem.
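For anyone who lands on this issue later, a rough sketch of that kind of deferred-write workaround, using the nwbfile constructed as in the script above (the per-group loop, the process_group helper, and the group names are hypothetical placeholders; the idea is simply to collect all unit rows first and write the file once at the end):

# Hedged sketch: collect unit rows while processing each electrode group,
# then add them all to the in-memory NWBFile and perform a single write.
# process_group() and the group names are hypothetical placeholders.
from pynwb import NWBHDF5IO

collected_units = []
for group_name in ['electrode_group_A', 'electrode_group_B']:   # hypothetical groups
    for row in process_group(group_name):                       # hypothetical helper returning dicts
        collected_units.append(row)

for i, row in enumerate(collected_units, start=1):
    nwbfile.add_unit(id=i, spike_times=row['spike_times'],
                     obs_intervals=row['obs_intervals'],
                     location=row['location'], quality=row['quality'])

# Single write: nothing on disk ever needs to be resized afterwards.
with NWBHDF5IO('example_file_path.nwb', 'w') as io:
    io.write(nwbfile)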