hdmf icon indicating copy to clipboard operation
hdmf copied to clipboard

Overridden dataset dtype and shape are not passed from parent to child

Open rly opened this issue 5 years ago • 3 comments

If a schema has two groups, e.g., PatchClampSeries and CurrentClampSeries defined as such:

groups:
- neurodata_type_def: PatchClampSeries
  neurodata_type_inc: TimeSeries
  doc: ...
  datasets:
  - name: data
    dtype: numeric
    dims:
    - num_times
    shape:
    - null
    doc: ...
- neurodata_type_def: CurrentClampSeries
  neurodata_type_inc: PatchClampSeries
  doc: ...
  datasets:
  - name: data
    doc: Recorded voltage.

where the 'data' dataset in CurrentClampSeries overrides the 'data' dataset in the parent PatchClampSeries, then the dtype, dims, and shape keys of the parent PatchClampSeries 'data' dataset are not inherited by the child CurrentClampSeries 'data' dataset.

Is this intended behavior? @oruebel @ajtritt

If so, the NWB schema needs to be amended in a few places.

rly avatar Mar 24 '20 20:03 rly

I encountered this while doing some validator testing. This also applies when the datasets are defined with specific inheritance, eg:

datasets:
- data_type_def: D1
  dtype: numeric
  dims:
  - dim1
  - dim2
  shape:
  - null
  - null
  doc: a base dataset
  attributes:
  - name: foo
    dtype: text
    doc: an attribute
- data_type_def: D2
  data_type_inc: D1
  doc: an inherited dataset

Only the attribute foo will be validated on a builder for type D2. dtype, dims, and shape are not passed down.

It looks like the default inheritance passes down specification elements defined on the dictionary accessible via BaseStorageSpec.attributes. However, DatasetSpec sets shape, dims, and dtype in the base dict that ConstructableDict inherits from (self['shape'] self['dims'], and self['dtype'] when accessed from within DatasetSpec), rather than on the .attributes field via BaseStorageSpec.set_attribute('shape', ...), etc.

It looks like the current implementation of #321 treats this by passing down the shape, etc of child attributes when a parent is inherited (e.g. PatchClampSeries.data.shape is passed to CurentClampSeries.data.shape). However, I'm not sure that would work for the above situation, even if we defined a group holding D2 as below:

groups:
- data_type_def: G1
  doc: a group container
  datasets:
  - name: my_d2
    data_type_def: D2
    doc: a D2 dataset on group G1

It looks like #321 might be a bit stale, so maybe a lot has changed between then and hdmf 2.5.1. And I'm not entirely sure I'm clear on how everything is supposed to happen with spec inheritance, so maybe there's something I did wrong with my tests. But it'd be good to validate that #321 does resolve the above situation.

dsleiter avatar May 10 '21 10:05 dsleiter

To be a bit more clear, here's a test case I was working from:

  1. create a spec for D1 and D2 and save to file.
import os
from hdmf.spec import NamespaceBuilder, export_spec, AttributeSpec, DatasetSpec, GroupSpec


namespace_name = 'foo'
ns_builder = NamespaceBuilder(doc='Test namespace', name=namespace_name, version='0.1.0')

d1_spec = DatasetSpec(doc='d1', data_type_def='D1', shape=[None, None], dims=['dim1', 'dim2'], dtype='numeric',
                      attributes=[AttributeSpec(name='foo', doc='an attribute', dtype='text')])
d2_spec = DatasetSpec(doc='d2', data_type_def='D2', data_type_inc='D1')
g1_spec = GroupSpec(doc='g1', data_type_def='G1', datasets=[d2_spec])

output_dir = 'test-spec'
if not os.path.exists(output_dir):
    os.mkdir(output_dir)
    
export_spec(ns_builder, [d1_spec, d2_spec, g1_spec], str(output_dir))
  1. load and compare the specs for D1, D2, and G1:
from hdmf import common

common.load_namespaces(os.path.join(output_dir, f'{namespace_name}.namespace.yaml'))
ns_catalog = common.get_manager().namespace_catalog
ns = ns_catalog.get_namespace(name=namespace_name)

print(ns.get_spec('D1'))
print(ns.get_spec('D2'))
print(ns.get_spec('G1'))

Output:

{'shape': [None, None],
 'dims': ['dim1', 'dim2'],
 'dtype': 'numeric',
 'doc': 'd1',
 'data_type_def': 'D1',
 'attributes': [{'doc': 'an attribute', 'name': 'foo', 'dtype': 'text'}]}

{'doc': 'd2',
 'data_type_inc': 'D1',
 'data_type_def': 'D2',
 'attributes': [{'doc': 'an attribute', 'name': 'foo', 'dtype': 'text'}]}

{'datasets': [{'doc': 'd2',
   'data_type_inc': 'D1',
   'data_type_def': 'D2',
   'attributes': [{'doc': 'an attribute', 'name': 'foo', 'dtype': 'text'}]}],
 'doc': 'g1',
 'data_type_def': 'G1'}

I would expect D2 to have shape, dims, and dtype available. I also am not sure #321 would fix the situation of G1.

dsleiter avatar May 10 '21 10:05 dsleiter

This came up again in https://github.com/NeurodataWithoutBorders/nwb-schema/issues/622

The resolve_spec behavior in HDMF needs to be adjusted so that dtype, shape, requiredness, and other properties are inherited by the child spec if not explicitly changed (and if they are explicitly changed, we should ensure they are compatible, but perhaps that is a separate issue).

rly avatar May 16 '25 00:05 rly