Support validation of dataset references
The validator currently does not validate dataset references. At a minimum, it should check that each reference points to a builder with the expected data_type.
Example: the following code should produce a validation error because the data_type of the builder within the reference does not match the target_type in the RefSpec for the dataset containing the reference. However, no error is generated because the reference data_type is not checked during validation.
```python
from hdmf.spec import DatasetSpec, SpecCatalog, SpecNamespace, RefSpec
from hdmf.validate import ValidatorMap
from hdmf.build import DatasetBuilder, ReferenceBuilder

# Foo holds an object reference whose target_type matches no registered type
foo_spec = DatasetSpec(
    doc='a dataset containing a reference',
    dtype=RefSpec('NonexistantType', 'object'),
    data_type_def='Foo',
    shape=None
)
bar_spec = DatasetSpec(
    doc='a simple scalar dataset',
    data_type_def='Bar',
    dtype='int',
    shape=None
)

spec_catalog = SpecCatalog()
for spec in [foo_spec, bar_spec]:
    spec_catalog.register_spec(spec, 'test.yaml')
namespace = SpecNamespace('a test namespace', 'test_namespace',
                          [{'source': 'test.yaml'}], version='0.1.0', catalog=spec_catalog)
vmap = ValidatorMap(namespace)

bar = DatasetBuilder('bar', 12, attributes={'data_type': 'Bar'})
foo = DatasetBuilder('foo', ReferenceBuilder(bar), attributes={'data_type': 'Foo'})
# Validate foo, the builder that actually contains the reference to bar
result = vmap.validate(foo)
print(result)
```
Is your feature request related to a problem? Please describe.
I want to ensure the validator performs a complete and correct validation of builder objects.
Describe the solution you'd like
On encountering a dataset reference type (possibly compound), the validator should verify that the data_type of the referenced builder matches the target_type defined in the spec.
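A rough sketch of what such a check could look like is below. It is only an illustration under stated assumptions: `StubBuilder`, `StubReference`, `iter_references`, and `check_reference_types` are hypothetical stand-ins written for this example, not part of hdmf, so the snippet runs without hdmf installed. It also compares types by exact name, whereas a real check would probably need to accept subtypes of the target_type, and it applies one target_type to all nested data, whereas a compound dtype would need a target_type per column.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List


# Hypothetical stand-ins for builder objects; NOT the real hdmf classes.
@dataclass
class StubBuilder:
    name: str
    data: Any = None
    attributes: dict = field(default_factory=dict)


@dataclass
class StubReference:
    builder: StubBuilder  # the builder this reference points to


def iter_references(data: Any) -> Iterator[StubReference]:
    """Yield every reference found in scalar or (nested) list/tuple data."""
    if isinstance(data, StubReference):
        yield data
    elif isinstance(data, (list, tuple)):
        for item in data:
            yield from iter_references(item)


def check_reference_types(builder: StubBuilder, target_type: str) -> List[str]:
    """Return one message per reference whose target has the wrong data_type."""
    errors = []
    for ref in iter_references(builder.data):
        actual = ref.builder.attributes.get('data_type')
        if actual != target_type:  # exact match only; subtypes are ignored here
            errors.append(
                f"{builder.name}: reference to '{ref.builder.name}' has "
                f"data_type {actual!r}, expected {target_type!r}"
            )
    return errors


bar = StubBuilder('bar', 12, {'data_type': 'Bar'})
foo = StubBuilder('foo', StubReference(bar), {'data_type': 'Foo'})
errors = check_reference_types(foo, 'NonexistantType')
print(errors)
```

With the builders from the example above, the mismatch between `Bar` and the spec's target_type is reported as a single error, while checking against `'Bar'` returns an empty list.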
Additional context
There could be additional things related to dataset references that are currently not being validated, so it would be good to evaluate if anything else should be included.
Checklist
- [x] Have you ensured the feature or change was not already reported?
- [x] Have you included a brief and descriptive title?
- [x] Have you included a clear description of the problem you are trying to solve?
- [x] Have you included a minimal code snippet that reproduces the issue you are encountering?
- [x] Have you checked our Contributing document?
Makes sense. One thing to consider is that for large datasets of references this can become expensive, because it requires loading the data for that dataset and resolving the references (not just constructing the builders). As such, it may be useful to make this a configurable option so that users can disable checking of reference types.
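To make the cost argument concrete, here is a minimal sketch of an opt-out flag. Everything in it is hypothetical: `make_lazy_ref`, `validate_reference_types`, and the `check_references` parameter are names invented for this illustration and do not exist in hdmf. Each reference is modeled as a callable whose invocation stands in for the expensive load-and-resolve step.

```python
from typing import Callable, Iterable, List

resolved: List[str] = []  # records which references were actually resolved


def make_lazy_ref(name: str, data_type: str) -> Callable[[], str]:
    """Hypothetical lazy reference; calling it stands in for an expensive read."""
    def resolve() -> str:
        resolved.append(name)
        return data_type
    return resolve


def validate_reference_types(
    refs: Iterable[Callable[[], str]],
    target_type: str,
    check_references: bool = True,  # hypothetical option, not an hdmf parameter
) -> List[str]:
    """Return type-mismatch messages, or nothing at all if checking is disabled."""
    if not check_references:
        return []  # skip resolution entirely, so no data is loaded
    errors = []
    for ref in refs:
        actual = ref()  # the expensive step happens here
        if actual != target_type:
            errors.append(f"expected {target_type!r}, got {actual!r}")
    return errors


refs = [make_lazy_ref('a', 'Bar'), make_lazy_ref('b', 'Baz')]
skipped = validate_reference_types(refs, 'Bar', check_references=False)
resolved_when_skipped = list(resolved)  # still empty: nothing was resolved
checked = validate_reference_types(refs, 'Bar')
```

Defaulting the flag to `True` keeps validation complete unless the user explicitly trades thoroughness for speed.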
This will be addressed by the next major release.