Scope out opportunities to simplify aind-data-schema
Some notes from poking around. This will be an ongoing comment:
General:
- We should limit the number of classes per file (say, 10 at most) and split files that exceed that. Procedures in particular should be split by modality
- There is a missing concept of an atlas and atlas transform, related to injection targets and CCF information
acquisition.py
- ProcessingSteps probably needs to be generalized for outside users, potentially moved out of this file
data_description.py
- platform is currently internal
- organizations list is pretty limited for outside users
instrument.py
- Calibrations could include timestamps, rather than separating them out
procedures.py
- Remove create_unit_with_value
- Lots of value/unit pairs that could be combined
- Headframe + well: not all headframes have wells, so the well could be split off
- ProtectivePartReplacement duplicates well information?
- Tars virus should probably go into models as a registry
- The main procedures class should really be split into two for SubjectProcedures and SpecimenProcedures
- Injections that target atlas coordinates need an "Insertion" class to handle position/rotation information
processing.py
- Hanging TODO
rig.py
- Should get split into modality-specific rig types so that there are fewer blank list types
- Many fields are nominally required but are effectively optional, because they can be set to an empty list []
- Custom validators for modalities should get split into their respective subclasses
- CCF coordinate transform, origin, and rig axes should get merged into a single "atlas" and "transform" class similar to how Pinpoint does this
session.py
- This class is super complex imo, would like to discuss how to simplify and separate functional parts into subclasses
subject.py
- Would be good to track the use of "Other" in enrichment and add (and then go back and replace) missing enrichment types
- A lot of fields, like Housing, are optional that probably shouldn't be
- No information here about weighings or growth. Is any of that tracked elsewhere, or only during procedures?
~~Blocking this until next meeting about it~~
1st discussion about specific changes (9/27)
Simplifying procedures
- Put them into a components file
- Subject/Specimen procedures in different component files
Coordinates [breaking change]
- There should be a SurgeryCoordinates class? List[that]
- Attach angles
- Atlas + transforms
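A rough sketch of what the coordinates idea above could look like, using stdlib dataclasses as a stand-in for the schema's pydantic models. Only the name `SurgeryCoordinates` comes from the notes; every field name, unit, and default here, plus the `Injection` container, is a hypothetical placeholder, and the atlas/transform part is reduced to a single optional name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SurgeryCoordinates:
    # Position in mm relative to `reference` (hypothetical fields and units).
    ap: float
    ml: float
    dv: float
    # Angles in degrees are attached to the position they qualify.
    ap_angle: float = 0.0
    ml_angle: float = 0.0
    reference: str = "Bregma"
    # Name of the atlas these coordinates are expressed in, if any.
    atlas: Optional[str] = None


@dataclass
class Injection:
    # "List[that]": one SurgeryCoordinates entry per injection site.
    targets: list[SurgeryCoordinates] = field(default_factory=list)


injection = Injection(
    targets=[
        SurgeryCoordinates(ap=-1.5, ml=2.0, dv=3.2),
        SurgeryCoordinates(ap=-1.5, ml=2.0, dv=3.6, ap_angle=10.0),
    ]
)
```

Keeping the angles on the same object as the position means a procedure never has to line up two parallel lists of coordinates and angles.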
Extensions
- Make an extensions/aind/??
Instrument
- Calibrations are deprecated, remove? Or replace with devices.Calibration. Talk to John + Adam
Value with Unit [breaking change]
- Replace them all with value and value_unit
- Remove all XValue and take them out of models repo
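One possible reading of the value / value_unit convention, sketched with stdlib dataclasses rather than the real pydantic models. This assumes the intent is a plain numeric field with a sibling `_unit` field in place of a dedicated wrapper type per quantity; `MassUnit` and `WeightRecord` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class MassUnit(str, Enum):
    """Hypothetical unit enum, for illustration only."""

    GRAM = "gram"
    KILOGRAM = "kilogram"


@dataclass
class WeightRecord:
    # The value is a plain number and the unit is a sibling field,
    # so no per-quantity wrapper class is needed.
    weight: Decimal
    weight_unit: MassUnit = MassUnit.GRAM


record = WeightRecord(weight=Decimal("24.5"))
```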
Rig [breaking change]
- Annotated Union instead of list of Optional
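A minimal stdlib-only sketch of the tagged-union idea. In the real schema this would presumably be a pydantic discriminated union (`Annotated[Union[...], Field(discriminator=...)]`); here `Annotated` only carries a note and the dispatch is done by hand. `EphysRig`, `OphysRig`, `parse_rig`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Annotated, Literal, Union


@dataclass
class EphysRig:
    probes: list[str]
    modality: Literal["ecephys"] = "ecephys"


@dataclass
class OphysRig:
    lasers: list[str]
    modality: Literal["ophys"] = "ophys"


# Each rig type declares only the fields it needs, so no rig carries
# empty lists for devices that belong to another modality.
Rig = Annotated[Union[EphysRig, OphysRig], "discriminated by `modality`"]

_RIG_TYPES = {"ecephys": EphysRig, "ophys": OphysRig}


def parse_rig(data: dict) -> Rig:
    """Pick the concrete rig class from the `modality` tag."""
    fields = dict(data)  # copy so the caller's dict is not mutated
    rig_type = _RIG_TYPES[fields.pop("modality")]
    return rig_type(**fields)


rig = parse_rig({"modality": "ophys", "lasers": ["laser-1"]})
```

The point of the design is that a user filling in an ophys rig never sees a `probes` field at all, rather than seeing one and leaving it empty.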
Idea: "people" should be abstracted out so that we're re-using the same class and set of information across all mentions of people? (e.g. evaluator in qc, and experimenter in other places)
This would need to cover both anonymized and non-anonymized experimenters
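To make the "people" idea concrete, here is a small sketch with stdlib dataclasses. `Person`, its fields, and the two container classes are hypothetical; the only point is that the evaluator in QC and the experimenter elsewhere share one type, and that the same type can hold either a real name or an opaque ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Either a real name or an opaque ID, depending on `anonymized`.
    identifier: str
    anonymized: bool = False


@dataclass
class QCEvaluation:
    evaluator: Person


@dataclass
class Session:
    experimenters: list[Person]


alice = Person(identifier="Alice Example")
blinded = Person(identifier="experimenter-017", anonymized=True)

# The same Person type (and even the same instance) is reused everywhere.
qc = QCEvaluation(evaluator=alice)
session = Session(experimenters=[alice, blinded])
```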
- Use annotated unions in rig and session to remove empty lists that confuse users.
- Create AIND-specific validators in an AIND extension so WE can require fields that we don't expect external scientists to have.
- Remove the stimulus classes. These can maybe be in an extension. The goal really is for people to create their own dictionary that is consistent within a project/platform/whatever the right level is, not for us to dictate or maintain those. (We might need to develop some tools to help people do this though)
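A sketch of how an AIND extension could tighten a core class, again with stdlib dataclasses standing in for pydantic models and validators. `DataDescription` here is a stripped-down stand-in, `AindDataDescription` is a hypothetical extension class, and `platform` is used as the example field only because it is noted above as currently internal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataDescription:
    # Core class: external users may leave `platform` unset.
    name: str
    platform: Optional[str] = None


@dataclass
class AindDataDescription(DataDescription):
    # Extension class: same fields, but AIND requires `platform`.
    def __post_init__(self) -> None:
        if self.platform is None:
            raise ValueError("AIND data descriptions must set `platform`")


external = DataDescription(name="external-dataset")
internal = AindDataDescription(name="aind-dataset", platform="ecephys")
```

With this split the core schema stays permissive, and the stricter rules live entirely in the extension.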
Simplifications are going to be moved to a spec doc, where we can continue the discussion before breaking these out into tickets: https://alleninstitute.sharepoint.com/:w:/r/sites/NeuralDynamics/_layouts/15/Doc.aspx?sourcedoc=%7BE27B889D-27A1-4CC0-948C-10ADBE605654%7D&file=Data+Schema+2.0.docx&action=default&mobileredirect=true