Scope out opportunities to simplify aind-data-schema

Open dyf opened this issue 1 year ago • 5 comments

dyf avatar Sep 09 '24 17:09 dyf

Some notes from poking around. This will be an ongoing comment:

General:

  • We should limit the number of classes in each file to, say, 10 at most, and split files that have too many. Procedures in particular should get split by modality
  • There is a missing concept of an atlas and atlas transform, related to injection targets and CCF information

acquisition.py

  • ProcessingSteps probably needs to be generalized for outside users, potentially moved out of this file

data_description.py

  • platform is currently internal
  • organizations list is pretty limited for outside users

instrument.py

  • Calibrations could include timestamps, rather than separating them out

procedures.py

  • Remove create_unit_with_value
  • Lots of value/unit pairs that could be combined
  • Headframe + well: not all headframes have wells, so the well could be split off
  • ProtectivePartReplacement duplicates well information?
  • Tars virus should probably go into models as a registry
  • The main procedures class should really be split into two for SubjectProcedures and SpecimenProcedures
  • Injections that target atlas coordinates need an "Insertion" class to handle position/rotation information

processing.py

  • Hanging TODO

rig.py

  • Should get split into modality-specific rig types so that there are fewer blank list types
  • Many fields are nominally required but are effectively optional, because they can be set to empty [] lists
  • Custom validators for modalities should get split into their respective subclasses
  • CCF coordinate transform, origin, and rig axes should get merged into a single "atlas" and "transform" class similar to how Pinpoint does this

session.py

  • This class is super complex imo, would like to discuss how to simplify and separate functional parts into subclasses

subject.py

  • Would be good to track the use of "Other" in enrichment and add (and then go back and replace) missing enrichment types
  • A lot of fields, like Housing, are optional that probably shouldn't be
  • No information here about weighings or growth. Is any of that tracked, or only during procedures?

dbirman avatar Sep 17 '24 20:09 dbirman

~~Blocking this until next meeting about it~~

dbirman avatar Sep 19 '24 01:09 dbirman

1st discussion about specific changes (9/27)

Simplifying procedures

  • Put them into a components file
  • Subject/Specimen procedures in different component files

Coordinates [breaking change]

  • Should there be a SurgeryCoordinates class? Fields would then hold a List[SurgeryCoordinates]
  • Attach angles
  • Atlas + transforms
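
One possible shape for the coordinates idea above, as a minimal sketch with plain dataclasses. `Atlas`, `Injection`, and every field name and value here are hypothetical, chosen only for illustration; nothing below is taken from the existing schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Atlas:
    """Hypothetical reference space that coordinates are expressed in."""
    name: str
    version: str


@dataclass(frozen=True)
class SurgeryCoordinates:
    """Hypothetical position with its angles attached, tied to an atlas."""
    ap_mm: float
    ml_mm: float
    dv_mm: float
    atlas: Atlas
    ap_angle_deg: float = 0.0
    ml_angle_deg: float = 0.0


@dataclass
class Injection:
    """Hypothetical procedure that carries a list of coordinates."""
    targets: List[SurgeryCoordinates] = field(default_factory=list)


# Example values are made up.
ccf = Atlas(name="CCF", version="v3")
injection = Injection(
    targets=[
        SurgeryCoordinates(
            ap_mm=-1.5, ml_mm=2.0, dv_mm=3.0, atlas=ccf, ml_angle_deg=10.0
        )
    ]
)
```

Keeping the angles and the atlas on the coordinate object means a position is never stored without the frame needed to interpret it.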

Extensions

  • Make an extensions/aind/??

Instrument

  • Calibrations are deprecated, remove? Or replace with devices.Calibration. Talk to John + Adam

Value with Unit [breaking change]

  • Replace them all with value and value_unit
  • Remove all XValue classes and take them out of the models repo
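
A minimal sketch of the value plus value_unit pattern, read here as a plain field with a sibling `_unit` field in place of a wrapper class. `VolumeUnit` and `InjectionMaterial` are hypothetical names, and plain dataclasses stand in for the schema's own model classes.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class VolumeUnit(str, Enum):
    """Hypothetical unit enum used only for this example."""
    NL = "nanoliter"
    UL = "microliter"


@dataclass
class InjectionMaterial:
    """Hypothetical class: a plain value field next to a *_unit field."""
    volume: Decimal
    volume_unit: VolumeUnit = VolumeUnit.NL


material = InjectionMaterial(volume=Decimal("50"))
```

The value stays a bare number, so there is no per-quantity wrapper class to maintain, and the unit still travels with it.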

Rig [breaking change]

  • Annotated Union instead of list of Optional
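
To make the union idea concrete, here is a dependency-free sketch of a tagged union. It swaps in plain dataclasses and a lookup on a `modality` tag for a discriminated union such as pydantic's `Annotated[Union[...], Field(discriminator=...)]`. `EphysRig`, `OphysRig`, `parse_rig`, and their fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Literal, Union


@dataclass
class EphysRig:
    """Hypothetical rig type that only has ephys fields."""
    probes: List[str]
    modality: Literal["ecephys"] = "ecephys"


@dataclass
class OphysRig:
    """Hypothetical rig type that only has ophys fields."""
    objectives: List[str]
    modality: Literal["ophys"] = "ophys"


Rig = Union[EphysRig, OphysRig]

# Maps the tag value to the class that should be built.
_RIG_TYPES = {"ecephys": EphysRig, "ophys": OphysRig}


def parse_rig(data: dict) -> Rig:
    """Pick the rig class from the 'modality' tag, then build it."""
    fields = dict(data)  # copy so the caller's dict is untouched
    modality = fields.pop("modality")
    return _RIG_TYPES[modality](**fields)


rig = parse_rig({"modality": "ophys", "objectives": ["16x"]})
```

Each rig type declares only the fields that apply to it, so there are no empty lists left over for the modalities it does not use.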

dbirman avatar Sep 27 '24 21:09 dbirman

Idea: "people" should be abstracted out so that we're re-using the same class and set of information across all mentions of people? (e.g. evaluator in qc, and experimenter in other places)

This should cover both anonymized and non-anonymized experimenters.
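
A rough sketch of what a shared person class could look like, assuming an anonymized person is identified by an ID rather than a name. `Person`, `anonymous_id`, `QCEvaluation`, and this stripped-down `Session` are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical shared class for every mention of a person."""
    name: Optional[str] = None          # left out when anonymized
    anonymous_id: Optional[str] = None  # used in place of a name

    def __post_init__(self) -> None:
        # At least one way of identifying the person must be given.
        if self.name is None and self.anonymous_id is None:
            raise ValueError("Person needs a name or an anonymous_id")


@dataclass
class QCEvaluation:
    evaluator: Person


@dataclass
class Session:
    experimenter: Person


named = Person(name="Jane Doe")
anonymized = Person(anonymous_id="experimenter-001")
qc = QCEvaluation(evaluator=named)
session = Session(experimenter=anonymized)
```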

dbirman avatar Oct 02 '24 16:10 dbirman

  • Use annotated unions in rig and session to remove empty lists that confuse users.
  • Create AIND-specific validators in an AIND extension so WE can require fields that we don't expect external scientists to have.
  • Remove the stimulus classes. These can maybe be in an extension. The goal really is for people to create their own dictionary that is consistent within a project/platform/whatever the right level is, but not for us to dictate or maintain those. (We might need to develop some tools to help people do this though)
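
The extension-validator idea could look roughly like this: the core class keeps a field optional, and an AIND subclass adds the stricter check. This is a sketch with plain dataclasses, and `AindDataDescription` and the fields shown are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataDescription:
    """Hypothetical core class: external users may leave platform unset."""
    name: str
    platform: Optional[str] = None


@dataclass
class AindDataDescription(DataDescription):
    """Hypothetical extension class that tightens the core rules."""

    def __post_init__(self) -> None:
        # The core schema allows a missing platform; the extension does not.
        if self.platform is None:
            raise ValueError("platform is required for AIND data")


external = DataDescription(name="external-dataset")
internal = AindDataDescription(name="aind-dataset", platform="ecephys")
```

External scientists would use the core class unchanged, while the stricter rules live entirely in the extension.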

saskiad avatar Oct 03 '24 04:10 saskiad

Simplifications are going to be moved to a spec doc (https://alleninstitute.sharepoint.com/:w:/r/sites/NeuralDynamics/_layouts/15/Doc.aspx?sourcedoc=%7BE27B889D-27A1-4CC0-948C-10ADBE605654%7D&file=Data+Schema+2.0.docx&action=default&mobileredirect=true), where we can continue the discussion before breaking these out into tickets.

dbirman avatar Oct 25 '24 04:10 dbirman