Refactor data access, part 1: models and validators [Please do not merge]
Context: We are breaking the PR https://github.com/MIT-LCP/physionet-build/pull/1967 into smaller PRs that are easier to review and work on. This branch is expected to be merged into the branch for 1967, not dev.
This PR introduces the DataSource model and its validators.
Quick summary of the model: the DataSource model decides where a project's files are stored (determined by data_location) and how they can be accessed (determined by access_mechanism). A single project can have multiple DataSource instances.
About the fields:
- files_available: determines whether the files can be viewed/downloaded for the given type of data source. (@kshalot had notes about this field here: https://github.com/MIT-LCP/physionet-build/pull/1967#discussion_r1170257746)
- email: for GCP group access, this stores the email address of the group.
- uri: the URI of the data on the external service. For S3 this has the form s3://<bucket_name>; for gsutil it has the form gs://<bucket_name>.
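To make the shape of a data source concrete, here is a rough sketch using a plain dataclass. It is not the Django model from this PR: the class name `DataSourceSketch`, the string used for the project, the bucket name, and the `files_available` values are all made up for illustration; only the field names and the location/mechanism labels come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataSourceSketch:
    """Hypothetical stand-in for DataSource; the real model is a Django model."""
    project: str                            # placeholder for the project relation
    data_location: str                      # where the files are stored
    files_available: bool                   # whether files can be viewed/downloaded
    access_mechanism: Optional[str] = None  # how the files can be accessed
    email: Optional[str] = None             # group email, for GCP group access
    uri: Optional[str] = None               # s3://<bucket_name> or gs://<bucket_name>


# A single project can have more than one data source (values are illustrative).
sources = [
    DataSourceSketch("demo-project", "Direct", files_available=True),
    DataSourceSketch(
        "demo-project",
        "AWS S3",
        files_available=False,
        access_mechanism="S3",
        uri="s3://example-bucket",
    ),
]
```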
Quick summary of the validators:
The validation is based on four aspects: required fields, forbidden fields, required access mechanisms, and forbidden access mechanisms.
- Required Fields: For each data location (such as Google BigQuery, Google Cloud Storage, AWS Open Data, and AWS S3), certain fields must be present. For instance, Google BigQuery requires an 'email', while Google Cloud Storage, AWS Open Data, and AWS S3 require a 'uri'. If a required field is missing, a validation error is raised.
- Forbidden Fields: Conversely, for certain data locations, some fields must not be present. For example, for 'Direct' data location, 'uri' and 'email' fields should not be present. If they are found, a validation error is raised.
- Required Access Mechanisms: Each data location may also require one of several specified access mechanisms. For instance, Google BigQuery and Google Cloud Storage can require either a 'Google Group Email' or a 'Research Environment' access mechanism, while AWS Open Data and AWS S3 require an 'S3' access mechanism. If none of the acceptable access mechanisms are found, a validation error is raised.
- Forbidden Access Mechanisms: Finally, some data locations forbid certain access mechanisms. Specifically, the 'Direct' data location forbids the 'Google Group Email', 'S3', and 'Research Environment' access mechanisms. If any of these are present, a validation error is raised.
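The four checks above could be sketched as a single rules table plus one function. This is a standalone illustration, not the code in this PR: `RULES`, `DataSourceValidationError`, and `validate_data_source` are hypothetical names, and treating 'Direct' as having no required access mechanism is an assumption, since the description only lists what it forbids.

```python
from typing import Optional

GOOGLE_MECHANISMS = ("Google Group Email", "Research Environment")
ALL_MECHANISMS = ("Google Group Email", "S3", "Research Environment")

# data location -> (required fields, forbidden fields,
#                   acceptable access mechanisms, forbidden access mechanisms)
RULES = {
    "Direct": ((), ("uri", "email"), (), ALL_MECHANISMS),
    "Google BigQuery": (("email",), (), GOOGLE_MECHANISMS, ()),
    "Google Cloud Storage": (("uri",), (), GOOGLE_MECHANISMS, ()),
    "AWS Open Data": (("uri",), (), ("S3",), ()),
    "AWS S3": (("uri",), (), ("S3",), ()),
}


class DataSourceValidationError(ValueError):
    """Hypothetical stand-in for the validation error raised in the PR."""


def validate_data_source(
    data_location: str,
    access_mechanism: Optional[str] = None,
    email: Optional[str] = None,
    uri: Optional[str] = None,
) -> None:
    """Raise DataSourceValidationError if any of the four checks fails."""
    required, forbidden, required_mechs, forbidden_mechs = RULES[data_location]
    values = {"email": email, "uri": uri}

    for name in required:
        if not values[name]:
            raise DataSourceValidationError(f"{data_location} requires '{name}'")
    for name in forbidden:
        if values[name]:
            raise DataSourceValidationError(f"{data_location} forbids '{name}'")
    # An empty tuple means no particular mechanism is required (assumption).
    if required_mechs and access_mechanism not in required_mechs:
        raise DataSourceValidationError(
            f"{data_location} requires one of: {', '.join(required_mechs)}"
        )
    if access_mechanism in forbidden_mechs:
        raise DataSourceValidationError(
            f"{data_location} forbids access mechanism '{access_mechanism}'"
        )


# Passes silently: AWS S3 with the S3 mechanism and a uri.
validate_data_source("AWS S3", access_mechanism="S3", uri="s3://example-bucket")
```

Keeping the rules in one table means adding a new data location is a one-line change, and the same function covers all four aspects.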
Quick note about the interface: it is there so that we can quickly test whether the validators work and create data sources.