Implement metadata forms
AWS DataZone has a feature called Metadata Forms. Essentially, it is a dynamic form creator whose forms can be attached to datasets. I think this is an excellent feature, and it would help data.all solve some problems.
One problem is that we currently don't know what metadata users want to keep on their datasets. data.all hardcodes Confidentiality and Topics. For our use case, for example, Topics make no sense, and for Confidentiality we use our own levels.
I propose the following:
- Create a new Play module Metadata Forms
- Forms are owned by the teams who created them
- Forms should be visible to the teams who created them, with an option to share them with everyone
- It should be possible to mark a form as required on all data.all datasets without exception. This would be very useful for organization-wide metadata such as confidentiality levels. This option should only be available to DA admins.
- Datasets should allow attaching 1 to N metadata forms.
- Metadata forms should support fields of type string, integer, boolean and, most importantly, Glossary Term, since glossary terms make it possible to create specific dropdowns. Using glossary terms I can create a required form for my own confidentiality levels.
- (Optional) Allow required forms to be specified at the Organization or Environment level. This way an environment or organization can enforce that certain metadata forms are always used on their datasets.
- Form fields set as required by a DA admin should be searchable, so that we can search by things like confidentiality level.
- Remove the confidentiality and topics dropdowns from datasets. Offer a migration path to users who already use them.
- Refactor confidentiality into explicit checkboxes on datasets, e.g. 1) hide the schema from users without access, etc. Instead of hiding the meaning behind a confidentiality level, make what is allowed on a dataset explicit with checkboxes.
Hi @zsaltys - I think the above idea is one that we would want to pick up for a future release of data.all, as it would be very helpful for more dynamic metadata associations on data.all resources. I think this would be a good candidate to extend the Discover section of the UI rather than the Play section, which focuses more on data consumption use cases.
Additionally, I would be curious to hear your thoughts on whether you see a benefit in extending this Metadata Forms concept beyond Datasets to other resources in data.all, such as Notebooks, Worksheets, etc. I think offering this type of flexibility everywhere could have value, but I am curious to hear other opinions here.
We will add this issue to our backlog for now and determine when we can prioritize it.
For 2.6 we will work on the design; implementation will be carried out in 2.7.
Overview
- Purpose: Create, manage, and attach metadata forms to datasets.
- Ownership: Forms are owned by the teams that create them.
- Visibility: Forms are visible to the creating team, with options to share them environment-wide, organization-wide, or globally.
- Enforcement: DA admins can enforce the use of specific forms environment-wide, organization-wide, or globally.
Metadata Properties
| Form Name | Type | Description | Example Value |
|---|---|---|---|
| Form Name | String | The name of the metadata form. | Confidentiality Level |
| Form Description | String | A brief description of the form’s purpose. | Form to specify confidentiality levels. |
| Owner Team | String | The team responsible for the form. | Data Governance Team |
| Visibility | Dropdown | Visibility options for the form. Values: Team Only, Environment-Wide, Organization-Wide, Global. | Team Only |
| Enforce Use | Checkbox (Admin) | Enforce the form on all datasets organization-wide. | Checked |
Field Configuration
| Field Name | Type | Description | Example |
|---|---|---|---|
| Field Name | String | The name of the field. | Confidentiality Level |
| Field Type | Dropdown | The type of field. Values: String, Integer, Boolean, Glossary Term. | Glossary Term |
| Required | Checkbox | Whether the field is required. | Checked |
| Glossary Term | Glossary Picker | Allows selection of a glossary term for dropdown values. Available only if Field Type is Glossary Term. | Select Term |
| Possible Values | Array | Possible values. If not set, the value can be any. Glossary term values populated automatically. | High, Medium, Low |
@anmolsgandhi I absolutely think that this should be extended further than just datasets. This would be very useful for environments too. And yes, I think we should make it extendable to any entity. I don't personally use anything in the Play area besides worksheets, and I don't see how it could be useful for worksheets.
But I would definitely want this for datasets, environments and probably organizations too.
@SofiaSazonova @dlpzx @anmolsgandhi I think it's fine if in the first iteration we design it with datasets in mind, but we should keep the door open for attaching forms to other entities like envs and orgs. This is especially important for the "enforce use" scenario.
For enforcement the way I think it should work is that:
a) A DA admin can enforce a form for everyone, regardless of whether the DA admin team is part of the org/env/dataset teams.
b) Otherwise, enforcement should only be possible if the enforcing team owns the entity. Meaning I can only enforce something on ORGs that have the same owner team as the metadata form. The same goes for environments and datasets, but I think there should also be a hierarchy. For example, if the team that created the ORG also creates an enforcing metadata form for all datasets, it should work, because the org admin team is indirectly the owner of all datasets in the org. Hope this makes sense.
@zsaltys Thanks for the above guidance. I do think this feature should be extended further, and we will make sure to design it that way. At the moment we are all hands on deck for the v2.6 release, but we will work with you more closely on this one. cc @SofiaSazonova
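The rule above can be sketched in a few lines of Python. This is only an illustration of the proposed check, not data.all code: the `Entity` class, the `can_enforce` function and the team names are all hypothetical, and the parent chain dataset -> environment -> organization is an assumption about how the hierarchy would be modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for an org, environment or dataset with an owner team."""
    name: str
    owner_team: str
    parent: Optional["Entity"] = None  # assumed chain: dataset -> environment -> org


def can_enforce(team: str, target: Entity, is_da_admin: bool = False) -> bool:
    """Return True if `team` may enforce a metadata form on `target`."""
    # a) DA admins can enforce everywhere.
    if is_da_admin:
        return True
    # b) Otherwise the team must own the target or one of its ancestors.
    node: Optional[Entity] = target
    while node is not None:
        if node.owner_team == team:
            return True
        node = node.parent
    return False


org = Entity("Sales", owner_team="sales-admins")
env = Entity("sales-prod", owner_team="platform-team", parent=org)
dataset = Entity("orders", owner_team="orders-team", parent=env)

print(can_enforce("sales-admins", dataset))  # org owner reaches datasets below it
print(can_enforce("orders-team", org))       # dataset owner cannot enforce upwards
```

Walking up the parent chain is what gives the org admin team indirect ownership of every dataset in the org, while a dataset team never gains rights over its environment or organization.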
Below is a summary of the discovery phase. I focused on the implementation of Metadata Forms (MF) and left the required changes to data.all itself (refactoring confidentiality, etc.) out of scope for now.
@zsaltys @anmolsgandhi @dlpzx @noah-paige Comments are welcome!
Main goal: Create an effective instrument for data consistency improvement.
Requirements
- Metadata Forms must be developed as a Discovery module for Data.all
- Data.all users should be able to create, edit and delete metadata forms.
- Metadata Forms must have different visibility levels: from private to global.
- Metadata forms can be attached to:
- Organizations
- Organization Teams
- Environments
- Environment Teams
- Datasets
- Worksheets
- Dashboards
- Consumption roles
- Notebooks
- ML Studio entities
- Pipelines
- Metadata Forms can be obligatory to use
- For each entity there can be multiple forms attached
- Form fields should be searchable
Structure
Table S.1. Metadata Form Properties
| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form URI | Identifier of metadata form | String | URI of metadata form |
| Form Name | The name of the metadata form. | String | |
| Form Description | A brief description of the form’s purpose. | String | |
| Owner Team | The team responsible for the form. | String | Team URI |
| Visibility | Visibility options for the form. | String | Team Only, Environment-Wide, Organization-Wide, Global. |
| Home Entity | URI of Org/Env. If Visibility is Global or Team-only, this field is None | String? | URI of Org/Env or None |
Table S.2. Metadata Form Field Configuration
| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form | Identifier of metadata form | String | URI of metadata form |
| Field URI | Identifier of the field | String | |
| Field Name | The name of the field. | String | |
| Field Type | The type of field. | String | String, Integer, Boolean, Glossary Term. |
| Required | Whether the field is required. | Boolean | |
| Glossary Term | Allows selection of a glossary term for dropdown values. Available only if Field Type is Glossary Term. | String | URI of Glossary Term |
| Possible Values | Possible values. If not set, the value can be any. Glossary term values populated automatically. | Array<T>? | |
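As a rough illustration of Tables S.1 and S.2, the two structures could be modelled as follows. This is a sketch only: the class and attribute names are made up for the example, and the rule that Environment-Wide and Organization-Wide forms must have a Home Entity is an assumption (the table only states that the field is None for Global and Team Only).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List, Optional


class Visibility(Enum):
    TEAM = "Team Only"
    ENVIRONMENT = "Environment-Wide"
    ORGANIZATION = "Organization-Wide"
    GLOBAL = "Global"


class FieldType(Enum):
    STRING = "String"
    INTEGER = "Integer"
    BOOLEAN = "Boolean"
    GLOSSARY_TERM = "Glossary Term"


@dataclass
class MetadataFormField:
    uri: str
    name: str
    field_type: FieldType
    required: bool = False
    glossary_term_uri: Optional[str] = None
    possible_values: Optional[List[Any]] = None  # None means any value is allowed

    def __post_init__(self) -> None:
        # A glossary term is available only if the field type is Glossary Term.
        if self.glossary_term_uri is not None and self.field_type is not FieldType.GLOSSARY_TERM:
            raise ValueError("glossary_term_uri is only valid for Glossary Term fields")


@dataclass
class MetadataForm:
    uri: str
    name: str
    description: str
    owner_team_uri: str
    visibility: Visibility
    home_entity_uri: Optional[str] = None  # URI of org/env, or None
    fields: List[MetadataFormField] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: env-wide and org-wide forms need a home entity; others must not have one.
        needs_home = self.visibility in (Visibility.ENVIRONMENT, Visibility.ORGANIZATION)
        if needs_home != (self.home_entity_uri is not None):
            raise ValueError("home_entity_uri does not match the visibility level")


form = MetadataForm(
    uri="mf-1",
    name="Confidentiality Level",
    description="Form to specify confidentiality levels.",
    owner_team_uri="team-data-governance",
    visibility=Visibility.GLOBAL,
)
form.fields.append(
    MetadataFormField("f-1", "Confidentiality Level", FieldType.STRING,
                      required=True, possible_values=["High", "Medium", "Low"])
)
```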
Table S.3. Filled Metadata Form
| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form | Identifier of metadata form | String | URI of metadata form |
| Filled form URI | Identifier of filled form | String | URI |
| Entity | Identifier of the entity that the form is attached to | String | URI |
| Entity Type | Type of entity | String | Table E.2 |
Table S.4. Filled Metadata Form Field
| Field | Description | Type | Possible Values |
|---|---|---|---|
| Field URI | Identifier of the field | String | |
| Filled form URI | Identifier of filled form | String | |
| Value | Polymorphism implemented | Type of field | |
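Since the Value column depends on the field type, a filled value has to be checked against the field definition from Table S.2. A minimal sketch of such a check is below. The function name is hypothetical, and storing a Glossary Term value as a string is an assumption.

```python
from typing import Any, Optional, Sequence

# Assumed mapping from the field types in Table S.2 to Python types.
_PYTHON_TYPES = {"String": str, "Integer": int, "Boolean": bool, "Glossary Term": str}


def validate_value(field_type: str, value: Any, required: bool = False,
                   possible_values: Optional[Sequence[Any]] = None) -> None:
    """Raise ValueError if `value` is not acceptable for a field of `field_type`."""
    if value is None:
        if required:
            raise ValueError("a value is required for this field")
        return
    expected = _PYTHON_TYPES.get(field_type)
    if expected is None:
        raise ValueError(f"unknown field type: {field_type}")
    # bool is a subclass of int in Python, so reject it explicitly for Integer fields.
    if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
        raise ValueError(f"expected a {field_type} value, got {type(value).__name__}")
    # If possible values are set, the value must be one of them.
    if possible_values is not None and value not in possible_values:
        raise ValueError(f"{value!r} is not one of the possible values")


validate_value("String", "High", required=True, possible_values=["High", "Medium", "Low"])
validate_value("Integer", 30)
validate_value("Boolean", None)  # optional field left empty
```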
Enforcement
Metadata forms can be made obligatory at different levels. The user selects the metadata form and the entity types that must have this form attached. Enforcement affects the selected entity types on all lower levels of the hierarchy.
Table E.1. Metadata form enforcement rule
| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form | Identifier of metadata form, that is enforced | String | URI of existing form |
| Level | Level of affected entities. All lower levels will be affected. | String | From table E.2 |
| Entity Types | Entity type affected by this enforcement | Array [String] | Identifiers of data.all entities types (table E.2) |
| Severity | | String | Obligatory/Recommended |
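To make the cascading behaviour concrete, here is a small sketch of how a rule from Table E.1 could be evaluated against an entity. It is not the actual implementation. The `home_entity_uri` attribute (where the rule is anchored) and the `lineage` argument (URIs of the entity and all its ancestors) are assumptions added for the example, as are all URIs used.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass
class EnforcementRule:
    form_uri: str
    level: str                 # e.g. "Global", "Organization", "Environment"
    entity_types: List[str]    # entity types affected by the rule
    severity: str              # "Obligatory" or "Recommended"
    home_entity_uri: Optional[str] = None  # hypothetical: org/env the rule is anchored to


def rule_applies(rule: EnforcementRule, entity_type: str, lineage: Sequence[str]) -> bool:
    """True if the rule affects an entity of `entity_type` with the given lineage."""
    if entity_type not in rule.entity_types:
        return False
    if rule.level == "Global":
        return True
    # A rule anchored at an org/env affects everything below that org/env.
    return rule.home_entity_uri in lineage


def missing_obligatory_forms(rules: Iterable[EnforcementRule], entity_type: str,
                             lineage: Sequence[str], attached: Iterable[str]) -> List[str]:
    """Return URIs of obligatory forms that apply to the entity but are not attached."""
    attached_set = set(attached)
    return [
        r.form_uri for r in rules
        if r.severity == "Obligatory"
        and rule_applies(r, entity_type, lineage)
        and r.form_uri not in attached_set
    ]


rule = EnforcementRule("mf-confidentiality", "Organization", ["Dataset"],
                       "Obligatory", home_entity_uri="org-sales")
lineage = ["ds-orders", "env-sales-prod", "org-sales"]
print(missing_obligatory_forms([rule], "Dataset", lineage, attached=[]))
```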
Table E.2. Metadata form levels’ hierarchy
Who can enforce:
- Data.all admins can enforce any form on any level across the platform. They have full control over metadata form enforcement.
- Owners/admins of the data can enforce forms for their own level and the levels below it in the hierarchy. For example, an org admin can enforce a form for the org, all teams in that org, all environments in the org, all datasets in those environments, etc.
- Share approvers and requestors can enforce forms for a specific share they are involved with. However, they can only delete enforcement rules they created themselves; they cannot delete rules created by others.
So in summary, enforcement capabilities cascade along with administrative privileges in the hierarchy. Global admins have full control, org/env admins can enforce for their sphere and below, dataset admins for the datasets and items in it, and share requesters and approvers for a specific share.
Re-opening this issue - I think it was accidentally auto-closed with the merge of #1422
@SofiaSazonova I like everything I see, and I like the enforcement mechanisms. One thing I would like to bring up as a potential enhancement later on is default values.
Let us imagine a use case where I, as DA admin, define a global metadata form with 30 fields that I want all my datasets to answer, such as: Is this PII data? Is this GDPR data? JIRA link? DEV contact? DevOps contact? Domain? Subdomain? Many of these would be very repetitive to define at the organization level. For example, let's say I have an organization Sales with 30 datasets. The domain for us is always SALES, so there is really no point in asking every dataset creator to define it on every import. In fact, maybe I want to enforce that in my organization the domain is ALWAYS sales. I would imagine this feature as "metadata form defaults". The way it would work: if a metadata form is attached to an org for all of its datasets, then that org should be able to provide its own defaults for that required metadata form.
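To illustrate what I mean by "metadata form defaults", here is a rough sketch. Nothing like this exists yet: the function name, the `locked` parameter (for fields the org wants to always enforce) and the sample values are all hypothetical.

```python
from typing import Any, Dict, Iterable


def resolve_values(org_defaults: Dict[str, Any], submitted: Dict[str, Any],
                   locked: Iterable[str] = ()) -> Dict[str, Any]:
    """Merge org-level defaults with the values a dataset creator submitted."""
    # Submitted values win over defaults...
    resolved = {**org_defaults, **submitted}
    # ...except for locked fields, where the org default is always enforced.
    for name in locked:
        if name in org_defaults:
            resolved[name] = org_defaults[name]
    return resolved


org_defaults = {"Domain": "SALES", "Is this GDPR data?": False}
submitted = {"Domain": "MARKETING", "DEV contact": "alice"}

print(resolve_values(org_defaults, submitted))
print(resolve_values(org_defaults, submitted, locked=["Domain"]))
```

With `locked=["Domain"]` the dataset creator's "MARKETING" is replaced by the org default, which is the "domain is ALWAYS sales" case.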
@anmolsgandhi fyi
@zsaltys @TejasRGitHub @anushka-singh Here is the demo. Not all of this is merged yet, but it will be soon.
https://github.com/user-attachments/assets/cb4e1aed-e75a-4c9d-8086-9d417b026935
Thanks for this demo video @SofiaSazonova. It looks really good and user friendly! I just had one comment. I believe metadata forms should be attached at the dataset level and not the org level, because we are filling in metadata for datasets, right? One org can have multiple datasets, and each dataset can have a different metadata form.
Also, is there a way to make a metadata form mandatory to fill out? It might be a useful feature for some customers.
@anushka-singh
> I believe metadata forms should be attached at dataset level and not org

By design (see the previous messages), forms can be attached at many levels. As we discussed above, we shouldn't restrict MFs only to datasets.

> Also is there a way to make metadata form mandatory to be filled out

Yes, this is also described in the design. It will be implemented in the next step.