
Implement metadata forms

Open voidwisp opened this issue 1 year ago • 16 comments

AWS DataZone has a feature called Metadata Forms. Essentially, it's a dynamic form creator whose forms can be attached to datasets. I think this is an excellent feature and it would help data.all solve some problems.

One problem is that at the moment we don't know what metadata users want to keep on their datasets. Currently data.all hardcodes Confidentiality and Topics. For our use case, for example, Topics make no sense, and for Confidentiality we use our own levels.

I propose the following:

  1. Create a new Play module, Metadata Forms.
  2. Forms are owned by the teams that created them.
  3. Forms should be visible to the teams that created them, but there should also be an option to share them with everyone.
  4. It should be possible to make a form required on all data.all datasets without exception. This would be very useful for organization-wide metadata like confidentiality levels. This option should only be available to DA admins.
  5. Datasets should allow attaching from 1 to N metadata forms.
  6. Metadata forms should allow creating fields of these types: strings, integers, booleans and, most importantly, Glossary Terms, as this allows creating specific dropdowns. Using glossary terms I can create a required form for my own confidentiality levels.
  7. (Optional) Allow specifying required forms at the Organization level or Environment level. This way an environment or organization can enforce that certain metadata forms are always used on their datasets.
  8. Form fields that were set as required by a DA admin should be searchable, so that we can search by things like confidentiality level.
  9. Remove the confidentiality and topics dropdowns from datasets. Offer a migration path to users who already use them.
  10. Refactor confidentiality with new checkboxes on datasets, e.g. 1) hide schema without having access, etc. Instead of hiding the meaning behind a confidentiality level, make it explicit with checkboxes what is allowed on a dataset.

voidwisp avatar Feb 19 '24 12:02 voidwisp

Hi @zsaltys - I think the above idea is one that we would want to pick up for a future release of data.all, as it would be very helpful for more dynamic metadata associations on data.all resources. I think this would be a good candidate to extend the Discover section of the UI rather than the Play section, which focuses more on data consumption use cases.

Additionally, I would be curious to hear your thoughts on whether you see a benefit in extending this Metadata Forms concept beyond Datasets to other resources in data.all, such as Notebooks, Worksheets, etc. I think offering this type of flexibility everywhere could have value, but I am curious to hear other opinions here.

We will add this issue to our backlog for now and determine when we can prioritize it.

noah-paige avatar Feb 20 '24 16:02 noah-paige

For 2.6 we will work on the design; the implementation will be carried out in 2.7.

dlpzx avatar Jun 11 '24 15:06 dlpzx

Overview

  • Purpose: Create, manage, and attach metadata forms to datasets.
  • Ownership: Forms are owned by the teams that create them.
  • Visibility: Forms are visible to the creating team, with options to share them environment-wide, organization-wide, or globally.
  • Enforcement: DA admins can enforce the use of specific forms environment-wide, organization-wide, or globally.

SofiaSazonova avatar Jun 20 '24 15:06 SofiaSazonova

Metadata Properties

| Property | Type | Description | Example Value |
|---|---|---|---|
| Form Name | String | The name of the metadata form. | Confidentiality Level |
| Form Description | String | A brief description of the form’s purpose. | Form to specify confidentiality levels. |
| Owner Team | String | The team responsible for the form. | Data Governance Team |
| Visibility | Dropdown | Visibility options for the form. Values: Team Only, Environment-Wide, Organization-Wide, Global. | Team Only |
| Enforce Use | Checkbox (Admin) | Enforce the form on all datasets organization-wide. | Checked |

SofiaSazonova avatar Jun 20 '24 15:06 SofiaSazonova

Field Configuration

| Property | Type | Description | Example |
|---|---|---|---|
| Field Name | String | The name of the field. | Confidentiality Level |
| Field Type | Dropdown | The type of field. Values: String, Integer, Boolean, Glossary Term. | Glossary Term |
| Required | Checkbox | Whether the field is required. | Checked |
| Glossary Term | Glossary Picker | Allows selection of a glossary term for dropdown values. Available only if Field Type is Glossary Term. | Select Term |
| Possible Values | Array | Possible values. If not set, any value is accepted. Glossary term values are populated automatically. | High, Medium, Low |
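
To make the field configuration above concrete, here is a minimal sketch of how a submitted value could be checked against a field definition. It is only an illustration: `FieldDefinition` and `validate_value` are hypothetical names, not part of data.all, and glossary terms are treated as plain strings for simplicity.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical in-memory shape of one form field (illustrative only)."""
    name: str
    field_type: str  # "String", "Integer", "Boolean" or "Glossary Term"
    required: bool = False
    possible_values: Optional[Sequence[Any]] = None  # None means "any value"


# Python types accepted per field type; glossary terms are assumed to be strings.
_PYTHON_TYPES = {
    "String": str,
    "Integer": int,
    "Boolean": bool,
    "Glossary Term": str,
}


def validate_value(field: FieldDefinition, value: Any) -> List[str]:
    """Return a list of problems; an empty list means the value is acceptable."""
    if value is None:
        return [f"{field.name} is required"] if field.required else []
    expected = _PYTHON_TYPES[field.field_type]
    # bool is a subclass of int in Python, so reject it explicitly for Integer fields.
    wrong_type = not isinstance(value, expected)
    if expected is int and isinstance(value, bool):
        wrong_type = True
    if wrong_type:
        return [f"{field.name} must be of type {field.field_type}"]
    if field.possible_values is not None and value not in field.possible_values:
        return [f"{field.name} must be one of {list(field.possible_values)}"]
    return []
```

For example, a required "Confidentiality Level" field with possible values High, Medium and Low would accept "High" and reject both a missing value and "Secret".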

SofiaSazonova avatar Jun 20 '24 15:06 SofiaSazonova

@anmolsgandhi I absolutely think that this should be extended further than just datasets. This would be very useful for environments too. And yes, I think we should make it extensible to any entity. I don't personally use other things in the Play area besides worksheets, though I don't see how it could be useful for worksheets.

But I would definitely want this for datasets, environments and probably organizations too.

@SofiaSazonova @dlpzx @anmolsgandhi I think it's fine if in the first iteration we design it with datasets in mind, but we should keep the door open for attaching forms to other entities like envs and orgs. This is especially important for the "enforce use" scenario.

For enforcement, the way I think it should work is:

a) A DA admin can enforce for everyone, regardless of whether the DA admin team is part of the org/env/dataset teams.

b) Otherwise, enforcement should only be possible if the team that is enforcing owns the entity. Meaning I can only enforce something on orgs that have the same owner team as the metadata form. The same goes for environments and datasets, but I think there should also be a hierarchy. For example, if the team that created the org is the team creating an enforcing metadata form for all datasets, then it should work, because the org admin team is indirectly the owner of all datasets in the org. Hope this makes sense.
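
The two rules above could be sketched roughly as follows. This is only one reading of the proposal, under the assumption that every entity has a single owner team and a single parent; `Entity` and `can_enforce` are hypothetical names, not data.all APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical node in an org -> environment -> dataset hierarchy."""
    name: str
    owner_team: str
    parent: Optional["Entity"] = None


def can_enforce(form_owner_team: str, target: Entity, is_da_admin: bool = False) -> bool:
    """Rule (a): DA admins can always enforce.
    Rule (b): otherwise the form's owner team must own the target
    or one of its ancestors (indirect ownership through the hierarchy)."""
    if is_da_admin:
        return True
    node: Optional[Entity] = target
    while node is not None:
        if node.owner_team == form_owner_team:
            return True
        node = node.parent
    return False
```

With an org owned by `sales-admins`, an environment under it and a dataset under that, `sales-admins` can enforce a form on the dataset, while the dataset's own team cannot enforce anything on the org.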

voidwisp avatar Jun 21 '24 10:06 voidwisp

@zsaltys Thanks for the above guidance. I do think this feature should be extended further, and we will make sure to design it that way. At this moment we are all hands on deck for the v2.6 release, but we will work with you more closely on this one. cc @SofiaSazonova

anmolsgandhi avatar Jul 01 '24 15:07 anmolsgandhi

Below is a summary of the discovery phase. I focused on the implementation of Metadata Forms (MF) and left the related data.all changes (refactoring confidentiality, etc.) out of scope for now.

@zsaltys @anmolsgandhi @dlpzx @noah-paige Comments are welcomed!

SofiaSazonova avatar Jul 18 '24 15:07 SofiaSazonova

Main goal: Create an effective instrument for data consistency improvement.

Requirements

  1. Metadata Forms must be developed as a Discovery module for Data.all
  2. Data.all users should be able to create, edit and delete metadata forms.
  3. Metadata Forms must have different visibility levels: from private to global.
  4. Metadata forms can be attached to:
    1. Organizations
    2. Organization Teams
    3. Environments
    4. Environment Teams
    5. Datasets
    6. Worksheets
    7. Dashboards
    8. Consumption roles
    9. Notebooks
    10. ML Studio entities
    11. Pipelines
  5. Metadata Forms can be obligatory to use
  6. For each entity there can be multiple forms attached
  7. Form fields should be searchable

SofiaSazonova avatar Jul 18 '24 15:07 SofiaSazonova

Structure

Table S.1. Metadata Form Properties

| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form URI | Identifier of metadata form | String | URI of metadata form |
| Form Name | The name of the metadata form. | String | |
| Form Description | A brief description of the form’s purpose. | String | |
| Owner Team | The team responsible for the form. | String | Team URI |
| Visibility | Visibility options for the form. | String | Team Only, Environment-Wide, Organization-Wide, Global. |
| Home Entity | URI of Org/Env. If Visibility is Global or Team-only, this field is None | String? | URI of Org/Env or None |
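
As an illustration of how Visibility and Home Entity from Table S.1 might work together, here is a minimal sketch. It assumes the caller already knows which teams the user belongs to and which org/env URIs the user can access; `MetadataForm` and `is_visible` are hypothetical names, not the actual data.all model.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class MetadataForm:
    """Hypothetical subset of the properties in Table S.1."""
    uri: str
    name: str
    owner_team: str
    visibility: str  # "Team Only", "Environment-Wide", "Organization-Wide" or "Global"
    home_entity: Optional[str] = None  # org/env URI; None for Global or Team Only


def is_visible(form: MetadataForm, user_teams: Set[str], user_entity_uris: Set[str]) -> bool:
    """Decide whether a user can see a form (sketch under the stated assumptions)."""
    if form.owner_team in user_teams:
        return True  # the creating team always sees its own form
    if form.visibility == "Global":
        return True
    if form.visibility in ("Environment-Wide", "Organization-Wide"):
        # Visible to anyone with access to the form's home org/env.
        return form.home_entity in user_entity_uris
    return False  # "Team Only" and not a member of the owner team
```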

Table S.2. Metadata Form Field Configuration

| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form | Identifier of metadata form | String | URI of metadata form |
| Field URI | Identifier of the field | String | |
| Field Name | The name of the field. | String | |
| Field Type | The type of field. | String | String, Integer, Boolean, Glossary Term. |
| Required | Whether the field is required. | Boolean | |
| Glossary Term | Allows selection of a glossary term for dropdown values. Available only if Field Type is Glossary Term. | String | URI of Glossary Term |
| Possible Values | Possible values. If not set, any value is accepted. Glossary term values are populated automatically. | Array<T>? | |

Table S.3. Filled Metadata Form

| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form | Identifier of metadata form | String | URI of metadata form |
| Filled form URI | Identifier of filled form | String | URI |
| Entity | Identifier of the entity the form is attached to | String | URI |
| Entity Type | Type of entity | String | Table E.2 |

Table S.4. Filled Metadata Form Field

| Field | Description | Type | Possible Values |
|---|---|---|---|
| Field URI | | String | |
| Filled form URI | | String | |
| Value | Polymorphism implemented | Type of field | |
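
Table S.4 notes that the stored value is polymorphic, i.e. its type follows the type of the field. A minimal sketch of one way to express that in plain Python is shown below. The class names and the `make_filled_field` factory are hypothetical, and the real implementation (for example, how this maps to database tables) may look quite different.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type


@dataclass(frozen=True)
class FilledField:
    """Common part of every filled field (see Table S.4)."""
    field_uri: str
    filled_form_uri: str


@dataclass(frozen=True)
class StringValue(FilledField):
    value: str = ""


@dataclass(frozen=True)
class IntegerValue(FilledField):
    value: int = 0


@dataclass(frozen=True)
class BooleanValue(FilledField):
    value: bool = False


@dataclass(frozen=True)
class GlossaryTermValue(FilledField):
    value: str = ""  # assumed to hold the URI of the selected glossary term


# Maps the Field Type from Table S.2 to the class that stores its value.
_VALUE_CLASSES: Dict[str, Type[FilledField]] = {
    "String": StringValue,
    "Integer": IntegerValue,
    "Boolean": BooleanValue,
    "Glossary Term": GlossaryTermValue,
}


def make_filled_field(field_type: str, field_uri: str, filled_form_uri: str, value: Any) -> FilledField:
    """Pick the value class that matches the field type and build the record."""
    cls = _VALUE_CLASSES[field_type]
    return cls(field_uri=field_uri, filled_form_uri=filled_form_uri, value=value)
```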

Enforcement

Metadata forms can be made obligatory to fill in at different levels. A user selects the metadata form and the entity types that should have this form attached. Enforcement affects the selected entity types on all lower levels of the hierarchy.

Table E.1. Metadata form enforcement rule

| Field | Description | Type | Possible Values |
|---|---|---|---|
| Metadata Form | Identifier of the metadata form that is enforced | String | URI of existing form |
| Level | Level of affected entities. All lower levels will be affected. | String | From table E.2 |
| Entity Types | Entity types affected by this enforcement | Array [String] | Identifiers of data.all entity types (table E.2) |
| Severity | | String | Obligatory/Recommended |

Table E.2. Metadata form levels’ hierarchy

[Figure: meta levels (hierarchy diagram, image not included)]

Who can enforce:

  • Data.all admins can enforce any form on any level across the platform. They have full control over metadata form enforcement.
  • Owners/admins of the data can enforce forms for their own level and the levels below it in the hierarchy. For example, an org admin can enforce a form for the org, all teams in that org, all environments in the org, all datasets in those environments, etc.
  • Share approvers and requestors can enforce forms for a specific share they are involved with. However, they can only delete enforcement rules they created themselves; they cannot delete rules created by others.

So in summary, enforcement capabilities cascade along with administrative privileges in the hierarchy. Global admins have full control, org/env admins can enforce for their sphere and below, dataset admins for the datasets and items in it, and share requesters and approvers for a specific share.
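
To illustrate how an enforcement rule from Table E.1 might cascade to lower levels, here is a rough sketch. The level order below is an assumption (a simplified Global > Organization > Environment > Dataset chain), since the actual hierarchy is only given in the Table E.2 diagram. `EnforcementRule`, `applies_to` and `obligatory_forms` are hypothetical names, and scoping a rule to one particular org or environment is left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Assumed, simplified level order; a smaller number means higher in the hierarchy.
LEVEL_ORDER = {"Global": 0, "Organization": 1, "Environment": 2, "Dataset": 3}


@dataclass(frozen=True)
class EnforcementRule:
    """Hypothetical shape of one row of Table E.1."""
    form_uri: str
    level: str                     # level at which the rule is defined
    entity_types: Tuple[str, ...]  # entity types the rule targets
    severity: str                  # "Obligatory" or "Recommended"


def applies_to(rule: EnforcementRule, entity_type: str) -> bool:
    """A rule affects an entity type if it is targeted and sits at or below the rule's level."""
    if entity_type not in rule.entity_types:
        return False
    return LEVEL_ORDER[entity_type] >= LEVEL_ORDER[rule.level]


def obligatory_forms(rules: Iterable[EnforcementRule], entity_type: str) -> List[str]:
    """URIs of forms that must be filled in for the given entity type."""
    return [
        rule.form_uri
        for rule in rules
        if rule.severity == "Obligatory" and applies_to(rule, entity_type)
    ]
```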

SofiaSazonova avatar Jul 18 '24 15:07 SofiaSazonova

[Figure: MF DB (image not included)]

SofiaSazonova avatar Jul 22 '24 10:07 SofiaSazonova

Re-opening this issue - I think it was accidentally auto-closed with the merge of #1422

noah-paige avatar Jul 23 '24 16:07 noah-paige

@SofiaSazonova I like everything I see, and I like the enforcement mechanisms. One thing I would like to bring up as a potential later enhancement is default values.

Let us imagine a use case where I, as DA admin, define a global metadata form with 30 fields that I want all my datasets to answer, such as: Is this PII data? Is this GDPR data? JIRA link? DEV contact? DevOps contact? Domain? Subdomain? Many of these would be very repetitive to define at the organization level. For example, let's say I have an organization Sales with 30 datasets. The domain for us is always SALES, so there's really no point in asking every dataset creator to define this on every import. In fact, maybe I want to enforce that in my organization the domain is ALWAYS sales. I would imagine this feature as "metadata form defaults". The way it would work: if a metadata form is attached to an org for all its datasets, then that org should be able to provide its defaults for that required metadata form.
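
A minimal sketch of how such "metadata form defaults" could be merged with what a dataset creator submits is shown below. Everything here is hypothetical: `apply_defaults` and the `locked` parameter (fields where the org default always wins, as in the "domain is ALWAYS sales" case) are illustrative names, not an existing data.all feature.

```python
from typing import Any, Dict, FrozenSet, Mapping


def apply_defaults(
    submitted: Mapping[str, Any],
    org_defaults: Mapping[str, Any],
    locked: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Merge submitted values over organization defaults.

    Submitted values win, except for locked fields that have an
    organization default: there the default is always kept.
    """
    merged: Dict[str, Any] = dict(org_defaults)
    for name, value in submitted.items():
        if name in locked and name in org_defaults:
            continue  # the org enforces its own value for this field
        merged[name] = value
    return merged
```

With defaults `{"Domain": "SALES"}`, a dataset creator who leaves Domain blank gets SALES filled in, and if Domain is locked, a submitted value of "HR" is ignored.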

@anmolsgandhi fyi

voidwisp avatar Aug 08 '24 17:08 voidwisp

@zsaltys @TejasRGitHub @anushka-singh Here is the demo. Not all of this is merged yet, but it will be soon.

https://github.com/user-attachments/assets/cb4e1aed-e75a-4c9d-8086-9d417b026935

SofiaSazonova avatar Aug 19 '24 16:08 SofiaSazonova

Thanks for this demo video, @SofiaSazonova. It looks really good and user-friendly! I just had one comment. I believe metadata forms should be attached at the dataset level and not the org level, because we are filling in metadata for datasets, right? One org can have multiple datasets, and each dataset can have a different metadata form.

Also, is there a way to make a metadata form mandatory to fill out? It might be a useful feature for some customers.

anushka-singh avatar Aug 19 '24 17:08 anushka-singh

@anushka-singh

> I believe metadata forms should be attached at dataset level and not org

By design (see the previous messages) they can be attached at many levels. As we discussed above, we shouldn't restrict MFs to datasets only.

> Also is there a way to make metadata form mandatory to be filled out

Yes, it is also described in the design. It will be implemented in the next step.

SofiaSazonova avatar Aug 20 '24 09:08 SofiaSazonova