Grainular sharing using Lake Formation
Customer has enabled granular (Row, Column, Cell) sharing using lake formation sharing and would like to see that capability in data.all sharing request, and in approval area. So that data owners and share the same datasets with restricted access to columns with out having to create duplicate another dataset and data.
Customer current solution is to use data filters and typing in a Manuel expression, this may be the best open a text area to add expression, but best user experience would show column names in a visual checkbox way with ability to put in expressions for rows and cells.
https://docs.aws.amazon.com/lake-formation/latest/dg/data-filters-about.html
Hi @chamiles, for column level sharing we opened an issue a while ago. But there was no much interests (#84 ), instead we opted to implement Tag-based access control using Lake Formation, which also allows column level granularity. It is a feature in-progress at the moment developed in this issue #186
For row/cell level filtering we can work together on implementing this sort of data filters in shared tables.
Bumping this up, This issue will be used to track the enhancement and extension of the current data.all sharing functionality to include column-level and row-level access control in Lake Formation. This has been one of the top requests from customers based on our conversations. This will be included as a feature enhancement for completion in v2.7.
@anmolsgandhi I think it is important we get the UI right to make it clear that this can only for table sharing and if you grant S3 access then you're effectively breaking this security measure because user will have access to the entire bucket and dataset behind the table. Let's discuss this.
Hi @zsaltys - I am starting to work on design for this and can start sharing some mock ups of the UI here as I continue to progress this week
I agree some type of warning or call out from FE would be helpful to ensure user understands what they are doing when sharing data objects - but I think we can add warnings more so from the bucket sharing side because sharing an entire bucket is akin to sharing ALL folders and tables in data.all
I will post additional design description on this issue today
Design
Assumptions
- Dataset Owners will be the team that is responsible for creating data filters and applying them to shares
- Data Filters will only be supported for S3 Dataset Table Share Items
- Note: Data Filters are backed by LakeFormation Data Filters which is currently only used in S3 Dataset Table Sharing in data.all
- data.all will not support Cell-Level Filters but will allow one or more Column-Level or Row-Level filters to be applied to the same table share
- By default, shares of tables will be for FULL Access - dataset owners must add filters to a table share to filter the data shared
- To update from full access to filtered access a user must revoke and re-share the table
User Experience
Creating a Data Filter
-
User on the Dataset Owner Team navigates to a particular S3 Dataset Table
- There will be a tab for
Data Filterswhere only the Dataset Owners can view - On the
Data FiltersTab a User can:- View a table containing all of the existing Data Filters specific to that particular table
- Delete data filters from the table (as long as data filter is not in use in any active shares)
- Open a modal to create a new data filter for the specific
- There will be a tab for
-
Creating a new Data Filter
- User Provides a Filter Name and Description
- User can select either
ColumnorRowFilter Type - For
ColumnFilters --> User will be able to select a subset of columns to include in the filter (all columns not selected are excluded) - For
RowFilters --> User will be able to create a row expression to filter by row values- Each "simple" expression contains a
columnName+operator(i.e.=.<, etc.) +value - "simple" expressions can be stitched together to create composite expressions
- Supported Operators include
- For Nums:
=, !=, >, >=, <, <=, IS NULL, NOT NULL, IN, NOT IN - For Strings:
=, !=, LIKE, NOT LIKE, IS NULL, NOT NULL, IN, NOT IN - For Bools:
=, !=, IS NULL, NOT NULL
- For Nums:
- Each "simple" expression contains a
Applying Data Filter to a Share
- Data Consumer opens a share request for an S3 Dataset including one or more tables form that dataset and submits
- Dataset Owners have the option to assign 1 or more data filters for each table share item requested before approving the share
- Share View will include a new column to reflect what data filters (if any) are associated to the table items
- Dataset Owners will also have a easy way to navigate to the Data Filter View from the Share View and back to easily create new data filters and add to requested shares
Additional Considerations
- Share Re-apply and verify will be updated to support the additional required steps to integrate with data filters for table share item processing
Mock Ups of FE Design
List Data Filters View [IMAGE]
Delete Data Filter [VIDEO]
https://github.com/user-attachments/assets/3dae0bd2-916f-4f26-bce1-67af5ad4d1da
Create a New Data Filter [VIDEO]
https://github.com/user-attachments/assets/b6088731-527e-4c8e-a403-faf2d5dd3273
Attaching Data Filter(s) to Share [VIDEO]
https://github.com/user-attachments/assets/9414c99f-43c7-49b2-8dba-266d62a197e1
Progress Tracker
-
Frontend
- Data Filter CRUD (In Progress)
- [x] List Data Filters
- [x] Create New Data Filter
- [x] Delete Data Filter
- Shares
- [x] Assign Data Filters to Table Share Items UI
- [x] Share Item Table View w/ Filters
- [x] Route Approver to/from Data Filter View and Share View
- Data Filter CRUD (In Progress)
-
Backend
-
Data Filter CRUD (In Progress)
- [x] Create Data Filter API
- [x] Delete Data Filter API + Throw Exception if Existing Shares
- [x] List Data Filters API
- [x] Attach Permissions on New Table Creation
- [x] Delete All Filters on Table Delete
- [x] Throw Exception on Delete Filter if Existing Shares
- [x] Remove Permission on Table Deletes
-
DB (In Progress)
- [x] Create Data Filter Table
- [x] Add Data Filter Column to Share Items
-
Dataset Shares
- [x] Add Data Filters to Share Item Record (Tables only)
- [x] S3 Dataset Table Share Processor Updates (View Comment Below for more Detail!)
- [x] Share Approve
- [x] Share Revoke
- [x] Share Verify
- [x] Share ReApply
-
-
Additional Work
- [x] Worksheet Query Resolve Resource Link Name
- [x] (To Discuss) Table Previews / Column Views / Worksheet Column List for Requestors with Filtered Table Access
- [x] Backwards Compatibility with Existing Table Shares
- [x] DB Backfill Dataset Owners + Stewards Data Filter Permissions
- [x] Write Tests!
- [ ] Warning on S3 Bucket Sharing w/ Table OR Folder Shares
Testing (MOVED TO PR COMMENTS)
Table Sharing Finding
Current Approach
Currently when we share a table cross account we follow the following steps:
0) Check if source account details are properly initialized and initialize the Glue and LF clients
1) Grant ALL permissions to pivotRole for source database in source account
2) Create the shared database in target account if it doesn't exist
3) Grant permissions to pivotRole and principals to "shared" database
4) For each shared table:
a) Update its status to SHARE_IN_PROGRESS with Action Start
b) Check if table exists on glue catalog raise error if not and flag share item status to failed
c) If it is a cross-account share:
c.1) Revoke iamallowedgroups permissions from table
c.2) Grant target account permissions to original table -> create RAM invitation
c.3) Accept pending RAM invitation
d) Create resource link for table in target account
e) If it is a cross-account share: grant permission to principals to RAM-shared table in target account
f) grant permission to principals to resource link table
g) update share item status to SHARE_SUCCESSFUL with Action Success
Most importantly - we re use the shared DB and the resource link table in the target account and then add additional grants for new principals who get approved access to shared data
Moving to Data Filters
When it comes to data filters - herein lies an issue because:
- (1) Different consumers in the target account may have varying levels of read access (i.e. different filters assigned to share)
- (2) Originally the data filters were to exist on the data producer account - however if we followed the current pattern of sharing table to external target account and then granting to specific principals within target account from there then the data filter presumably would exist in the target account (and would need to be created in each target account where it is used)
Looking for a Solutions
-
(Option 1 - NOT VIABLE) Sharing the table w/ assigned filters to the external account
- Filters still exist in source account and can assign to grant to external target account ID
- Works in restricting the end user IAM principals ability to see particular rows/columns
- However this goes against re-using shared DB / resource link for multiple cross-account consumers
-
(Option 2 - NOT VIABLE) Attempt to create Filters in Target account after resource link is created
- Received ERROR - “Error creating data cell filter: test. Creation of a cell filter on a view or resource link is not allowed.”
-
(Option 3) Sharing the table w/ assigned filters directly to the Foreign IAM Principal
- Filters still exist in source account and can assign to grant to external target IAM Principal Role ARN
- Works in restricting the end user IAM principals ability to see particular rows/columns
- Still requires a DB (to be parent resource of table) and a resource link (to enable querying in Athena)
- DB can be reused as same as created for shares with no filters
- Resource Link can have new naming convention
{table_name}_{filterUris}
If following alogn with Option 3 above - adding additional details here
Pre-Reqs
Cross-account grants made using the named resource method are compatible across different versions. Even if the grantor account is using an older version (version 1 or 2) and the recipient account is using a newer version (version 3 or higher), the cross-account access functionality operates seamlessly without any compatibility issues or errors.
To share resources directly with IAM principals in another account, only the grantor needs to use version 3.
https://docs.aws.amazon.com/lake-formation/latest/dg/optimize-ram.html
Comparison
In the proposed option - we do the same DB steps as before which is
- Create a DB in target account (if not exist)
- Grant permission to pivotRole and principal to the DB in target account
But instead of
- DESCRIBE, SELECT Permissions on table in source to target account id (grant_target_account_permissions_to_source_table)
- DESCRIBE, SELECT Permissions on table shared in target account to target share principals (IAM Roles)
- DESCRIBE Permissions on the resource link table in target account to target share principals (IAM Roles)
We do
- DESCRIBE, SELECT Permissions on table w/ ftilers in source directly to share principals in target (IAM Roles)
- DESCRIBE Permissions on the resource link table in target account to target share principals (IAM Roles)
Originally if TableX was shared to same cross account to GroupA and GroupB we would have
- 1 DB in target account
- 1 table resource link
- GroupA and GroupB IAM principals have access to resource link + source table
- Account has permissions to source table
Now if TableX was shared to same cross account to GroupA (w/ Filter1) and GroupB (w/ Filter2)
- 1 DB in target account
- 2 table resource links (with separate naming conventions)
- GroupA and GroupB IAM principals have access to their respective resource links + source table
- No account level grants
Errors
- When trying to share directly to QS Group in anotehr account get an error
- Using this approach Data Filter shares will not support shares to QuickSight
Could not grant principal QS_GROUP_ARN permissions ['DESCRIBE', 'SELECT'] and permissions with grant options None to {'TableWithColumns': {'DatabaseName': 'DB_NAME', 'Name': 'TABLE_NAME', 'ColumnWildcard': {}, 'CatalogId': 'SOURCE_ACCOUNT'}} due to: An error occurred (InvalidInputException) when calling the GrantPermissions operation: Cross account requests are only allowed for AWS Accounts, Organizations, IAM Principals and All IAMPrincipals
Hi @noah-paige, I love the UI views! Here are some remarks on the design and the table findings:
- Should we introduce any type of guardrail for filters? For example, column filters need to include more than 1 column and row filters should be possible (e.g. col1=>0 and col1<0). Maybe it is too difficult this second one
- I think option 3 seems like the best approach and it might be needed either way when we get into LF tags. The concern with Quicksight.... The first thing that comes to mind is that "well we can make it configurable" but then, we need to maintain 2 different ways of sharing data. Another option is to treat QS groups not as an indivisible entity of the data.all group but as a consumption principal (the same way we have consumption roles). But that is a whole new concept that would double the scope of this feature
Findings from testing:
-
Assigning 2 Column Filters
- Outer join of the columns lists are shown
- All other columns not in either column filter list are excluded and not shown
-
Assigning 1 Row and 1 Column Filter
- Shows every row for the allowed columns via column filter criteria
- Also shows all column values for rows that meet the row filter criteria
- For rows that do not meet row filter criteria - the non listed columns included in column filter are set to None
-
Assigning 2 Row Filters
- All rows the meet 1 or more of the row filters are shown
- If a row does not meet either filter expression it is not shown