superset icon indicating copy to clipboard operation
superset copied to clipboard

[SIP-137] Can we limit a user to only read certain tables in the database connection?

Open ttdpro98 opened this issue 1 year ago • 9 comments

Please make sure you are familiar with the SIP process documented here. The SIP will be numbered by a committer upon acceptance.

[SIP-137] Proposal for limit a user to only read certain tables in the database connection

Motivation

We are allowing a number of users to create dashboards and query data on SQL Lab with the database connections that have been pre-configured by the Superset admin.

However, once a user is allowed access to the database connection, it means they can read all the data in that database, which we do not want. Can we limit a user to only read a specific schema or certain tables directly in Superset without having to create multiple connections to one database with different permissions?

Proposed Change

We can grant read permissions to users for individual schemas or tables directly on Superset. I believe this can be done if Superset can fetch schema metadata from the databases.

ttdpro98 avatar Jun 11 '24 03:06 ttdpro98

Delegating permissions to users who can read specific tables in the database is quite important. I've worked at 3 companies that use superset for data mining, and all have raised the same issue. I believe this improvement will make Superset a much more powerful data mining tool.

ttdpro98 avatar Jun 12 '24 09:06 ttdpro98

It might be worth filling out the other parts of the SIP template that are missing here. Otherwise, let me know if you need any help kicking off the [DISCUSS] thread on the Superset dev mailing list to move it forward.

rusackas avatar Jun 12 '24 17:06 rusackas

@rusackas as I read in 5602, I'm missing descriptions of New or Changed Public Interfaces, New dependencies, Migration Plan and Compatibility and Rejected Alternatives. However, I find these descriptions unnecessary. Could you tell me what information we should add to make it clearer?

ttdpro98 avatar Jun 13 '24 04:06 ttdpro98

CC @dpgaspar @yousoph for comments/consideration.

rusackas avatar Jun 18 '24 15:06 rusackas

Hi there!

If you take a look at the List Roles page, you should be able to set up data permissions for schemas and datasources: Image

There's some additional information here and here as well - does that address what you're looking to achieve?

yousoph avatar Jun 18 '24 19:06 yousoph

@yousoph I'm aware of that feature, but it is not useful for determining which tables a user can access when querying in SQL Lab

ttdpro98 avatar Jun 20 '24 08:06 ttdpro98

This is something that should be managed at the db level (using different users), which I know you said was something you didn't want to do. Otherwise data source permissions is the only thing that remotely fits.

Overall I think if you are going to give people permission to use sqllab, they need to be trusted with whatever data that user can access in the database. There's really no difference between sqllab and any other SQL IDE/workbench.

andy-clapson avatar Jun 21 '24 13:06 andy-clapson

@andy-clapson This is not quite right. For querying on the IDE, we use personal accounts to query the database.

But on Superset, we query the database and create datasets to draw dashboards. Therefore, we cannot use personal-account for each dashboard when querying the database. We need to use a service account to represent Superset in querying the database.

And I also believe that the dashboards on Superset are a production environment. We are not allowed to use personal-account for the dashboards, and we cannot rely on the assumption that people never make mistakes. We need to avoid situations of human error.

ttdpro98 avatar Jun 24 '24 04:06 ttdpro98

I think my point still applies then - if you are letting people query production databases, they should be trusted with that data (to the extent of what that particular db user allows them to access). If this doesn't fit, I think you are stuck using data sources as your control layer in some way, and SQLlab access is off-limits.

andy-clapson avatar Jun 24 '24 13:06 andy-clapson

I think it's safe to say we need more detail in the proposal... we need to better understand the exact implementation here... how it's manifested in the UI, the default access, how it interacts with datasource permissions, etc. If this happens to be something you already have working on a fork, a Draft PR might help move the conversation forward as well.

rusackas avatar Jul 02 '24 15:07 rusackas

Hi there. Checking in. If we don't get more details or a complete template and technical proposal, this SIP is likely to be closed as incomplete/inactive.

rusackas avatar Aug 28 '24 16:08 rusackas

Hello,

I'm having the same problem using apache pinot in superset. Case study: A hundred clients with different rights access SuperSet. Superset rights management is used to limit access to the view. But as soon as the client looks at the network requests, they can retrieve the URL, account name and password from the database to reuse it and gain full access to the data.

In this case, we have an Internet accessible solution, but I don't want my database to have the same access.

Solution : Pass requests to the superset backend before reaching the databases + add a security layer based on superset roles. The databases will then not be accessible directly.

As it is, I can't use superset for my project, even though the solution meets my needs perfectly.

Thank you advance,

inctl avatar Sep 05 '24 09:09 inctl

@rusackas we have another case study, let's examine it

ttdpro98 avatar Sep 05 '24 09:09 ttdpro98

Pinging @dpgaspar for input here... we still need more details on the technical approach here, but if there's a security issue at hand, he should be aware, and we have the right to close/wipe this issue from public view. Please note that all security concerns should not be made public, but should be reported directly to [email protected].

rusackas avatar Sep 10 '24 16:09 rusackas

@inctl please report to [email protected] with further details, follow https://github.com/apache/superset/security/policy

dpgaspar avatar Sep 11 '24 10:09 dpgaspar

@dpgaspar should we close this or leave it open?

rusackas avatar Nov 06 '24 17:11 rusackas

This SIP hasn't moved forward in terms of ASF/SIP process (i.e. a Discussion email thread) and might be closed in time, so I'm wondering how to carry the idea forward. If there are requests for increased specificity of the permissions in the security model, I would propose that we close this SIP, and instead carry the requirement over to the discussion on [SIP-131] Superset Security Model Redesign. Would that seem appropriate?

rusackas avatar Dec 04 '24 17:12 rusackas

Hi guys, I have sketched out a basic idea of a role-based model on Superset and am looking for the best way to describe it clearly. I will send it to you in the next two weeks. Should I post it on this GitHub issue or through another channel? Please suggest.

@rusackas

ttdpro98 avatar Dec 13 '24 15:12 ttdpro98

@ttdpro98 that depends on how big of a change it is. @mistercrunch is proposing a new permissions model over here. Maybe you want to add your thoughts and use cases over there?

rusackas avatar Dec 13 '24 16:12 rusackas

Some comments here, first I think the main topic here, given that we do have the permission model and logic to support dataset restrictions in Explore/Dashboard, is something more like "enforcing schema-level and physical dataset-level access in SQL Lab". Currently - and I don't know if it's well documented today - SQL Lab data access-level policy is limited to "Database connection", meaning that if you have access to the database connection AND have SQL Lab access, you can run whatever queries you want. We may need to clarify this in the docs.

Now, enforcing the rules we have around schema AND dataset-level access in SQL Lab is tricky because it requires parsing table/view names out of SQL Lab user-defined SQL, and checking against the perms. Note that we do have methods that are pretty well tested in the codebase to do that, but anything that relies on parsing SQL (in this case to extract "relations" (ie views and tables) names and sometimes guessing their schema) is a potential security risk.

Today it'd be fairly easy to add something that uses methods we have already to extracts the relations name and checks against schema-level and relations-level perms on the SQL Lab side of the house, but it's not impossible someone could find a way to work around that. Maybe they could achieve that with obfuscated SQL, through some database-specific UDTFs-like constructs, or by creating views (if they're allowed to do that) in a schema they have access to that would be pointing to another schema they don't have access to.

Now, given that it'd be possible to build something that solves for this but isn't bulletproof, my recommendation would be to put this behind a feature flag, call it "DANGEROUS__TRY_TO_ENFORCE_DATA_ACCESS_POLICY_IN_SQL_LAB", with the proper disclaimer, something like:

# This feature flag tries to enforce schema-level and physical-dataset access defined for Explore/Dashboard in SQL LAB
# this is done by parsing the user-defined SQL and extracting/infereing schema and table/view names out of that SQL using 
# sqlglot/sqlparse. While this should just work in most cases, it may be possible to use this as an attack vector
# using some SQL trickery. For instance if a user has DDL access, they could run a CREATE VIEW in a schema 
# they have full access to against another schema they don't have access to to bypass the policies in place
DANGEROUS__ENFORCE_DATA_ACCESS_POLICY_IN_SQL_LAB: False,

mistercrunch avatar Dec 13 '24 19:12 mistercrunch

If my interpretation of the use case is right, we should rename the sip to "enforcing schema-level and physical dataset-level access in SQL Lab" or similar

mistercrunch avatar Dec 13 '24 19:12 mistercrunch

About the current state of things and whether it needs clearer documentation, doing a bit more thinking, I think today, for someone to use SQL Lab they need access to the database connection, which implies that they have access to everything inside that database connection. Now it's possible for someone to give access to a db connection and define schema-level AND dataset-level access, which I believe is disregarded since the database-access-level permission overrides/superseeds that.

mistercrunch avatar Dec 13 '24 19:12 mistercrunch

Pointers to relevant code ->

  • parsing SQL -> https://github.com/apache/superset/blob/master/superset/sql/parse.py#L161
  • potential area to implement the feature flag / logic -> https://github.com/apache/superset/blob/master/superset/sql_lab.py#L211-L250, here I'd recommend factoring out some sort of validate or check_data_access_permissions method and putting it in the right place, maybe could live in the SecurityManager so that people could implement their own logic

mistercrunch avatar Dec 13 '24 19:12 mistercrunch

About the current state of things and whether it needs clearer documentation, doing a bit more thinking, I think today, for someone to use SQL Lab they need access to the database connection, which implies that they have access to everything inside that database connection. Now it's possible for someone to give access to a db connection and define schema-level AND dataset-level access, which I believe is disregarded since the database-access-level permission overrides that.

Very much my $0.02 here. Docs could always be better for everything, everywhere, of course. But IMO it feels weird to spend time/effort expanding Superset's features to include controls for a SQL IDE, which is really just a user connecting to a DB, and up to the DBA and whatever permissions model is implemented at the organization. Feels outside the scope of a bi tool, certainly.

I always thought of SQLLab as "cool, there is an IDE here I can use, that's nice to have". If the use case was to allow certain users ONLY access to existing Superset datasets, that would make a bit more sense.

andy-clapson avatar Dec 13 '24 19:12 andy-clapson

Yeah, unclear if a SQL IDE should even attempt to do this, though I can see the use case. Like i have a set of say 3 roles:

  • data engineer - all access
  • data scientist - subset of schemas
  • data consumers - another subset of schemas

These people all know how to write SQL and need to write SQL as part of their job, and I want for that schema-access enforcement carries through Superset.

Now today what you'd do is probably create 3 different database connections, each with a different service account with schema-level restrictions, and give access to each role independently. It does kind of get messy where you may have to use different connections for explore if people want to share/collab on certain datasets. While the physical dataset may be the same, Superset would see them as different datasets as they use different connections. For that, you'd probably need a 4th database connection that like "MAIN CONNECTION - with data access policy" on top of the "DATA SCIENTIST DATABASE CONNECTOIN FOR SQL LAB". It gets kind of messy/confusing quick, but in theory would work.

mistercrunch avatar Dec 13 '24 19:12 mistercrunch

This one seems to have gone quiet for 4 months, and still hasn't been brought up for a [DISCUSS] thread. I'll close it as abandoned for now, but if anyone wants to usher it through, we can certainly reopen it and continue the process.

rusackas avatar Apr 30 '25 17:04 rusackas