[DISCUSSION] Should we implement access control hooks in DataFusion?
Is your feature request related to a problem or challenge?
@comphead brings up an excellent point here: https://github.com/apache/arrow-datafusion/pull/7441#issuecomment-1698341294
Many database systems provide some form of access control, for example allowing some users to read data while preventing them from writing it.
Such controls come at many granularities that a system might want to implement, for example restrictions per schema or per table, or a distinction between permission to read the schema and permission to read the data.
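To make these granularities concrete, here is a minimal Rust sketch of a grant model. A schema-level grant covers every table in that schema, and reading a schema is a separate privilege from reading data. `Privilege`, `Scope`, and `Grants` are hypothetical names for illustration only; they are not part of DataFusion.

```rust
use std::collections::HashSet;

/// Hypothetical privileges: seeing a table's schema is distinct from reading
/// its rows, and both are distinct from writing.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Privilege {
    ReadSchema,
    ReadData,
    Write,
}

/// Hypothetical grant scope: a whole schema, or a single table within it.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Scope {
    Schema(String),
    Table { schema: String, table: String },
}

/// The set of (scope, privilege) pairs granted to one user.
#[derive(Debug, Default)]
pub struct Grants {
    grants: HashSet<(Scope, Privilege)>,
}

impl Grants {
    pub fn grant(&mut self, scope: Scope, privilege: Privilege) {
        self.grants.insert((scope, privilege));
    }

    /// Succeeds if the privilege was granted on the table itself
    /// or on its enclosing schema.
    pub fn allows(&self, schema: &str, table: &str, privilege: Privilege) -> bool {
        let on_schema = (Scope::Schema(schema.to_string()), privilege);
        let on_table = (
            Scope::Table {
                schema: schema.to_string(),
                table: table.to_string(),
            },
            privilege,
        );
        self.grants.contains(&on_schema) || self.grants.contains(&on_table)
    }
}
```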
Describe the solution you'd like
If anyone needs this today, they can build it on top of LogicalPlan (by checking LogicalPlan contents and implementing whatever controls they want).
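As a rough illustration of that approach, the sketch below walks a plan tree recursively. It rejects the plan if it writes without permission or scans a table the user may not read. `Plan` is a toy enum loosely modeled on `LogicalPlan`, and `User` and `check_plan` are hypothetical. The real `LogicalPlan` has many more variants, so an actual check would need to match on those instead.

```rust
/// Toy stand-in for DataFusion's `LogicalPlan` (not the real type).
#[derive(Debug)]
pub enum Plan {
    TableScan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
    Insert { table: String, input: Box<Plan> },
    Delete { table: String, input: Box<Plan> },
}

/// Hypothetical per-user policy.
pub struct User {
    pub name: String,
    pub can_write: bool,
    pub readable_tables: Vec<String>,
}

/// Recursively inspects the plan and returns an error for the first node
/// that violates the user's permissions. Meant to run after planning and
/// before execution.
pub fn check_plan(plan: &Plan, user: &User) -> Result<(), String> {
    match plan {
        Plan::TableScan { table } => {
            if user.readable_tables.iter().any(|t| t == table) {
                Ok(())
            } else {
                Err(format!("user {} may not read table {}", user.name, table))
            }
        }
        // Writes need write permission, and their inputs are checked too.
        Plan::Insert { table, input } | Plan::Delete { table, input } => {
            if !user.can_write {
                return Err(format!("user {} may not write table {}", user.name, table));
            }
            check_plan(input, user)
        }
        // Other nodes do not touch tables directly; just recurse.
        Plan::Filter { input, .. } | Plan::Projection { input, .. } => check_plan(input, user),
    }
}
```

Checking the plan rather than the SQL text means the same check applies regardless of how the plan was built, for example from SQL or from the DataFrame API.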
It might be interesting to add some sort of built-in (extensible) mechanism to DataFusion to make this process easier.
Describe alternatives you've considered
I think we should wait for someone with a real use case and an implementation on top of LogicalPlan that we can upstream once we work out the details, rather than designing this in advance. However, I wanted to file the ticket to track the idea.
cc @waynexia @liukun4515, what do you think about it?
@alamb do you mean that DataFusion should add a component to support ACL? I think that would make DataFusion more complex, and DataFusion is not positioned as a DBMS.
Adding a complete implementation of access control to DataFusion might be hard, but adding some basic components that make it easier to build customized ACL on top of DataFusion looks viable to me.
Currently in our project, this functionality is implemented by matching the LogicalPlan and the SQL AST before executing it, just like @alamb mentioned above. https://github.com/GreptimeTeam/greptimedb/blob/9ff7670adfb56a80fe6ffeab8bdab9bcfe55543c/src/servers/src/interceptor.rs#L52-L60
For the cases I can come up with, these hooks should be enough to meet ACL requirements, since in theory the query and the plan contain all the necessary information. We could consider evolving DataFusion's hooks from that design, for example by replacing QueryContext with TaskContext or SessionContext, and perhaps adding an extra ACL-related field to that context.
@liukun4515 I was thinking more along the lines of what @waynexia mentions: DataFusion would have some extensible hooks (i.e. based on a trait) with a simple default implementation (perhaps a no-op) built into DataFusion. I am still not sure how useful such a feature would be.
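A minimal sketch of what such a trait-based hook could look like, assuming a hypothetical `AccessControl` trait whose default method is a no-op and a hypothetical `QueryCtx` that carries the current user (the kind of ACL-related context field suggested earlier in this thread). None of these names exist in DataFusion; `run` only stands in for the point where the engine would call the hook before executing a query.

```rust
/// Hypothetical per-query context carrying ACL-related information.
/// Not DataFusion's `SessionContext` or `TaskContext`.
#[derive(Debug, Default)]
pub struct QueryCtx {
    pub user: Option<String>,
}

/// Minimal description of what a query is about to do.
#[derive(Debug)]
pub enum Action<'a> {
    Read { table: &'a str },
    Write { table: &'a str },
}

/// Hypothetical extension trait. The method has a default body, so an
/// empty impl is a no-op that allows everything.
pub trait AccessControl {
    fn check(&self, _ctx: &QueryCtx, _action: &Action<'_>) -> Result<(), String> {
        Ok(())
    }
}

/// Built-in default: allows everything.
pub struct AllowAll;
impl AccessControl for AllowAll {}

/// Example user-supplied policy: anonymous sessions are read-only.
pub struct ReadOnlyForAnonymous;
impl AccessControl for ReadOnlyForAnonymous {
    fn check(&self, ctx: &QueryCtx, action: &Action<'_>) -> Result<(), String> {
        match (action, &ctx.user) {
            (Action::Write { table }, None) => {
                Err(format!("anonymous users may not write to {table}"))
            }
            _ => Ok(()),
        }
    }
}

/// Stand-in for the engine: calls the hook for every action before
/// "executing", and returns how many actions ran.
pub fn run(acl: &dyn AccessControl, ctx: &QueryCtx, actions: &[Action<'_>]) -> Result<usize, String> {
    for action in actions {
        acl.check(ctx, action)?;
    }
    Ok(actions.len())
}
```

Giving the trait method a default body keeps the built-in `AllowAll` trivial, while downstream projects override only the checks they need.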
Hey folks. We've somehow made our way to this issue, so I can chime in here:
We're interested in using Postgres row-level security (RLS) with our DataFusion integration. I don't believe it's DataFusion's place to handle that, but some hooks to help us connect Postgres's permission and access control features to DataFusion would be super useful to us. Happy to discuss here or in Discord.
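One possible shape for such a hook is a plan rewrite. Postgres RLS policies are boolean expressions that rows must satisfy, so an integration could wrap each scan of a protected table in a filter carrying that table's predicate. The sketch below assumes a toy `Plan` enum (not DataFusion's `LogicalPlan`), predicates kept as plain strings, and a hypothetical `apply_row_policies` function. Translating real Postgres policies into DataFusion expressions is out of scope here.

```rust
use std::collections::HashMap;

/// Toy stand-in for a logical plan (not DataFusion's `LogicalPlan`).
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    TableScan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
}

/// Rewrites the plan so that every scan of a table that has a policy is
/// wrapped in a Filter holding the policy's predicate. Scans of tables
/// without a policy are left unchanged.
pub fn apply_row_policies(plan: Plan, policies: &HashMap<String, String>) -> Plan {
    match plan {
        Plan::TableScan { table } => match policies.get(&table) {
            Some(predicate) => Plan::Filter {
                predicate: predicate.clone(),
                input: Box::new(Plan::TableScan { table }),
            },
            None => Plan::TableScan { table },
        },
        // Non-scan nodes are rebuilt with their input rewritten.
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(apply_row_policies(*input, policies)),
        },
        Plan::Projection { columns, input } => Plan::Projection {
            columns,
            input: Box::new(apply_row_policies(*input, policies)),
        },
    }
}
```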
@philippemnoel if any results come out of your discussions, it would be most helpful if you could post them (or a link to them) on this ticket for anyone in the future who might also be interested.