[SIP-95] Proposal for Catalog Support in SQL Lab
Motivation
The current SQL Lab explorer is not very intuitive: there is no way to browse catalogs, so users can only explore the default catalog.
Proposed Change
Add a new Catalog dropdown to the SQL Lab editor (similar to the existing Schema dropdown).
New or Changed Public Interfaces
This should only affect the SQL Lab query tab. The Catalog dropdown will be similar enough to the Schema explorer. The word "schema" is somewhat overloaded throughout Superset, so it's not entirely clear to me yet whether a new endpoint is necessary.
New dependencies
None
Migration Plan and Compatibility
None
Rejected Alternatives
None
I would love to see this, but there are a few complications. The biggest one is that we need support from the SQLAlchemy dialects, and I'm not sure how many support dynamic catalogs. For example, both Hive and Trino require the catalog to be specified when the connection is created:
- trino://<username>:<password>@<host>:<port>/<catalog>/<schema>
- hive://<host>:<port>/<catalog>
For engines like this the user would have to choose a default catalog when the database is added, and each DB engine spec in Superset would have to know how to replace the catalog in the URL, in case a user chooses one different from the default after the DB has been created.
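To make that concrete, here's a rough, untested illustration (not Superset code; the real hook would live in each DB engine spec, as sketched later in this thread) of rewriting the catalog in a Trino-style URI with SQLAlchemy's URL helpers. The "analytics" catalog name is just an example, and url.set() assumes SQLAlchemy 1.4+:

```python
from sqlalchemy.engine.url import make_url

# Trino encodes catalog (and optionally schema) in the "database" part of the URI.
url = make_url("trino://user:secret@localhost:8080/default_catalog/public")

# Split off the schema (if any) and swap in the catalog chosen by the user.
_, _, schema = url.database.partition("/")
new_database = f"analytics/{schema}" if schema else "analytics"
url = url.set(database=new_database)  # now points at the "analytics" catalog
```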
It's definitely doable, and something we should do. We've been slowly adding support for catalogs:
https://github.com/apache/superset/blob/383313b105b0e82bea0f38cc971630eded5affe0/superset/sql_parse.py#L172
Are you planning to work on this, or are you just proposing it? If you're not going to work on it I can start taking a look.
I was planning to work on this, but of course I’d love help as I’ve never made a contribution to Superset. I was planning on doing a smaller task first (adding a configurable header/footer) since I figured this one might be complex.
Ah, great! From the wording of the SIP I wasn't sure, and people are welcome to propose changes even if they're not going to work on it. I'm happy to help in any way you need, let me know!
hey @betodealmeida, good to see you are still pushing Superset forward!
We originally worked around this by concatenating the catalog with the schema, but the patching got nasty, as we would pass it through to the Trino driver.
I think a good approach for this would be modifying BaseEngineSpec.get_engine to allow passing a catalog. The method would then pass it to get_sqla_engine_with_context, which would call a DB engine specific method to mutate the SQLAlchemy URI so it uses the requested catalog. Something like (pseudocode):
```python
class BaseEngineSpec:
    @classmethod
    def get_engine(
        cls,
        database: Database,
        catalog: Optional[str] = None,
        schema: Optional[str] = None,
        source: Optional[QuerySource] = None,
    ) -> ContextManager[Engine]:
        # get_sqla_engine_with_context needs to call
        # database.engine_spec.apply_catalog_to_sqlalchemy_uri
        return database.get_sqla_engine_with_context(
            catalog=catalog,
            schema=schema,
            source=source,
        )

    @classmethod
    def apply_catalog_to_sqlalchemy_uri(cls, database: Database, catalog: str) -> URL:
        return database.sqlalchemy_uri

    @classmethod
    def get_catalog_names(
        cls,
        database: Database,
        inspector: Inspector,
    ) -> Set[str]:
        return set()
```
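For context, here's a hedged sketch of how the selected catalog could flow from SQL Lab down to the engine, assuming the get_engine signature above (QuerySource.SQL_LAB is Superset's existing enum value for SQL Lab queries; the rest is illustrative, not final code):

```python
# Illustrative only: passing the catalog picked in the new dropdown.
with database.db_engine_spec.get_engine(
    database,
    catalog="analytics",   # value selected in the Catalog dropdown
    schema="public",
    source=QuerySource.SQL_LAB,
) as engine:
    with engine.connect() as connection:
        result = connection.execute("SELECT 1")
```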
Individual DB engine specs like Trino/Hive/Snowflake/BigQuery would implement custom apply_catalog_to_sqlalchemy_uri methods to build the SQLAlchemy URI for a given catalog. They would also implement custom get_catalog_names to list all available catalogs, since there's no standard SQLAlchemy way of doing that, as far as I know.
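For example, a Trino implementation could probably be as simple as running SHOW CATALOGS. A minimal sketch, assuming a SQLAlchemy 1.x-style Engine.execute on the inspector's bind (illustrative, not actual Superset code):

```python
from typing import Set

from sqlalchemy import text
from sqlalchemy.engine.reflection import Inspector


class TrinoEngineSpec(BaseEngineSpec):
    @classmethod
    def get_catalog_names(
        cls,
        database: Database,
        inspector: Inspector,
    ) -> Set[str]:
        # Trino exposes the available catalogs via SHOW CATALOGS.
        return {catalog for (catalog,) in inspector.bind.execute(text("SHOW CATALOGS"))}
```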
Actually, we would need apply_catalog_to_sqlalchemy_uri to also take the schema and return any extra connect_args:
```python
@classmethod
def apply_catalog_to_sqlalchemy_uri(
    cls,
    database: Database,
    catalog: Optional[str],
    schema: Optional[str],
) -> Tuple[URL, Dict[str, Any]]:
    try:
        connect_args = database.get_extra()["engine_params"]["connect_args"]
    except KeyError:
        connect_args = {}
    return database.sqlalchemy_uri, connect_args
```
For Postgres, this would look like this (untested):
```python
@classmethod
def apply_catalog_to_sqlalchemy_uri(
    cls,
    database: Database,
    catalog: Optional[str],
    schema: Optional[str],
) -> Tuple[URL, Dict[str, Any]]:
    url, connect_args = super().apply_catalog_to_sqlalchemy_uri(database, catalog, schema)
    if catalog:
        # for Postgres a catalog is a database
        url = url.set(database=catalog)
    if schema:
        connect_args["options"] = f"-csearch_path={schema}"
    return url, connect_args
```
See the discussion at https://github.com/apache/superset/pull/23356.
Actually, we already have a method to change the schema (adjust_database_uri); we should piggyback on that.
While working on an issue with schemas, I've done some foundational work for supporting catalogs:
- https://github.com/apache/superset/pull/23356
- https://github.com/apache/superset/pull/23385
- https://github.com/apache/superset/pull/23401
Once we have that, I think we would still need the following steps:
1. Implement catalog-level permissions (@dpgaspar can probably help here). We also need to modify the parser to extract the catalog from queries, and have the security manager handle catalogs when checking for access.
2. Implement a get_catalog_names method in the DB engine specs that support catalogs (Presto, Trino, Snowflake, BigQuery, Postgres, ...?). Also update their adjust_engine_params methods to modify the SQLAlchemy URI to use the chosen catalog (see the sketch right after this list).
3. Add a catalog dropdown to SQL Lab, and make sure the selected catalog is being passed to _get_sqla_engine in the database model, and from there passed to adjust_engine_params.
4. Modify the dataset model and add a catalog attribute.
5. Add a catalog dropdown to the dataset creator.
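To illustrate item (2), here is a minimal, untested sketch of what a Postgres-style adjust_engine_params override might look like, modeled on the apply_catalog_to_sqlalchemy_uri example earlier in the thread (the exact signature in Superset may differ):

```python
from typing import Any, Dict, Optional, Tuple

from sqlalchemy.engine.url import URL


class PostgresEngineSpec(BaseEngineSpec):
    @classmethod
    def adjust_engine_params(
        cls,
        uri: URL,
        connect_args: Dict[str, Any],
        catalog: Optional[str] = None,
        schema: Optional[str] = None,
    ) -> Tuple[URL, Dict[str, Any]]:
        if catalog:
            # in Postgres a "catalog" is a database, so swap it in the URI
            uri = uri.set(database=catalog)
        if schema:
            # pass the schema to the driver via search_path
            connect_args["options"] = f"-csearch_path={schema}"
        return uri, connect_args
```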
I'm going to work on (1) and (2) this week, then we can put the SIP to vote and you (or someone else) can work on the UI part, @JELGT2011.
that's awesome news @betodealmeida! I've been a little delayed trying to upgrade our existing instance of superset, but I could definitely do some frontend changes (3), (4?), and (5). I think I have a draft work in progress on it already so that works out.
Numbering this as SIP-95. If you'd like to continue pursuing this, please submit it for a DISCUSS thread on the Superset @dev mailing list, so we can start progressing it toward a vote. Thanks!
It seems like this work is incrementally moving forward, but the SIP itself has still not been brought up for a DISCUSS thread or a VOTE thread on the Apache mailing list. @JELGT2011 / @betodealmeida is there any need or intent to continue with the SIP process here?
It would be huge to get this going for Databricks. I could probably contribute, especially since part of the approach has already been laid out for different engines.
I also support putting this SIP-95 up for a DISCUSS/VOTE thread! @MotherDuckdb has a similar scenario where a single DuckDB instance can connect to multiple databases (see https://duckdb.org/docs/sql/statements/attach.html). It would be great to be able to use the Catalog drop-down to scope the Schema options.
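For reference, a small standalone illustration of that DuckDB behavior (not Superset code; the file names are made up): a single connection can ATTACH several database files, each of which would map naturally to a catalog in the proposed dropdown.

```python
import duckdb

con = duckdb.connect()  # in-memory instance
con.execute("ATTACH 'warehouse.duckdb' AS warehouse")
con.execute("ATTACH 'staging.duckdb' AS staging")

# Each attached database shows up as its own "catalog" with its own schemas.
print(con.execute("SELECT database_name FROM duckdb_databases()").fetchall())
```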
@guenp please feel free to kick off the Discussion thread on the [email protected] mailing list! Let me know if you need help with that, you can ping me here or on Superset Slack
Hi @JELGT2011 / @guenp if anyone wants to move this SIP forward, please put it up for a [DISCUSS] thread on the mailing list. Otherwise, it's at risk of being closed/discarded.
I'll bring it up for a discussion, I don't think we're far away from finishing this.
@betodealmeida regarding your comment in the "[DISCUSS] [SIP-95] Proposal for Catalog Support in SQL Lab" email,
I'd like to increase the scope of SIP-95 by introducing catalogs not only in SQL Lab, but throughout the whole application.
I totally agree, as it's likely a necessary requirement if one wants to explore a SQL Lab result set.
@john-bodley right, I just called that out because the original SIP mentioned only SQL Lab in the title, but I don't see how that would work. 🙃
@betodealmeida I think you just need to close the vote for this with a [RESULT] thread
Done!
It's official, we now have catalog support in Apache Superset. Currently Postgres and Databricks are supported, and I have a PR out for:
- Trino
- Presto
- BigQuery
- Snowflake
Adding support to new DB engine specs should be straightforward, requiring just 3 methods. I suggest using https://github.com/apache/superset/pull/28416 as a reference implementation.
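Based on the discussion above, the skeleton for a new DB engine spec would presumably look something like the following. The method names are my reading of this thread and may not match the final API exactly; the linked PR is the authoritative reference.

```python
from typing import Any, Dict, Optional, Set, Tuple

from sqlalchemy.engine.reflection import Inspector
from sqlalchemy.engine.url import URL


class MyNewEngineSpec(BaseEngineSpec):
    @classmethod
    def get_default_catalog(cls, database: Database) -> Optional[str]:
        # Return the catalog baked into the database's SQLAlchemy URI.
        ...

    @classmethod
    def get_catalog_names(cls, database: Database, inspector: Inspector) -> Set[str]:
        # List all catalogs the current user can see.
        ...

    @classmethod
    def adjust_engine_params(
        cls,
        uri: URL,
        connect_args: Dict[str, Any],
        catalog: Optional[str] = None,
        schema: Optional[str] = None,
    ) -> Tuple[URL, Dict[str, Any]]:
        # Rewrite the URI/connect_args so the connection targets the chosen catalog.
        return uri, connect_args
```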