Duplicated tables and views

Open giovannipapini-agilelab opened this issue 2 years ago • 1 comments

https://github.com/databricks/databricks-sql-python/blob/3eaaac91a0a1a0e5920f64bb41f94e552f22497a/src/databricks/sqlalchemy/dialect/init.py#L254 lists tables using only "SHOW TABLES FROM {}", but it does not take into account that Databricks currently returns views as well as tables for that statement. A possible correction would be to add a filter to the iteration, like [i[TABLE_NAME] for i in data if i[TABLE_NAME] not in get_view_names(self, connection, schema, **kwargs)]. I am not sure whether the current behaviour is the intended one, but I find it annoying when using the library with Apache Superset, where this issue duplicates names: each view is listed both as a view and as a table.
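To make the proposed filter concrete, here is a minimal sketch of the idea, not the dialect's actual code. The "tableName" key and the filter_out_views helper are hypothetical names chosen for illustration. The view names are collected into a set once, so the view lookup is not repeated for every row.

```python
# Hypothetical column key; the constant used by the real dialect may differ.
TABLE_NAME = "tableName"


def filter_out_views(rows, view_names):
    """Return names from SHOW TABLES-style rows, skipping any that are views."""
    views = set(view_names)  # built once, so each membership test is O(1)
    return [row[TABLE_NAME] for row in rows if row[TABLE_NAME] not in views]


# Example: SHOW TABLES returned one table and one view.
rows = [{"tableName": "orders"}, {"tableName": "orders_summary"}]
print(filter_out_views(rows, ["orders_summary"]))
```

In the dialect, view_names would presumably come from a single get_view_names call made before the comprehension rather than inside it.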

Apr 26 '23 23:04 giovannipapini-agilelab

Thanks for this feedback. Agree that this isn't optimal. But I wonder if your optimisation idea makes more sense for the Superset adapter specifically?

My concern is that if we filter in the way you suggest, then we'll make two roundtrips (potentially long ones) to the compute. We could get around this by querying information_schema instead, for UC-enabled workspaces.

For example:

select
  *
from
  information_schema.tables
where
  table_type <> 'VIEW'
  and table_schema = $schema

This doesn't help non-UC users, but our dialect is primarily meant for UC applications anyway.
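As a rough sketch of the single-round-trip approach, the query above could be wrapped in one helper. Everything here is an assumption for illustration: get_table_names and make_demo_connection are hypothetical, the ":schema" placeholder style may differ from what the Databricks connector expects, and an in-memory SQLite database attached under the name information_schema stands in for a real UC workspace.

```python
import sqlite3

# One query returns only non-view names, so no second call for views is needed.
NON_VIEW_TABLES_SQL = """
select table_name
from information_schema.tables
where table_type <> 'VIEW'
  and table_schema = :schema
"""


def get_table_names(conn, schema):
    """Hypothetical helper: names of non-view tables in `schema`."""
    rows = conn.execute(NON_VIEW_TABLES_SQL, {"schema": schema}).fetchall()
    return [row[0] for row in rows]


def make_demo_connection():
    """Build a SQLite stand-in that mimics information_schema.tables (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("attach database ':memory:' as information_schema")
    conn.execute(
        "create table information_schema.tables "
        "(table_schema text, table_name text, table_type text)"
    )
    conn.executemany(
        "insert into information_schema.tables values (?, ?, ?)",
        [
            ("sales", "orders", "MANAGED"),
            ("sales", "customers", "MANAGED"),
            ("sales", "orders_summary", "VIEW"),
            ("hr", "employees", "MANAGED"),
        ],
    )
    return conn


conn = make_demo_connection()
print(sorted(get_table_names(conn, "sales")))
```

Binding the schema as a parameter instead of formatting it into the string avoids quoting problems with unusual schema names.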

Jul 11 '23 02:07 susodapop