[Feature]: Upgrade DBSQL Warehouses
Upstream dependencies:
- https://github.com/databrickslabs/ucx/issues/670
TODO:
- [ ] Set the default CATALOG to the new workspace catalog for DBSQL
- [ ] Update all the warehouses to enable Unity Catalog
- [ ] Remove the Instance Profile from the Warehouse if possible
- [ ] [optional] Enable Serverless
Rollback:
- [ ] Either modify queries to use 3 level namespace or revert Warehouse to disable UC
Should be a separate command line command
So I see a small problem here.
If we allow a customer to modify the default database/catalog when sync'ing tables (which we should), this will not be a useful step as queries will break if we set a single catalog default...
We will have to combine the upgrade of DBSQL Warehouses with some code analysis tooling to upgrade the SQL Queries.
Currently this works on notebook cells, but we can expand it to work on queries as well!
@zpappa please create fine-grained issues for "code migration" from problem perspective, not UI. Focus on full headless automation and not a UI
The problem with headless things is that they lack intelligence clearly. If the headless automation leaves things in a broken state, it provides no value.
The only way this is valuable is if the catalog migration is 1:1 from hive -> workspace catalog. If we assume that, we should ask whether this is even worth solving.
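To make the 1:1 assumption concrete, here is a minimal sketch of a pre-check that could decide whether a single default catalog would be enough. The mapping shape and the function name `single_default_catalog` are hypothetical and not part of UCX; it only assumes we can list, for every migrated table, its source schema/table and its target catalog/schema/table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping shape (an assumption, not a UCX type):
# (hive schema, hive table) -> (UC catalog, UC schema, UC table)
TableMapping = Dict[Tuple[str, str], Tuple[str, str, str]]


def single_default_catalog(mapping: TableMapping) -> Optional[str]:
    """Return the one catalog that could serve as the default, or None.

    A single default catalog only keeps two-level queries working if every
    table keeps its schema and table name and all tables land in the same
    catalog.
    """
    catalogs = set()
    for (src_schema, src_table), (catalog, dst_schema, dst_table) in mapping.items():
        if (src_schema, src_table) != (dst_schema, dst_table):
            # Schema or table was renamed: two-level references would break.
            return None
        catalogs.add(catalog)
    # Exactly one target catalog is required; an empty mapping yields None.
    return next(iter(catalogs)) if len(catalogs) == 1 else None
```

If this returns `None`, changing the default catalog alone would leave some queries broken, and they would need to be rewritten.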
OK, I started to look into this... first issue is that the REST API is NOT supporting the toggling of Unity Catalog support. Same is true then by extension for the Databricks SDK, but that is to be expected. The UI is using a field that the REST API is not supporting (yet, I presume).
@zpappa do you have the issue created for the "code migration"? I have customers asking for help changing 2 level namespaces to 3 level namespaces in their queries and view definitions. I saw some regex-based match-and-replace solutions, but I feel it's hard to cover all the cases. I'm thinking of a solution based on query profile logs that detects all tables referenced in a query and replaces them with the corresponding UC tables.
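For illustration, a minimal sketch of the mapping-driven rewrite, assuming the list of referenced tables has already been obtained somehow (for example from query profile data). The function name `rewrite_two_level_names` and the `table_map` shape are hypothetical. It still uses a regex for the substitution, so it is not a parser: it will also touch matches inside string literals and comments, and it ignores backtick-quoted names and unqualified names that rely on `USE`.

```python
import re
from typing import Dict


def rewrite_two_level_names(sql: str, table_map: Dict[str, str]) -> str:
    """Replace known `schema.table` references with their three-level names.

    table_map is a hypothetical mapping such as
    {"sales.orders": "main.sales.orders"}.
    """
    if not table_map:
        return sql
    # Longest names first so overlapping keys do not shadow each other.
    names = sorted(table_map, key=len, reverse=True)
    pattern = re.compile(
        # Lookbehind skips names that are already qualified (preceded by "."),
        # quoted, or part of a longer identifier; lookahead skips longer names.
        r"(?<![\w.`])(" + "|".join(re.escape(n) for n in names) + r")(?![\w`])",
        re.IGNORECASE,
    )
    lookup = {key.lower(): value for key, value in table_map.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], sql)


query = (
    "SELECT * FROM sales.orders o "
    "JOIN hive_metastore.sales.orders_old x ON o.id = x.id"
)
rewritten = rewrite_two_level_names(query, {"sales.orders": "main.sales.orders"})
```

Here only the first reference is rewritten; the already qualified `hive_metastore.sales.orders_old` is left alone. The edge cases listed above are the reason a proper SQL parser would likely be needed for anything beyond a first pass.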
If implementing this issue, consider adding a tag as mentioned in #2073
I've been doing some work on this, and to consolidate the discussion above somewhat:
- The Databricks SDK is being updated to support the `disable_uc` property needed to toggle UC support on and off for a SQL Warehouse. (PR #2230 demonstrates that the property is present and works, but we don't want to maintain such a workaround.)
- SQL Warehouses don't have their own default catalog configuration: they use the default catalog that has been assigned to the workspace using metastore assignment. As such this can be changed, but it's an all-or-nothing situation and affects the entire workspace (including UCX itself, as #2207 hints at). The default catalog for upgraded workspaces is (for compatibility) `hive_metastore`.
I've just created #2231 to cover changing the default catalog for a workspace.