[Feature]: Upgrade DBSQL Warehouses

Open nfx opened this issue 2 years ago • 7 comments

Upstream dependencies:

  • https://github.com/databrickslabs/ucx/issues/670

TODO:

  • [ ] Set the default CATALOG to the new workspace catalog for DBSQL
  • [ ] Update all the warehouses to enable Unity Catalog
  • [ ] Remove the Instance Profile from the Warehouse if possible
  • [ ] [optional] Enable Serverless

Rollback:

  • [ ] Either modify queries to use 3 level namespace or revert Warehouse to disable UC

This should be a separate CLI command.

nfx avatar Aug 23 '23 09:08 nfx

So I see a small problem here.

If we allow a customer to modify the default database/catalog when sync'ing tables (which we should), this will not be a useful step as queries will break if we set a single catalog default...

We will have to combine the upgrade of DBSQL Warehouses with some code analysis tooling to upgrade the SQL Queries.

Currently this works on notebook cells, but we can expand it to work on queries as well!
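To make the concern above concrete, here is a minimal sketch of the check it implies: a single default catalog only helps if every migrated schema landed in the same catalog. The function name `single_default_catalog` and the shape of the mapping are hypothetical, chosen for illustration; they are not part of UCX.

```python
from typing import Dict, Optional


def single_default_catalog(schema_to_catalog: Dict[str, str]) -> Optional[str]:
    """Return the one catalog all schemas were migrated to, or None.

    `schema_to_catalog` is a hypothetical mapping from a hive_metastore
    schema name to the Unity Catalog catalog it was synced into.
    """
    catalogs = set(schema_to_catalog.values())
    # Exactly one distinct target catalog means a default catalog can
    # keep two-level names working; otherwise queries must be rewritten.
    if len(catalogs) == 1:
        return next(iter(catalogs))
    return None
```

If this returns `None`, setting any single default catalog would leave at least one schema's two-level references unresolved, which is the case where query rewriting becomes necessary.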

zpappa avatar Sep 28 '23 00:09 zpappa

@zpappa please create fine-grained issues for "code migration" from problem perspective, not UI. Focus on full headless automation and not a UI

nfx avatar Sep 28 '23 01:09 nfx

The problem with headless things is that they clearly lack intelligence. If the headless automation leaves things in a broken state, it provides no value.

The only way this is valuable is if catalog migration was 1:1 from hive -> workspace catalog. If we assume that, we should ask whether this is even worth solving.

zpappa avatar Oct 09 '23 14:10 zpappa

OK, I started to look into this... first issue is that the REST API is NOT supporting the toggling of Unity Catalog support. Same is true then by extension for the Databricks SDK, but that is to be expected. The UI is using a field that the REST API is not supporting (yet, I presume).

larsgeorge-db avatar Oct 13 '23 08:10 larsgeorge-db

@zpappa do you have the issue created for the "code migration"? I have customers asking for help changing the 2 level namespace to a 3 level namespace in their queries and view definitions. I saw some regex based match and replace solutions, but I feel it's hard to cover all the cases. I'm thinking of a query profile log based solution to detect all tables referenced in the query and replace them with the corresponding UC tables.
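A rough sketch of that idea, assuming the list of referenced tables has already been obtained from somewhere else (for example a query profile): only the known two-level names are rewritten, and names that are already catalog-qualified are left alone. The function name `qualify_tables` and the mapping format are hypothetical. This is still text substitution, so it does not understand string literals, comments, CTE names or backtick-quoted identifiers.

```python
import re
from typing import Dict


def qualify_tables(sql: str, table_to_catalog: Dict[str, str]) -> str:
    """Prefix known two-level table names with their target catalog.

    `table_to_catalog` is a hypothetical mapping such as
    {"sales.orders": "main"}, built from tables known to be referenced.
    """
    for two_level, catalog in table_to_catalog.items():
        pattern = re.compile(
            # Not preceded by an identifier character, dot or backtick,
            # so "hive_metastore.sales.orders" is not touched.
            r"(?<![\w.`])"
            + re.escape(two_level)
            # Not followed by an identifier character or backtick,
            # so "sales.orders_archive" is not touched.
            + r"(?![\w`])",
            re.IGNORECASE,
        )
        # A function replacement avoids backslash handling in the catalog name.
        sql = pattern.sub(lambda m, c=catalog: f"{c}.{m.group(0)}", sql)
    return sql
```

Driving the rewrite from a known table list rather than a generic pattern keeps column references such as `o.id` untouched, since they are never in the mapping.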

qziyuan avatar Jan 10 '24 23:01 qziyuan

If implementing this issue, consider adding a tag as mentioned in #2073

JCZuurmond avatar Jul 17 '24 15:07 JCZuurmond

I've been doing some work on this, and to consolidate the discussion above somewhat:

  • The Databricks SDK is being updated to support the disable_uc property needed to toggle UC support on and off for a SQL Warehouse. (PR #2230 demonstrates that the property is present and works, but we don't want to maintain such a workaround.)
  • SQL Warehouses don't have their own default catalog configuration: they use the default catalog that has been assigned to the workspace using metastore assignment. As such this can be changed but it's an all-or-nothing situation and affects the entire workspace (including UCX itself, as #2207 hints at.) The default catalog for upgraded workspaces is (for compatibility) hive_metastore.

I've just created #2231 to cover changing the default catalog for a workspace.
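As a small illustration of why the workspace default catalog is all-or-nothing, here is a sketch of how a table name would resolve under it. This is a simplified model, not Databricks' actual resolver: it ignores session-level settings such as `USE CATALOG`, and the function name `resolve` is hypothetical.

```python
def resolve(name: str, default_catalog: str = "hive_metastore") -> str:
    """Return the three-level name a reference would resolve to.

    Simplified model: two-level names get the workspace default catalog,
    three-level names are already fully qualified.
    """
    parts = name.split(".")
    if len(parts) == 3:
        return name
    if len(parts) == 2:
        return f"{default_catalog}.{name}"
    # One-level names also depend on the current schema, which this
    # sketch does not model.
    raise ValueError(f"cannot resolve {name!r} without a schema")
```

With the compatibility default, `resolve("sales.orders")` yields `hive_metastore.sales.orders`; changing the default changes the result for every two-level reference in the workspace at once.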

asnare avatar Jul 23 '24 13:07 asnare