[TECH DEBT]: Merge functionality for running ucx across workspaces
Is there an existing issue for this?
- [X] I have searched the existing issues
Problem statement
We have added, and are still adding, functionality to run ucx across workspaces; see this issue for an overview. That functionality has become somewhat dispersed and duplicated across:
- The account installer, specifically its `get_workspace_contexts`
- The account workspaces
- The cli
- The AccountAggregate reuses the account workspaces
- The AccountMetastores gets all workspaces
Proposed Solution
TBD. First suggestion below
Context
We started with an account installer for installing ucx across workspaces and a sync workspace info command to make the workspace name available in all workspaces (as account admin is required to fetch this information). We now have, and plan to add, more commands that run across a collection of workspaces.
Preferably we minimize the need for account admin. However, if you run a command over a collection of workspaces, you need to be an admin of each of those workspaces.
Approach
- Move the logic for retrieving multiple workspace clients and contexts to the `AccountWorkspaces` class.
- Then we can access those via the `AccountContext` that has the `AccountWorkspaces` as (cached) attribute.
- We access it in the account installer or any of the cli commands.
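The approach above can be sketched roughly as follows. This is a minimal illustration and not the actual ucx code: `StubAccountClient`, `Workspace`, `list_workspaces`, `get_workspaces` and the dict-based "context" are hypothetical stand-ins so that the example runs without the Databricks SDK. Only the class names `AccountWorkspaces` and `AccountContext` and the method name `get_workspace_contexts` come from this issue.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class Workspace:
    """Hypothetical stand-in for a workspace record returned by the account API."""

    workspace_id: int
    name: str


class StubAccountClient:
    """Hypothetical stand-in for an account-level client (not the Databricks SDK)."""

    def __init__(self, workspaces):
        self._workspaces = list(workspaces)

    def list_workspaces(self):
        return list(self._workspaces)


class AccountWorkspaces:
    """Single home for the logic that resolves a collection of workspaces."""

    def __init__(self, account_client, include_workspace_ids=None):
        self._ac = account_client
        self._include = set(include_workspace_ids) if include_workspace_ids else None

    def get_workspaces(self):
        workspaces = self._ac.list_workspaces()
        if self._include is None:
            return workspaces
        return [w for w in workspaces if w.workspace_id in self._include]

    def get_workspace_contexts(self):
        # Assumption: a real implementation would create a workspace client and
        # context per workspace; a plain dict keeps this sketch runnable.
        return [{"workspace_id": w.workspace_id, "name": w.name} for w in self.get_workspaces()]


class AccountContext:
    def __init__(self, account_client):
        self._ac = account_client

    @cached_property
    def account_workspaces(self):
        # Built once on first access, then reused by the installer and cli commands.
        return AccountWorkspaces(self._ac)


ctx = AccountContext(StubAccountClient([Workspace(1, "dev"), Workspace(2, "prod")]))
contexts = ctx.account_workspaces.get_workspace_contexts()
```

With this shape, the account installer and the cli commands would both call `ctx.account_workspaces` instead of carrying their own copy of the lookup.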
Additional Context
No response
Note that we also use AccountWorkspaces in the AccountAggregate class
Furthermore, for the cli command, the AccountClient is initialized via a new code path when introducing the get_contexts. Before we initialized it (implicitly) by setting the is_account flag to True in the cli command decorator. Now, we also initialize it in the get_contexts.
To merge the code paths, we might require a change to blueprint, for example:
- Make the `is_account` a flag optionally set by the user: `databricks labs <project> <command> --on-account True`
- Or, (try to) set the `WorkspaceClient` also when the `is_account` flag is `True`. If we cannot initialize the `WorkspaceClient`, we set it to `None`
That would allow us to try to use the AccountClient to get a list of workspaces (contexts/clients), and otherwise fall back on the WorkspaceClient provided by the blueprint command
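A rough sketch of that fallback, assuming it is implemented on the ucx side: `resolve_workspace_ids`, `FakeAccountClient`, `FakeWorkspaceClient`, `list_workspace_ids` and `AccountAccessError` are all hypothetical names used for illustration, and neither blueprint nor the Databricks SDK is used here.

```python
class AccountAccessError(Exception):
    """Hypothetical error raised when the caller lacks account admin rights."""


class FakeAccountClient:
    """Hypothetical stand-in for an account-level client."""

    def __init__(self, workspace_ids, is_admin=True):
        self._workspace_ids = list(workspace_ids)
        self._is_admin = is_admin

    def list_workspace_ids(self):
        if not self._is_admin:
            raise AccountAccessError("account admin is required to list workspaces")
        return list(self._workspace_ids)


class FakeWorkspaceClient:
    """Hypothetical stand-in for the workspace client handed to a cli command."""

    def __init__(self, workspace_id):
        self.workspace_id = workspace_id


def resolve_workspace_ids(account_client, workspace_client):
    """Prefer the account-level listing; fall back on the single known workspace."""
    if account_client is not None:
        try:
            return account_client.list_workspace_ids()
        except AccountAccessError:
            pass  # Not an account admin: fall through to the workspace client.
    if workspace_client is None:
        raise ValueError("neither an account client nor a workspace client is available")
    return [workspace_client.workspace_id]


as_account_admin = resolve_workspace_ids(FakeAccountClient([1, 2, 3]), FakeWorkspaceClient(1))
as_workspace_admin = resolve_workspace_ids(FakeAccountClient([1, 2, 3], is_admin=False), FakeWorkspaceClient(1))
```

Either client may be `None` in this sketch, which matches the idea of setting the `WorkspaceClient` to `None` when it cannot be initialized.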
We initially submitted a PR in blueprint (https://github.com/databrickslabs/blueprint/pull/113) to enable collections by adding an is_collection flag: if is_collection is set, blueprint sets up both the AccountClient and the WorkspaceClient. We agreed, for now, to do this implementation only in UCX.
For commands that are run as a collection, the user needs to be both account admin and workspace admin of all the workspaces in the collection. I agree that we should move the collection logic from the account installer to the account workspaces, and that the account installer should only contain options to install ucx for the whole account.
I tested the following integration test:

    # tests/integration/install/test_installation.py
    from databricks.labs.blueprint.tui import MockPrompts


    def test_account_installer_returns_workspace_contexts(env_or_skip, installation_ctx):
        prompts = MockPrompts(
            {
                r"Please provide the Databricks account id.*": env_or_skip("DATABRICKS_ACCOUNT_ID"),
            }
        )
        account_installer = installation_ctx.account_installer.replace(prompts=prompts)
        installation_ctx.workspace_installation.run()
        workspace_contexts = account_installer.get_workspace_contexts(
            installation_ctx.workspace_client,
            run_as_collection=True,
        )
        assert len(workspace_contexts) > 0

It fails when getting the installed workspace ids (`installer.config.installed_workspace_ids`), as the current setup tries to find an installed ucx version under the Applications folder instead of the ad-hoc installation used in integration tests.
Linked a create-uber-principal related PR to log that we might want to create the uber principal at account level so that we have one principal for all workspaces