WAM broker failing in latest 2.63 release for at least ROPC flow: Error code: 3400073293
Describe the bug
Azure CLI latest version 2.63 seems to have enabled WAM broker feature by default on windows, and it fails ROPC flow in private cloud (at least) with interaction required error.
Additionally, when the feature is disabled, the user realm lookup code in MSAL.py fails to honor the authority port, and defaults to 443 instead of building the URI correctly and only adjusting path.
msal.authority: Initializing with Entra authority: https://localhost:3001/redacted ... HTTPSConnectionPool(host='localhost', port=443): Max retries exceeded with url: /common/userrealm/redacted?api-version=1.0
Related command
az login -u '[Redacted]' -p [Redacted] -t [Redacted]
Errors
ERROR: cli.azure.cli.core.azclierror: (pii). Status: Response_Status.Status_InteractionRequired, Error code: 3400073293, Tag: 527291998
Issue script & Debug output
az login -u '[Redacted]' -p [Redacted] -t [Redacted] --debug
2024-08-12 15:27:45.9590 [INFO] CLI debug: az : DEBUG: cli.knack.cli: Command arguments: ['login', '-u', '[REDACTED]', '-p', '[REDACTED]', '-t', '[REDACTED]', '--debug'] At line:1 char:1
- az login -u '[REDACTED' -p [REDACTED ...
-
+ CategoryInfo : NotSpecified: (DEBUG: cli.knac...3f', '--debug']:String) [], RemoteException + FullyQualifiedErrorId : NativeCommandError
DEBUG: cli.knack.cli: init debug log:
Cannot enable color.
DEBUG: cli.knack.cli: Event: Cli.PreExecute []
DEBUG: cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at
0x00000184ED9DF7E0>, <function OutputProducer.on_global_arguments at 0x00000184EDB5E020>, <function CLIQuery.on_global_arguments at 0x00000184EDB87BA0>]
DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate []
DEBUG: cli.azure.cli.core: Modules found from index for 'login': ['azure.cli.command_modules.profile']
DEBUG: cli.azure.cli.core: Loading command modules:
DEBUG: cli.azure.cli.core: Name Load Time Groups Commands
DEBUG: cli.azure.cli.core: profile 0.004 2 8
DEBUG: cli.azure.cli.core: Total (1) 0.004 2 8
DEBUG: cli.azure.cli.core: These extensions are not installed and will be skipped: ['azext_ai_examples', 'azext_next']
DEBUG: cli.azure.cli.core: Loading extensions:
DEBUG: cli.azure.cli.core: Name Load Time Groups Commands Directory
DEBUG: cli.azure.cli.core: Total (0) 0.000 0 0
DEBUG: cli.azure.cli.core: Loaded 2 groups, 8 commands.
DEBUG: cli.azure.cli.core: Found a match in the command table.
DEBUG: cli.azure.cli.core: Raw command : login
DEBUG: cli.azure.cli.core: Command table: login
DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x00000184EFDBCD60>]
DEBUG: cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to 'C:\azureCli\commands\2024-08-12.15-27-45.login.4628.log'.
INFO: az_command_data_logger: command args: login -u {} -p {} -t {} --debug
DEBUG: cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.
ERROR: cli.azure.cli.core.azclierror: (pii). Status: Response_Status.Status_InteractionRequired, Error code: 3400073293, Tag: 527291998 ERROR: az_command_data_logger: (pii). Status: Response_Status.Status_InteractionRequired, Error code: 3400073293, Tag: 527291998 Please explicitly log in with: az login DEBUG: cli.knack.cli: Event: Cli.PostExecute [<function AzCliLogging.deinit_cmd_metadata_logging at 0x00000184EFDBCFE0>] INFO: az_command_data_logger: exit code: 1 INFO: cli.main: Command ran in 1.029 seconds (init: 0.324, invoke: 0.705) INFO: telemetry.main: Begin splitting cli events and extra events, total events: 1 INFO: telemetry.client: Accumulated 0 events. Flush the clients. INFO: telemetry.main: Finish splitting cli events and extra events, cli events: 1 INFO: telemetry.save: Save telemetry record of length 4173 in cache INFO: telemetry.main: Begin creating telemetry upload process. INFO: telemetry.process: Creating upload process: "C:\Program Files\Microsoft SDKs\Azure\CLI2\python.exe C:\Program Files\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\cli\telemetry_init_.pyc C:\azureCli" INFO: telemetry.process: Return from creating process INFO: telemetry.main: Finish creating telemetry upload process.
Expected behavior
WAM should probably not be enabled by default, and perhaps there should be some fallback to regular ROPC flow? If someone is using the ROPC, maybe just let the naive rest call go through to the STS, why is WAM even being used here?
Environment Summary
C:> az --version azure-cli 2.63.0
core 2.63.0 telemetry 1.1.0
Extensions: azure-devops 1.0.1
Dependencies: msal 1.30.0 azure-mgmt-resource 23.1.1
Additional context
Temporarily workaround (except for the port issue) using this:
az config set core.enable_broker_on_windows=false
Thank you for opening this issue, we will look into it.
CC @rayluo as I think error is in msal.py broker logic itself, along with the wrong port used in user realm lookup
[when broker is enabled (by default)], it fails ROPC flow in private cloud (at least) with interaction required error
Additionally, when the feature is disabled, the user realm lookup code in MSAL.py fails to honor the authority port, and defaults to 443 instead of building the URI correctly and only adjusting path
@keystroke, did that "in private cloud" scenario ever work before with older versions of Azure CLI and/or MSAL.py? If not, this would become a new feature request. CC @ashok672
FYI: You may want to provide a repro using MSAL.py only (such as python -c "import msal; msal.PublicClientApplication('azure_cli_client_id', authority='...', enable_broker_on_windows=True or False).acquire_token_interactive(...)), it may help MSAL team to investigate.
@keystroke, did you registered this cloud with az cloud register? What does endpoint you provided to --endpoint-active-directory look like?
@jiasli yes this is a custom cloud, the endpoint looks like "https://somewhere.com/" and then tenantId is provided and results in authority of https://somewhere.com/tenantId which is correct. But as noticed from stacktrace it gets broken somewhere inside msal.py trying to use this broker thing. I would expect that a user doesn't need to "set" the broker manually, and rather that whether it's used or not it can fallback to just doing a regular ROPC flow since the user / pass were provided here.
@rayluo the stacktrace and logs above show the msal.py context; as for the user realm lookup error, this occurs when I set the broker use to false and am using a non-443 port (notice authority is given with 3001 port but when it builds URI for user realm lookup it resets back to 443):
C:> az config set core.enable_broker_on_windows=false Command group 'config' is experimental and under development. Reference and support levels: https://aka.ms/CLI_refstatus C:> az login -u [email protected] -p [REDACTED] -t 98b8267d-e97f-426e-8b3f-7956511fd63f --debug cli.knack.cli: Command arguments: ['login', '-u', '[email protected]', '-p', '[REDACTED]', '-t', '98b8267d-e97f-426e-8b3f-7956511fd63f', '--debug'] cli.knack.cli: init debug log: Enable color in terminal. Enable VT mode. cli.knack.cli: Event: Cli.PreExecute [] cli.knack.cli: Event: CommandParser.OnGlobalArgumentsCreate [<function CLILogging.on_global_arguments at 0x0000020E7B24B7E0>, <function OutputProducer.on_global_arguments at 0x0000020E7B3D6020>, <function CLIQuery.on_global_arguments at 0x0000020E7B403BA0>] cli.knack.cli: Event: CommandInvoker.OnPreCommandTableCreate [] cli.azure.cli.core: Modules found from index for 'login': ['azure.cli.command_modules.profile'] cli.azure.cli.core: Loading command modules: cli.azure.cli.core: Name Load Time Groups Commands cli.azure.cli.core: profile 0.010 2 8 cli.azure.cli.core: Total (1) 0.010 2 8 cli.azure.cli.core: These extensions are not installed and will be skipped: ['azext_ai_examples', 'azext_next'] cli.azure.cli.core: Loading extensions: cli.azure.cli.core: Name Load Time Groups Commands Directory cli.azure.cli.core: Total (0) 0.000 0 0 cli.azure.cli.core: Loaded 2 groups, 8 commands. cli.azure.cli.core: Found a match in the command table. cli.azure.cli.core: Raw command : login cli.azure.cli.core: Command table: login cli.knack.cli: Event: CommandInvoker.OnPreCommandTableTruncate [<function AzCliLogging.init_command_file_logging at 0x0000020E7D644D60>] cli.azure.cli.core.azlogging: metadata file logging enabled - writing logs to 'C:\Users\bryanr.azure\commands\2024-08-12.15-36-27.login.29144.log'. az_command_data_logger: command args: login -u {} -p {} -t {} --debug cli.knack.cli: Event: CommandInvoker.OnPreArgumentLoad [<function register_global_subscription_argument.
.add_subscription_parameter at 0x0000020E7D6776A0>] cli.knack.cli: Event: CommandInvoker.OnPostArgumentLoad [] cli.knack.cli: Event: CommandInvoker.OnPostCommandTableCreate [<function register_ids_argument. .add_ids_arguments at 0x0000020E7D6FD620>, <function register_cache_arguments. .add_cache_arguments at 0x0000020E7D6FD760>] cli.knack.cli: Event: CommandInvoker.OnCommandTableLoaded [] cli.knack.cli: Event: CommandInvoker.OnPreParseArgs [] cli.knack.cli: Event: CommandInvoker.OnPostParseArgs [<function OutputProducer.handle_output_argument at 0x0000020E7B3D60C0>, <function CLIQuery.handle_query_parameter at 0x0000020E7B403C40>, <function register_ids_argument. .parse_ids_arguments at 0x0000020E7D6FD6C0>] cli.azure.cli.command_modules.profile.custom: Authentication with username and password in the command line is strongly discouraged. Use one of the recommended authentication methods based on your requirements. For more details, see https://go.microsoft.com/fwlink/?linkid=2276314 cli.azure.cli.core.auth.persistence: build_persistence: location='C:\Users\bryanr\.azure\msal_token_cache.bin', encrypt=True cli.azure.cli.core.auth.binary_cache: load: C:\Users\bryanr.azure\msal_http_cache.bin urllib3.util.retry: Converted retries value: 1 -> Retry(total=1, connect=None, read=None, redirect=None, status=None) msal.authority: Initializing with Entra authority: https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f msal.authority: openid_config("https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/v2.0/.well-known/openid-configuration") = {'authorization_endpoint': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/oauth2/v2.0/authorize', 'device_authorization_endpoint': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/oauth2/v2.0/devicecode', 'token_endpoint': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/oauth2/v2.0/token', 'token_endpoint_auth_methods_supported': ['client_secret_post', 'private_key_jwt', 'client_secret_basic'], 'jwks_uri': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/discovery/v2.0/keys', 'response_modes_supported': ['query', 'fragment', 'form_post'], 'response_types_supported': ['code', 'id_token', 'token'], 'scopes_supported': ['openid', 'profile', 'email'], 'issuer': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/v2.0'} msal.application: Broker enabled? None msal.telemetry: Generate or reuse correlation_id: 2c00d7b7-3b4f-491b-99f7-73539343d81b urllib3.connectionpool: Starting new HTTPS connection (1): localhost:443 urllib3.util.retry: Incremented Retry for (url='/common/userrealm/[email protected]?api-version=1.0'): Retry(total=0, connect=None, read=None, redirect=None, status=None) urllib3.connectionpool: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000020E7E702C90>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it')': /common/userrealm/[email protected]?api-version=1.0 urllib3.connectionpool: Starting new HTTPS connection (2): localhost:443 cli.azure.cli.core.azclierror: Traceback (most recent call last): File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/connection.py", line 174, in _new_conn File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/util/connection.py", line 95, in create_connection File "D:\a_work\1\s\build_scripts\windows\artifacts\cli\Lib\site-packages\urllib3/util/connection.py", line 85, in create_connection ConnectionRefusedError: [WinError 10061] No connection could be made because the target machine actively refused it
@rayluo yes ROPC flow has been working for us, it broke only in latest azure cli extension due to change which enables this windows broker thing, and this fails in msal layer before ever making any calls to the private cloud STS it seems... not sure what the intention is with this windows broker thing, what is it expected to do here.
Here is the relevant MSAL log section for the WAM issue:
DEBUG: msal.application: Broker enabled? True DEBUG: msal.broker: [MSAL:0001] WARNING SetAuthorityUri:78 Initializing authority from URI 'https://redactedendpoint/redactedTenantId' without authority type, defaulting to MsSts DEBUG: msal.broker: [MSAL:0001] INFO SetCorrelationId:258 Set correlation ID: 0d2585a7-9c69-4e51-bdb5-500f40ef73d6 DEBUG: msal.broker: [MSAL:0001] INFO EnqueueBackgroundRequest:1000 The original authority is 'https://redactedendpoint/redactedTenantId' DEBUG: msal.broker: [MSAL:0001] INFO ModifyAndValidateAuthParameters:219 Additional query parameter added successfully. Key: '(pii)' Value: '(pii)' DEBUG: msal.broker: [MSAL:0001] INFO ModifyAndValidateAuthParameters:243 Authority Realm: [redactedTenantId] DEBUG: msal.broker: [MSAL:0001] WARNING TryEnqueueMsaDeviceCredentialAcquisitionAndContinue:1052 MsaDeviceOperationProvider is not available. Not attempting to register the device. DEBUG: msal.broker: [MSAL:0002] ERROR ErrorInternalImpl:134 Created an error: 513z4, StatusInternal::InteractionRequired, InternalEvent::None, Error Code 3400073293, Context '(pii)'
And here for the user realm lookup:
msal.authority: Initializing with Entra authority: https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f msal.authority: openid_config("https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/v2.0/.well-known/openid-configuration") = {'authorization_endpoint': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/oauth2/v2.0/authorize', 'device_authorization_endpoint': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/oauth2/v2.0/devicecode', 'token_endpoint': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/oauth2/v2.0/token', 'token_endpoint_auth_methods_supported': ['client_secret_post', 'private_key_jwt', 'client_secret_basic'], 'jwks_uri': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/discovery/v2.0/keys', 'response_modes_supported': ['query', 'fragment', 'form_post'], 'response_types_supported': ['code', 'id_token', 'token'], 'scopes_supported': ['openid', 'profile', 'email'], 'issuer': 'https://localhost:3001/98b8267d-e97f-426e-8b3f-7956511fd63f/v2.0'} msal.application: Broker enabled? None msal.telemetry: Generate or reuse correlation_id: 2c00d7b7-3b4f-491b-99f7-73539343d81b urllib3.connectionpool: Starting new HTTPS connection (1): localhost:443
This change also broke aztfexport
the endpoint looks like "https://somewhere.com/"
notice authority is given with 3001 port
I am confused. You didn't provide the authority with 3001 port. How is it included in the authority? Is it https://somewhere.com:3001/?
it can fallback to just doing a regular ROPC flow since the user / pass were provided here
I also assume WAM is not used for ROPC flow, per https://github.com/AzureAD/microsoft-authentication-library-for-python/issues/617#issuecomment-1793092097.
@jiasli when the cloud environment is configured "for real" it's an endpoint that uses port 443. The example shown above using port 3001 is in a local test, which revealed the issue with the code in MSAL for building that URI not honoring the configured authority port correctly (this is a common bug I've seen in general in uri mgmt). The reason the user realm lookup is related is because MSAL performs a user realm API call before proceeding with the ROPC flow, to "inspect" whether the target account is known to the STS before sending the user's password. This is a minor issue somewhat, but cited it here since I observed it while debugging this as it is a part of the e2e flow.
@jiasli from latest release of Azure CLI, it changed the broker to be enabled by default, and it is active somehow during ROPC flow, so perhaps you need an updated drop of MSAL.py, or perhaps MSAL.py needs a fix, either way the latest Azure CLI release has this issue.
This change also broke aztfexport
Can you provide a few more details?
@keystroke, just to confirm, you are using https://localhost:3001/ as --endpoint-active-directory, right?
My understanding is that if your account requires MFA, it is expected that ROPC flow will fail with Status_InteractionRequired:
> az login --username [email protected] --password test
Authentication with username and password in the command line is strongly discouraged. Use one of the recommended authentication methods based on your requirements. For more details, see https://go.microsoft.com/fwlink/?linkid=2276314
(pii). Status: Response_Status.Status_InteractionRequired, Error code: 3400073293, Tag: 508609292
@jiasli In the actual private cloud case, the authority URI used is something like "https://privatecloud.com/3652955a-1a5b-4775-b25a-46e08f51a93a". When this broker is not enabled, the flow works fine, and the account [email protected] can sign-in using username + password.
When the broker is enabled, which Azure CLI changed to be TRUE by default in the latest release, this flow breaks, and it looks like it's because either MSAL or the Azure CLI wrapper code provides the wrong context to WAM (or spuriously invokes it where it should be skipped).
Take a look at this log statement from WAM (in my original post) which I've formatted:
DEBUG: msal.broker: [MSAL:0002] INFO LogTelemetryData:430 Key: wam_telemetry, Value: { "ui_visible": false, "scope": "https://redactedendpoint/.default offline_access openid profile", "redirect_uri": "ms-appx-web://Microsoft.AAD.BrokerPlugin/04b07795-8ddb-461a-bbee-02f9e1bf7b46", "provider_id": "https://login.windows.net", "oauth_error_code": "authentication_failed", "http_status": 404, "http_event_count": 1, "device_join": "not_joined", "correlation_id": "{28d28220-b29d-4b14-b5ae-01f1893cf46f}", "client_id": "04b07795-8ddb-461a-bbee-02f9e1bf7b46", "cache_event_count": 0, "broker_version": "10.0.20348.2520", "authority": "https://redactedEndpoint/redactedTenantId", "api_error_code": -894894003, "account_join_on_start": "not_joined", "account_join_on_end": "not_join]ed", "silent_code": 0, "silent_bi_sub_code": 0, "silent_message": "", "silent_status": 0, "is_cached": 1 }
I notice above, that "provider_id" is set to "https://login.windows.net" which leads me to believe that this MSAL-induced interaction with WAM is querying Azure about the local account, rather than "understanding" that the given authority URL is not "Microsoft Azure" and the interaction should either be skipped entirely, or it needs to consult the correct API on the correct authority endpoint.
Take a look at this other statement from the log:
DEBUG: msal.application: Broker enabled? True DEBUG: msal.broker: [MSAL:0001] WARNING SetAuthorityUri:78 Initializing authority from URI 'https://redactedendpoint/redactedTenantId' without authority type, defaulting to MsSts
It seems "authority type" was not provided here, and this "default to MsSts" seems like it's assuming an Azure cloud context.
@rayluo when MSAL interacts with WAM can any sort of authority info provided? Or does WAM always assume Azure cloud context and phone-home? Is this an issue with Azure CLI not invoking MSAL functionality correctly, or an oversight in the MSAL broker interaction?
@jiasli We might consider reverting this change in Azure CLI to keep WAM disabled by default until it can be properly rolled-out, with the necessary fixes from either Azure CLI in its interaction with MSAL, or with fixes from MSAL itself if required.
@keystroke , can you check your Windows settings, whether that account is already logged in?
I have seen some cases that if an account is logged in Windows for a long time, ROPC via broker may fail because in the Windows setting that account is actually invalid (the same account was added to many devices and reach the maximum number).
If that is the case, remove the account in Windows setting could resolve the problem.
@fengga the account in question here is not a windows account at all, it's just an arbitrary username and password that exists only in the context of this external, private cloud identity system.
Additionally, when the feature is disabled, the user realm lookup code in MSAL.py fails to honor the authority port, and defaults to 443 instead of building the URI correctly and only adjusting path
@keystroke, did that "in private cloud" scenario ever work before with older versions of Azure CLI and/or MSAL.py? If not, this would become a new feature request.
@rayluo yes ROPC flow has been working for us [sic]
@jiasli when the cloud environment is configured "for real" it's an endpoint that uses port 443. The example shown above using port 3001 is in a local test, which revealed the issue with the code in MSAL for building that URI not honoring the configured authority port correctly (this is a common bug I've seen in general in uri mgmt).
So, @keystroke , I think you mixed up different usage in this one report. If I understand it correctly,
-
"ROPC has been working" ... when WAM was not enabled, and "when the cloud environment is configured for real it's an endpoint that uses port 443"
- Note that, even in this case, I would say that the private cloud was not a design goal when MSAL was developed 6 or 7 years ago. My guess is that the private cloud happens to mimic most of the Entra (formerly known as AAD) behavior, so that MSAL happens to work.
-
"Additionally, when the feature is disabled, the user realm lookup code in MSAL.py fails to honor the authority port"
- Did ROPC with non-443 port ever work before? Probably not? So, it is not a regression.
- Not honoring the configured port was not necessarily a bug. Note that MSAL was designed with a mindset that authority was supposed to be validated; we did not envision accepting arbitrary authority with arbitrary port.
Please let us know whether it is a legit need to support an arbitrary authority (i.e. private cloud?) with arbitrary port.
@rayluo I just wanted to report the URI builder bug I noticed here since it's a part of the e2e flow (same code bug has been fixed in MSAL.net and even back in ADAL; this is a common bug that devs make when building URIs from a baseURI).
That said, the primary focus of this issue should be around the WAM integration, because we have something in MSAL.py blowing-up here, and either AzureCLI needs to modify something when calling into MSAL.py, or MSAL.py needs a fix here to call into WAM differently / dynamically.