Fleet resending WlanXml CSP unnecessarily is causing errors
Fleet version: 4.60
Web browser and operating system: Windows
💥 Actual behavior
Fleet is resending a CSP after it was previously successful, which is causing it to error out. The following snippet attempts to Add a new WiFi profile via CSP, but it errors when the WiFi profile is already present.
🧑💻 Steps to reproduce
- Add a CSP with an <Add> action (like the Wifi one provided in this ticket) to a team with Windows hosts.
- Confirm the CSP was successfully delivered to the host. Leave the CSP assigned to the team.
- After an unknown amount of time, Fleet will attempt to re-send the CSP. This will cause the setting to report as "Failed" because a Windows host will reject an <Add> action for a setting it already has.
🕯️ More info (optional)
- The CSP was not re-uploaded. It was left in custom settings, like you would for a .mobileconfig.
- We may need to detect state more thoroughly before sending the CSP
🛠️ To fix
@marko-lisica:
- Fix the verification logic for when a user uploads a Windows configuration profile that controls the WlanXml option, so Fleet can verify that the WlanXml OS setting is enforced.
Thanks for filing this, we'll take a look!
QA Notes:
Uploaded and deployed the CSP to two Windows hosts, and both have been stuck in Verifying for over 48 hours, while a different CSP did complete successfully.
Possibly related to #23599 or how the CSP is written based on what Fleet supports
<!--Use the following snippet to Add a new WiFi profile via CSP. -->
<!-- 1) Do a find for YOURSSID and replace it with the SSID name -->
<!-- 2) Do a find for 594F555253534944 and replace it with the SSID name, converted to Hex -->
<!-- 3) Do a find for YOURPASSPHRASE and replace it with the passphrase -->
<!-- 4) Generate a unique ID for the cmdID -->
<Add>
<CmdID>f4550c6a-48ae-408c-ae9b-fb0c275de6b8</CmdID>
<Item>
<Meta>
<Format xmlns="syncml:metinf">chr</Format>
</Meta>
<Target>
<LocURI>./Vendor/MSFT/WiFi/Profile/YOURSSID/WlanXml</LocURI>
</Target>
<Data><?xml version="1.0" encoding="US-ASCII"?><WLANProfile xmlns="http://www.microsoft.com/networking/WLAN/profile/v1"><name>YOURSSID</name><SSIDConfig><SSID><hex>594F555253534944</hex><name>YOURSSID</name></SSID></SSIDConfig><connectionType>ESS</connectionType><connectionMode>auto</connectionMode><MSM><security><authEncryption><authentication>WPA2PSK</authentication><encryption>AES</encryption><useOneX>false</useOneX></authEncryption><sharedKey><keyType>passPhrase</keyType><protected>false</protected><keyMaterial>YOURPASSPHRASE</keyMaterial></sharedKey></security></MSM></WLANProfile></Data>
</Item>
</Add>
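Step 2 in the snippet's comments asks for the SSID converted to hex. A minimal sketch of that conversion, assuming the SSID is plain UTF-8 text (the helper names are made up for illustration):

```python
def ssid_to_hex(ssid: str) -> str:
    """Return the SSID's UTF-8 bytes as uppercase hex, for the <hex> element."""
    return ssid.encode("utf-8").hex().upper()


def hex_to_ssid(hex_value: str) -> str:
    """Inverse of ssid_to_hex, handy for checking an existing profile."""
    return bytes.fromhex(hex_value).decode("utf-8")


print(ssid_to_hex("YOURSSID"))  # 594F555253534944
```

Running it on "YOURSSID" gives the same placeholder value that appears in the snippet above.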
When we shipped Windows MDM, we did not plan, design, or QA the WlanXml profile, so this is a StoryBug (TM).
Sending this one to product to discuss feature coverage
@marko-lisica I assigned this bug to you. Is it a bug? Up to you. If it's a feature request can you please turn this issue into a feature request and let the Customer Success Manager for preston know (you can find who this is in Salesforce under "Account Owner")
@noahtalerman @georgekarrv I think Fleet resending Windows profile without user interaction is a bug and we should investigate that.
I see an additional problem if the profile contains <Add> CSP and it's verified. If the user resends that profile it will fail. Some CSPs require you to use <Add> first but each time you want to edit that CSP it must be <Replace> after it's added. This status 418 means that it already exists on the host and can be replaced. See more info here.
I think we should file a feature request to improve this in a way that Fleet checks if it already exists and automatically sends <Replace> instead of <Add> or disable the resend button. For now, we can document this behavior or add some copy to UI.
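To make the proposal concrete, here is a rough sketch of the "send <Replace> instead of <Add>" idea. It is not Fleet's code: `add_to_replace` and `command_for_resend` are hypothetical helpers, and the only fact taken from this thread is that status 418 means the setting already exists on the host. A real implementation would presumably also issue a fresh CmdID.

```python
from typing import Optional

STATUS_ALREADY_EXISTS = 418  # SyncML status referenced in this thread


def add_to_replace(command: str) -> str:
    """Swap only the outermost <Add>...</Add> wrapper for <Replace>...</Replace>."""
    stripped = command.strip()
    if not (stripped.startswith("<Add>") and stripped.endswith("</Add>")):
        raise ValueError("expected a single <Add> command")
    # Keep everything between the outer tags untouched, including <Data>.
    inner = stripped[len("<Add>"):-len("</Add>")]
    return "<Replace>" + inner + "</Replace>"


def command_for_resend(command: str, last_status: Optional[int]) -> str:
    """Pick the command to resend based on the host's last reported status."""
    if last_status == STATUS_ALREADY_EXISTS:
        return add_to_replace(command)
    return command
```

The string-level swap is deliberate: the <Data> payload in these profiles embeds another XML document, so this sketch avoids parsing the command at all.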
@getvictor Do you think this happened because Fleet wasn't able to verify with osquery that this profile is applied because we don't have the logic to check this data type?
@marko-lisica Yes, Fleet couldn't verify because we don't have special logic for comparing this type of profile.
This is similar to the ADMX-backed profile type which I recently added verification support for: https://learn.microsoft.com/en-us/windows/client-management/understanding-admx-backed-policies
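For illustration only, "special logic for comparing this type of profile" could look something like the sketch below. It is not Fleet's implementation, and the choice of fields to compare is an assumption. The idea is to parse both WLAN profile documents and compare a few normalized fields, so whitespace differences or a missing <hex> element don't cause a mismatch.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://www.microsoft.com/networking/WLAN/profile/v1"}


def wlan_summary(profile_xml: str) -> dict:
    """Extract a small, normalized set of fields from a WLANProfile document."""
    root = ET.fromstring(profile_xml.strip())

    def text(path):
        node = root.find(path, NS)
        return node.text.strip() if node is not None and node.text else None

    ssid_name = text("w:SSIDConfig/w:SSID/w:name")
    ssid_hex = text("w:SSIDConfig/w:SSID/w:hex")
    if ssid_hex is None and ssid_name is not None:
        # Derive the hex form so name-only and hex-only profiles compare equal.
        ssid_hex = ssid_name.encode("utf-8").hex()
    return {
        "name": text("w:name"),
        "ssid_hex": ssid_hex.upper() if ssid_hex else None,
        "connection_type": text("w:connectionType"),
        "connection_mode": text("w:connectionMode"),
        "authentication": text("w:MSM/w:security/w:authEncryption/w:authentication"),
        "encryption": text("w:MSM/w:security/w:authEncryption/w:encryption"),
    }


def profiles_match(expected_xml: str, actual_xml: str) -> bool:
    """True when the compared fields agree, ignoring formatting differences."""
    return wlan_summary(expected_xml) == wlan_summary(actual_xml)
```

The key material is left out of the comparison on purpose in this sketch; whether the host reports it back in a comparable form is not something this thread establishes.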
Thanks @getvictor! I think we should only solve the verification problem for WlanXml CSP as part of this bug.
I'm not sure whether we can handle verification for all kinds of CSPs that require XML as <Data>, or only for WlanXml, but I found another CSP example that uses XML as <Data> and this section that describes how to handle XML configurations.
Yes, we should try to uncover other config scenarios and make sure we handle them. We don't have to fully verify each scenario, we just need to recognize it. For the configs where we don't support full verification, we can just mark them Verified as opposed to simply failing, which is what happens now.
@getvictor just chatted with @marko-lisica and we decided to aim to fully verify each scenario. We'll address the scenarios in which we don't verify as one off fixes.
We want to do this so we learn/can debug when the verification isn't working as expected. If we just mark everything as verified, I think broken verification would go unnoticed.
@noahtalerman Should we prioritize this bug and give it P2 label?
@marko-lisica I personally don't think it's urgent (definition of P2) because, if I'm understanding correctly, the Wi-Fi profile gets successfully applied (but shows up as failed)
That said, I think we should get to it as soon as we can. It's a customer reported bug.
cc @georgekarrv @pintomi1989
Hi everyone, just to make sure:
Does the "failed" status make the Fleet server re-send this profile at some point, or not at all? Just wondering because it might cause the profile to blink on the device's end, and the user could lose the Wi-Fi connection for a second.
Thanks!
I believe that when the profile is sent to a host, Fleet attempts to verify if it's applied twice. After that, I don't think we make any additional retries.
@getvictor could you please confirm if this is true?
Yes, if profile status is failed, Fleet will try sending it again only once.
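As a sketch of the behavior described here (a failed profile is resent once and then left alone), something like the following would capture it. The function name and status strings are hypothetical and do not come from Fleet's codebase.

```python
MAX_RESENDS = 1  # per this thread: a failed profile is sent again only once


def should_resend(status: str, resends_so_far: int, max_resends: int = MAX_RESENDS) -> bool:
    """Return True only for a failed profile that has not used up its resend."""
    return status == "failed" and resends_so_far < max_resends


print(should_resend("failed", 0))  # True
print(should_resend("failed", 1))  # False
```

Under this model a device would see the profile delivered at most twice for one failure: the original delivery and the single resend.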
Hey team! Please add your planning poker estimate with Zenhub @getvictor @gillespi314 @mna
@nonpunctual This issue was mentioned on customer-person's call.
This issue only fixes the validation issue with the WlanXml profile. By validating it correctly, Fleet should not try to resend it.
The issue that resending an <Add> profile causes an error is in design (#26904). The earliest it will ship is 4.69. This means an <Add> profile in GitOps will likely fail if it gets modified.
The information above is mostly sufficient for testing this but adding a few notes below:
- As best I can tell, this must be tested on real hardware. I could not get Windows to install one of these policies on a Win11 VM at all.
- As a follow-up to ^, you can test using profiles that don't match any network your hardware can connect to. However, some of the behavior I've seen, such as Windows "upgrading" a WPA2 profile to WPA3, only happens when there is a real network in range that matches the profile. So it is worth doing at least some testing with your real network.
- When testing I recommend deleting the "hex" element under the SSID, similar to what you see here: https://learn.microsoft.com/en-us/windows/win32/nativewifi/non-broadcast-profile-sample . It's also worth trying the inverse - only providing the "hex" format and not the name and making sure that verifies as well.
- This tool is useful for generating profiles: https://daduckmsft.github.io/WiFiProfileGenerator/android.html
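For the hex-only and name-only variants mentioned in the notes above, a small generator can help produce test payloads. This is a sketch that mirrors the WPA2PSK/AES structure of the profiles in this issue; `build_wlan_profile` is a made-up helper, and the output is only the WLANProfile document that goes inside <Data>.

```python
from xml.sax.saxutils import escape

WLAN_NS = "http://www.microsoft.com/networking/WLAN/profile/v1"


def build_wlan_profile(ssid, passphrase, include_hex=True, include_name=True):
    """Build a WPA2-PSK WLANProfile with <hex>, <name>, or both under <SSID>."""
    if not (include_hex or include_name):
        raise ValueError("the SSID needs at least one of <hex> or <name>")
    ssid_parts = []
    if include_hex:
        ssid_parts.append("<hex>" + ssid.encode("utf-8").hex().upper() + "</hex>")
    if include_name:
        ssid_parts.append("<name>" + escape(ssid) + "</name>")
    return (
        '<WLANProfile xmlns="' + WLAN_NS + '">'
        "<name>" + escape(ssid) + "</name>"
        "<SSIDConfig><SSID>" + "".join(ssid_parts) + "</SSID></SSIDConfig>"
        "<connectionType>ESS</connectionType>"
        "<connectionMode>auto</connectionMode>"
        "<MSM><security>"
        "<authEncryption><authentication>WPA2PSK</authentication>"
        "<encryption>AES</encryption><useOneX>false</useOneX></authEncryption>"
        "<sharedKey><keyType>passPhrase</keyType><protected>false</protected>"
        "<keyMaterial>" + escape(passphrase) + "</keyMaterial></sharedKey>"
        "</security></MSM></WLANProfile>"
    )


# Name-only variant, similar to the non-broadcast sample linked above.
print(build_wlan_profile("Fleet", "fleetwifi", include_hex=False))
```

Calling it three times (both elements, hex only, name only) gives one payload per verification case.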
Unwanted resend stopped, Fleet in harmony now, Wifi sings, no drop.
CSP resent, errors fly, Streamlined logic, issues die. Fleet's path, evermore clear sky.
@JordanMontgomery @getvictor @PezHub It's known that the Wi-Fi profile can't be installed on a VM IF the VM is using the host computer's internal network. If you have a USB Wi-Fi NIC that presents as a completely separate hardware network, you can deploy Windows Wi-Fi profiles to a VM connected to it.
QA Test Results
- confirmed my Wifi profiles are getting deployed and Verified on my Windows NucBox_M5 device.
- confirmed it works with and without the Hex info
Sample profile:
<Add>
<CmdID>2</CmdID>
<Item>
<Meta>
<Format xmlns="syncml:metinf">chr</Format>
</Meta>
<Target>
<LocURI>./Vendor/MSFT/WiFi/Profile/Fleet/WlanXml</LocURI>
</Target>
<Data><?xml version="1.0"?>
<WLANProfile xmlns="http://www.microsoft.com/networking/WLAN/profile/v1">
<name>Fleet</name>
<SSIDConfig>
<SSID>
<hex>466C656574</hex>
<name>Fleet</name>
</SSID>
<nonBroadcast>false</nonBroadcast>
</SSIDConfig>
<connectionType>ESS</connectionType>
<connectionMode>auto</connectionMode>
<MSM>
<security>
<authEncryption>
<authentication>WPA2PSK</authentication>
<encryption>AES</encryption>
<useOneX>false</useOneX>
</authEncryption>
<sharedKey>
<keyType>passPhrase</keyType>
<protected>false</protected>
<keyMaterial>fleetwifi</keyMaterial>
</sharedKey>
</security>
</MSM>
</WLANProfile></Data>
</Item>
</Add>
Wi-Fi profile fixed, Fleet's harmony restored. No errors exist.