Fleet resending WlanXml CSP unnecessarily is causing errors

Open rebeccaui opened this issue 1 year ago • 18 comments

Fleet version: 4.60
Web browser and operating system: Windows


💥  Actual behavior

Fleet is resending a CSP after it was previously delivered successfully, which causes it to error out. The snippet provided in this ticket attempts to Add a new WiFi profile via CSP, but it errors when the WiFi profile is already present.

🧑‍💻  Steps to reproduce

  1. Add a CSP with an <Add> action (like the Wifi one provided in this ticket) to a team with Windows hosts.
  2. Confirm the CSP was successfully delivered to the host. Leave the CSP assigned to the team.
  3. After an unknown amount of time, Fleet will attempt to re-send the CSP. This will cause the setting to report as "Failed" because a Windows host will reject an <Add> action for a setting it already has.

🕯️ More info (optional)

  • CSP was not re-uploaded. It was left in custom settings like you would for a .mobileconfig
  • We may need to detect state more thoroughly before sending the CSP

🛠️ To fix

@marko-lisica:

  • Fix the verification logic for when the user uploads a Windows configuration profile that controls the WlanXml option, so Fleet can verify that the WlanXml OS setting is enforced.

rebeccaui avatar Dec 04 '24 21:12 rebeccaui

Thanks for filing this, we'll take a look!

georgekarrv avatar Dec 05 '24 17:12 georgekarrv

QA Notes:

Uploaded and deployed the CSP to two Windows hosts, and both have been stuck in Verifying for over 48 hours, while a different CSP did complete successfully.

Possibly related to #23599, or to how the CSP is written given what Fleet supports.

PezHub avatar Dec 10 '24 23:12 PezHub

<!--Use the following snippet to Add a new WiFi profile via CSP. -->
<!-- 1) Do a find for YOURSSID and replace it with the SSID name -->
<!-- 2) Do a find for 594F555253534944 and replace it with the SSID name, converted to Hex -->
<!-- 3) Do a find for YOURPASSPHRASE and replace it with the passphrase -->
<!-- 4) Generate a unique ID for the cmdID -->


<Add>
  <CmdID>f4550c6a-48ae-408c-ae9b-fb0c275de6b8</CmdID>
  <Item>
    <Meta>
      <Format xmlns="syncml:metinf">chr</Format>
    </Meta>
    <Target>
      <LocURI>./Vendor/MSFT/WiFi/Profile/YOURSSID/WlanXml</LocURI>
    </Target>
    <Data>&lt;?xml version=&quot;1.0&quot; encoding=&quot;US-ASCII&quot;?&gt;&lt;WLANProfile xmlns=&quot;http://www.microsoft.com/networking/WLAN/profile/v1&quot;&gt;&lt;name&gt;YOURSSID&lt;/name&gt;&lt;SSIDConfig&gt;&lt;SSID&gt;&lt;hex&gt;594F555253534944&lt;/hex&gt;&lt;name&gt;YOURSSID&lt;/name&gt;&lt;/SSID&gt;&lt;/SSIDConfig&gt;&lt;connectionType&gt;ESS&lt;/connectionType&gt;&lt;connectionMode&gt;auto&lt;/connectionMode&gt;&lt;MSM&gt;&lt;security&gt;&lt;authEncryption&gt;&lt;authentication&gt;WPA2PSK&lt;/authentication&gt;&lt;encryption&gt;AES&lt;/encryption&gt;&lt;useOneX&gt;false&lt;/useOneX&gt;&lt;/authEncryption&gt;&lt;sharedKey&gt;&lt;keyType&gt;passPhrase&lt;/keyType&gt;&lt;protected&gt;false&lt;/protected&gt;&lt;keyMaterial&gt;YOURPASSPHRASE&lt;/keyMaterial&gt;&lt;/sharedKey&gt;&lt;/security&gt;&lt;/MSM&gt;&lt;/WLANProfile&gt;</Data>
  </Item>
</Add>
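
Steps 2 and 4 in the comments above can be scripted. A minimal sketch in Python, assuming the SSID is plain UTF-8 text; the helper names `ssid_to_hex` and `new_cmd_id` are made up for illustration and are not part of Fleet.

```python
import uuid


def ssid_to_hex(ssid: str) -> str:
    """Return the SSID's UTF-8 bytes as uppercase hex, the form used in <hex>."""
    return ssid.encode("utf-8").hex().upper()


def new_cmd_id() -> str:
    """Generate a random UUID string to use as the <CmdID> value."""
    return str(uuid.uuid4())


print(ssid_to_hex("YOURSSID"))  # 594F555253534944
print(new_cmd_id())             # a different UUID on every run
```

The hex value printed for YOURSSID matches the placeholder used in the snippet above.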

PezHub avatar Dec 10 '24 23:12 PezHub

When we shipped Windows MDM, we did not plan, design, or QA the WlanXml profile, so this is a StoryBug (TM).

getvictor avatar Jan 20 '25 13:01 getvictor

Sending this one to product to discuss feature coverage

georgekarrv avatar Jan 21 '25 17:01 georgekarrv

@marko-lisica I assigned this bug to you. Is it a bug? Up to you. If it's a feature request can you please turn this issue into a feature request and let the Customer Success Manager for preston know (you can find who this is in Salesforce under "Account Owner")

noahtalerman avatar Jan 22 '25 01:01 noahtalerman

@noahtalerman @georgekarrv I think Fleet resending Windows profile without user interaction is a bug and we should investigate that.

I see an additional problem when the profile contains an <Add> CSP and it's verified: if the user resends that profile, it will fail. Some CSPs require you to use <Add> first, but every later edit to that CSP must use <Replace>. Status 418 means that the setting already exists on the host and can be replaced. See more info here.

I think we should file a feature request to improve this in a way that Fleet checks if it already exists and automatically sends <Replace> instead of <Add> or disable the resend button. For now, we can document this behavior or add some copy to UI.
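
The idea of sending <Replace> when the setting already exists could look roughly like the Python sketch below. It is not Fleet's implementation: `command_for_host` and the `already_on_host` flag are hypothetical, and how Fleet would learn that the node exists (for example from an earlier 418 response) is left open. It also assumes the payload inside <Data> is XML-escaped, as in the profiles in this thread, so a literal `<Add>` can only be a command tag.

```python
def command_for_host(profile_xml: str, already_on_host: bool) -> str:
    """Return the SyncML to send, switching <Add> to <Replace> when needed.

    already_on_host is a hypothetical flag; determining it is out of scope.
    """
    if not already_on_host:
        return profile_xml
    # Plain string replacement keeps the escaped <Data> payload byte for byte.
    return profile_xml.replace("<Add>", "<Replace>").replace("</Add>", "</Replace>")


profile = (
    "<Add><CmdID>2</CmdID><Item><Target>"
    "<LocURI>./Vendor/MSFT/WiFi/Profile/Fleet/WlanXml</LocURI>"
    "</Target><Data>&lt;WLANProfile/&gt;</Data></Item></Add>"
)
print(command_for_host(profile, already_on_host=True))
```

String replacement is used here instead of parsing and re-serializing so that namespaces and escaping in the original command are left untouched.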

marko-lisica avatar Jan 23 '25 10:01 marko-lisica

@getvictor Do you think this happened because Fleet wasn't able to verify with osquery that this profile is applied because we don't have the logic to check this data type?

marko-lisica avatar Jan 28 '25 20:01 marko-lisica

@marko-lisica Yes, Fleet couldn't verify because we don't have special logic for comparing this type of profile.

This is similar to the ADMX-backed profile type which I recently added verification support for: https://learn.microsoft.com/en-us/windows/client-management/understanding-admx-backed-policies

getvictor avatar Jan 28 '25 21:01 getvictor

Yes, Fleet couldn't verify because we don't have special logic for comparing this type of profile.

Thanks @getvictor! I think we should only solve the verification problem for WlanXml CSP as part of this bug.

I'm not sure if we can handle verification for all kinds of CSPs that require XML as <Data>, or if we can only solve verification for WlanXml, but I found another CSP example that uses XML as <Data> and this section that describes how to handle XML configurations.

marko-lisica avatar Jan 29 '25 13:01 marko-lisica

Yes, we should try to uncover other config scenarios and make sure we handle them. We don't have to fully verify each scenario, we just need to recognize it. For the configs where we don't support full verification, we can just mark them Verified as opposed to simply failing, which is what happens now.

getvictor avatar Jan 29 '25 13:01 getvictor

Yes, we should try to uncover other config scenarios and make sure we handle them. We don't have to fully verify each scenario, we just need to recognize it. For the configs where we don't support full verification, we can just mark them Verified as opposed to simply failing, which is what happens now.

@getvictor just chatted with @marko-lisica and we decided to aim to fully verify each scenario. We'll address the scenarios in which we don't verify as one-off fixes.

We want to do this so we learn/can debug when the verification isn't working as expected. If we just mark everything as verified, I think broken verification would go unnoticed.

noahtalerman avatar Jan 29 '25 15:01 noahtalerman

@noahtalerman Should we prioritize this bug and give it P2 label?

marko-lisica avatar Jan 29 '25 19:01 marko-lisica

@marko-lisica I personally don't think it's urgent (definition of P2) because, if I'm understanding correctly, the Wi-Fi profile gets successfully applied (but shows up as failed)

That said, I think we should get to it as soon as we can. It's a customer reported bug.

cc @georgekarrv @pintomi1989

noahtalerman avatar Jan 30 '25 23:01 noahtalerman

@marko-lisica I personally don't think it's urgent (definition of P2) because, if I'm understanding correctly, the Wi-Fi profile gets successfully applied (but shows up as failed)

Hi everyone, just to make sure:

Does a "failed" profile status make the Fleet server re-send this profile at some point, or not at all? Just wondering because it might cause the profile to blink on the device's end, and the user might lose the Wi-Fi connection for a second.

Thanks!

valentinpezon-primo avatar Jan 31 '25 08:01 valentinpezon-primo

Does a "failed" profile status make the Fleet server re-send this profile at some point, or not at all? Just wondering because it might cause the profile to blink on the device's end, and the user might lose the Wi-Fi connection for a second.

I believe that when the profile is sent to a host, Fleet makes two attempts to verify that it's applied. After that, I don't think we make any additional retries.

@getvictor could you please confirm if this is true?

marko-lisica avatar Jan 31 '25 09:01 marko-lisica

Yes, if profile status is failed, Fleet will try sending it again only once.
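
Read literally, that rule is a retry budget of one. A toy model in Python, using a hypothetical `ProfileDelivery` class that is not taken from Fleet's code and only illustrates the rule as stated in this comment:

```python
from dataclasses import dataclass

MAX_RETRIES = 1  # per the comment above: a failed profile is resent only once


@dataclass
class ProfileDelivery:
    status: str = "pending"
    retries: int = 0

    def record_failure(self) -> bool:
        """Record a failed delivery; return True if the profile should be resent."""
        if self.retries < MAX_RETRIES:
            self.retries += 1
            self.status = "pending"  # queued for one more attempt
            return True
        self.status = "failed"  # retry budget used up, stays failed
        return False
```

Under this model the first failure queues a resend and the second one leaves the profile in the failed state.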

getvictor avatar Jan 31 '25 13:01 getvictor

Hey team! Please add your planning poker estimate with Zenhub @getvictor @gillespi314 @mna

georgekarrv avatar Mar 05 '25 17:03 georgekarrv

@nonpunctual This issue was mentioned on customer-person's call.

This issue only fixes the validation issue with the WlanXml profile. By validating it correctly, Fleet should not try to resend it.

The issue that resending an <Add> profile causes an error is in design (#26904). The earliest it will ship is 4.69. This means an <Add> profile in GitOps will likely fail if it gets modified.

getvictor avatar Apr 12 '25 21:04 getvictor

The information above is mostly sufficient for testing this but adding a few notes below:

  1. As best I can tell, this must be tested on real hardware. I could not get Windows to install one of these policies on a Win11 VM at all.
  2. As a follow-up to the above, you can test using profiles that don't match any network your hardware can connect to. However, some of the behavior I've seen, such as Windows "upgrading" a WPA2 profile to WPA3, only happens when there is a real network in range that matches the profile, so it is worth doing at least some testing with your real network.
  3. When testing, I recommend deleting the "hex" element under the SSID, similar to what you see here: https://learn.microsoft.com/en-us/windows/win32/nativewifi/non-broadcast-profile-sample . It's also worth trying the inverse (providing only the "hex" format and not the name) and making sure that verifies as well.
  4. This tool is useful for generating profiles: https://daduckmsft.github.io/WiFiProfileGenerator/android.html
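
The with-hex and without-hex cases in note 3 suggest comparing profiles by meaning rather than byte for byte. A rough Python sketch of what such a comparison could look like; it is not Fleet's verification logic, `ssid_key` and `same_network` are hypothetical names, and it only looks at the SSID, ignoring security settings such as the WPA2 to WPA3 change mentioned in note 2.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://www.microsoft.com/networking/WLAN/profile/v1"}


def ssid_key(profile_xml: str) -> str:
    """Return the profile's SSID as uppercase hex, taken from <hex> or <name>."""
    root = ET.fromstring(profile_xml)
    ssid = root.find("w:SSIDConfig/w:SSID", NS)
    if ssid is None:
        raise ValueError("profile has no SSIDConfig/SSID element")
    hex_el = ssid.find("w:hex", NS)
    if hex_el is not None and hex_el.text:
        return hex_el.text.strip().upper()
    name_el = ssid.find("w:name", NS)
    if name_el is not None and name_el.text:
        # Assumption: the SSID name is UTF-8 text.
        return name_el.text.encode("utf-8").hex().upper()
    raise ValueError("SSID has neither <hex> nor <name>")


def same_network(expected_xml: str, actual_xml: str) -> bool:
    """Compare two unescaped WLAN profiles by SSID only."""
    return ssid_key(expected_xml) == ssid_key(actual_xml)
```

Both functions expect the unescaped WLANProfile XML, not the escaped text as it appears inside <Data>.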

JordanMontgomery avatar Apr 16 '25 17:04 JordanMontgomery

Unwanted resend stopped, Fleet in harmony now, Wifi sings, no drop.

fleet-release avatar Apr 18 '25 12:04 fleet-release

CSP resent, errors fly, Streamlined logic, issues die. Fleet's path, evermore clear sky.

fleet-release avatar Apr 18 '25 12:04 fleet-release

@JordanMontgomery @getvictor @PezHub It's known that the Wi-Fi profile can't be installed on a VM IF the VM is using the host computer's internal network. If you have a USB Wi-Fi NIC that presents as a completely separate hardware network, you can deploy Windows Wi-Fi profiles to a VM connected to it.

nonpunctual avatar Apr 18 '25 13:04 nonpunctual

QA Test Results

  • confirmed my Wifi profiles are getting deployed and Verified on my Windows NucBox_M5 device.
  • confirmed it works with and without the Hex info

Sample profile:

<Add>
  <CmdID>2</CmdID>
  <Item>
    <Meta>
      <Format xmlns="syncml:metinf">chr</Format>
    </Meta>
    <Target>
      <LocURI>./Vendor/MSFT/WiFi/Profile/Fleet/WlanXml</LocURI>
    </Target>
    <Data>&lt;?xml version=&quot;1.0&quot;?&gt;
&lt;WLANProfile xmlns=&quot;http://www.microsoft.com/networking/WLAN/profile/v1&quot;&gt;
	&lt;name&gt;Fleet&lt;/name&gt;
	&lt;SSIDConfig&gt;
		&lt;SSID&gt;
			&lt;hex&gt;466C656574&lt;/hex&gt;
			&lt;name&gt;Fleet&lt;/name&gt;
		&lt;/SSID&gt;
		&lt;nonBroadcast&gt;false&lt;/nonBroadcast&gt;
	&lt;/SSIDConfig&gt;
	&lt;connectionType&gt;ESS&lt;/connectionType&gt;
	&lt;connectionMode&gt;auto&lt;/connectionMode&gt;
	&lt;MSM&gt;
		&lt;security&gt;
			&lt;authEncryption&gt;
				&lt;authentication&gt;WPA2PSK&lt;/authentication&gt;
				&lt;encryption&gt;AES&lt;/encryption&gt;
				&lt;useOneX&gt;false&lt;/useOneX&gt;
			&lt;/authEncryption&gt;
			&lt;sharedKey&gt;
				&lt;keyType&gt;passPhrase&lt;/keyType&gt;
				&lt;protected&gt;false&lt;/protected&gt;
				&lt;keyMaterial&gt;fleetwifi&lt;/keyMaterial&gt;
			&lt;/sharedKey&gt;
		&lt;/security&gt;
	&lt;/MSM&gt;
&lt;/WLANProfile&gt;</Data>
  </Item>
</Add>
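
When debugging a profile like this one, it can help to unescape the <Data> payload and look at the inner WLAN profile. A small Python sketch using only the standard library; `inspect_command` is a hypothetical helper, not a Fleet API, and it assumes a single command with a single <Item>.

```python
import xml.etree.ElementTree as ET

WLAN_NS = "{http://www.microsoft.com/networking/WLAN/profile/v1}"


def inspect_command(command_xml: str) -> dict:
    """Return the verb, LocURI, and inner profile name of one SyncML command."""
    root = ET.fromstring(command_xml)
    loc_uri = root.findtext("Item/Target/LocURI")
    # ElementTree unescapes &lt; and &quot;, so Data comes back as plain XML text.
    data = root.findtext("Item/Data")
    if data is None:
        raise ValueError("command has no Item/Data element")
    profile = ET.fromstring(data.strip())
    return {
        "verb": root.tag,
        "loc_uri": loc_uri,
        "profile_name": profile.findtext(f"{WLAN_NS}name"),
    }
```

For the sample profile above, the returned dictionary would carry the verb `Add`, the WlanXml LocURI, and the profile name `Fleet`.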

PezHub avatar Apr 29 '25 15:04 PezHub

Wi-Fi profile fixed, Fleet's harmony restored. No errors exist.

fleet-release avatar May 23 '25 12:05 fleet-release