Host transferred to a team via the POST /api/v1/fleet/hosts/transfer endpoint remained in No Team

Open ddribeiro opened this issue 11 months ago • 2 comments

  • Fleet version: 4.63.1
  • Slack thread where this was reported: https://fleetdm.slack.com/archives/C075TURNLB0/p1741027763838929
  • Engineering Slack thread: https://fleetdm.slack.com/archives/C019WG4GH0A/p1741099768784259


💥  Actual behavior

customer-deebradel has an automation that detects when a macOS host becomes MDM enrolled and transfers it to a desired team in Fleet.

They are reporting an issue with this flow: they used the API to transfer a host to a team, but the host did not transfer. The API responded with a 200, the Fleet server logs show the API call with seemingly no associated errors, and the global activity feed shows the host was transferred to the team. Yet the host remained in No Team.

I have not reproduced this on my own yet, and deebradel has seen it happen once so far.

🧑‍💻  Steps to reproduce

  1. Ensure your test host does not currently have a record created for it in your Fleet server.
  2. Enroll a macOS host into Fleet by installing the manual enrollment profile. This should send an mdm_enrolled event to the global activity webhook and enroll the host into your Fleet server.
  3. The mdm_enrolled webhook event should kick off your automation to transfer the host to another team.
  4. Check the Fleet API response, server logs, and activity feed to see that the transfer team action has taken place and was successful.
  5. Check whether the host is still in No Team or was actually transferred to the desired team.

🕯️ More info (optional)

  • I suspect whatever is causing this issue might be related to the timing of transferring the host to a new team immediately after the mdm_enrolled event. Therefore it is important to build an automation that handles the transfer action via the API rather than doing it manually. Here's what deebradel is doing:
    • Build an automation that watches the Fleet global activity webhook.
    • Upon receipt of an mdm_enrolled audit event, capture the host_serial from the webhook payload.
    • Use the host_serial to make a call to the Get host by identifier endpoint (GET /api/v1/fleet/hosts/identifier/:identifier), using the serial number as the identifier. Capture the id from that response.
    • Use the id from the previous step to transfer the host to a new team by making a call to the Transfer hosts to a team endpoint (POST /api/v1/fleet/hosts/transfer).
    • Check if the host transferred to the correct team successfully.
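
For anyone trying to reproduce, the flow above could be sketched roughly as follows. This is only a sketch, not deebradel's actual automation. The two endpoint paths are the ones named in this issue; the webhook payload shape (`type`, `details.host_serial`), the `host.id` / `host.team_id` response fields, the `{"team_id": ..., "hosts": [...]}` request body, and the helper names `make_http` and `handle_activity` are assumptions to check against your Fleet version.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Hypothetical transport type: (method, path, body) -> parsed JSON response.
Http = Callable[[str, str, Optional[dict]], dict]


def make_http(base_url: str, api_token: str) -> Http:
    """Build a transport that sends authenticated JSON requests to a Fleet server."""
    def http(method: str, path: str, body: Optional[dict]) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={
                "Authorization": f"Bearer {api_token}",
                "Content-Type": "application/json",
            },
        )
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
        return json.loads(raw) if raw else {}
    return http


def handle_activity(event: dict, target_team_id: int, http: Http) -> Optional[dict]:
    """React to one webhook event; return None if the event is not relevant."""
    # Assumed payload shape: {"type": "mdm_enrolled", "details": {"host_serial": ...}}
    if event.get("type") != "mdm_enrolled":
        return None
    serial = event.get("details", {}).get("host_serial")
    if not serial:
        return None

    # Look the host up by serial number to get its numeric id.
    lookup_path = "/api/v1/fleet/hosts/identifier/" + urllib.parse.quote(serial, safe="")
    host_id = http("GET", lookup_path, None)["host"]["id"]

    # Transfer the host (assumed request body shape).
    http("POST", "/api/v1/fleet/hosts/transfer",
         {"team_id": target_team_id, "hosts": [host_id]})

    # Read the host back once to see which team the server reports right now.
    team_now = http("GET", lookup_path, None)["host"].get("team_id")
    return {"host_id": host_id, "team_id_after_transfer": team_now}
```

A webhook receiver would call `handle_activity(payload, team_id, make_http(url, token))` for each incoming event. Note that the read-back only shows the team at that instant; it says nothing about whether the assignment survives the later agent enrollment.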

Some other thoughts: I am wondering if there is a potential race condition happening related to the timing of delivering MDM profiles, installing the fleet-base.pkg via InstallEnterpriseApplication command, and transferring the host via the API.

  1. The computer enrolled to Fleet in No Team via the end user installing the MDM profile. Fleet sends a Fleetd configuration profile to the host that contains the enrollment secret for No Team.
  2. The host is transferred to the desired team via the API after receiving the mdm_enrolled webhook.
  3. The Fleet agent gets delivered as a bootstrap package before the next MDM cron runs to update profiles. The old Fleetd configuration profile is still on the host.
  4. The host becomes fleet_enrolled when the agent installs and goes back to No Team based on the information in the Fleetd configuration profile (which had not yet been updated on the host with the new team's information when the agent installed).
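
To make the suspected ordering concrete, here is a toy model of the four steps above. It is purely illustrative and is not Fleet's code: `ToyHost`, the event names, and the rule "agent enrollment sets the team from whatever secret is in the profile on the device" are all assumptions taken from the hypothesis, not from the server source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NO_TEAM = None  # a missing team is modeled as None


@dataclass
class ToyHost:
    server_team: Optional[str] = NO_TEAM   # team recorded on the server
    profile_team: Optional[str] = NO_TEAM  # team whose enroll secret is in the profile on the device


def apply_event(host: ToyHost, event: str, arg: Optional[str] = None) -> None:
    """Apply one step of the hypothesized flow to the toy host."""
    if event == "mdm_enroll":
        # Step 1: host lands in No Team and receives a profile with the No Team secret.
        host.server_team = NO_TEAM
        host.profile_team = NO_TEAM
    elif event == "api_transfer":
        # Step 2: the API call changes the team on the server only.
        host.server_team = arg
    elif event == "profile_refresh":
        # The MDM cron redelivers the profile with the current team's secret.
        host.profile_team = host.server_team
    elif event == "agent_enroll":
        # Steps 3 and 4: the agent enrolls with whatever secret is on the device.
        host.server_team = host.profile_team
    else:
        raise ValueError(f"unknown event: {event}")


def final_team(events: List[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Run a sequence of (event, arg) pairs and return the server-side team."""
    host = ToyHost()
    for event, arg in events:
        apply_event(host, event, arg)
    return host.server_team
```

In this model, `mdm_enroll, api_transfer("B"), agent_enroll` ends in No Team, while inserting `profile_refresh` before `agent_enroll` ends in team B. If the real behavior matches, the outcome depends only on whether the profile is refreshed before the agent enrolls.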

ddribeiro avatar Mar 05 '25 19:03 ddribeiro

@noahtalerman @jmwatts Hey, can this go straight to the product or engineering boards since there are steps to reproduce? The customer is asking for an update.

ddribeiro avatar Apr 10 '25 16:04 ddribeiro

@sharon-fdm or @xpkoala Please see @ddribeiro 's question above, thanks!

jmwatts avatar Apr 10 '25 16:04 jmwatts

@ddribeiro all bugs we (fleeties) file go straight to drafting (:product): https://fleetdm.com/handbook/company/product-groups#release-testing:~:text=Fast%20track%20for%20Fleeties%3A%20Fleeties%20do%20not%20have%20to%20wait%20for%20QA%20to%20reproduce%20the%20bug.%20If%20you%27re%20confident%20it%27s%20reproducible%2C%20it%27s%20a%20bug%2C%20and%20the%20reproduction%20steps%20are%20well%2Ddocumented%2C%20it%20can%20be%20moved%20directly%20to%20the%20reproduced%20state.

noahtalerman avatar Apr 11 '25 18:04 noahtalerman

Hey team! Please add your planning poker estimate with Zenhub @dantecatalfamo @jacobshandling @lucasmrod @sgress454

sharon-fdm avatar Apr 14 '25 18:04 sharon-fdm

Bold estimation assuming easy reproduction.

sharon-fdm avatar Apr 14 '25 18:04 sharon-fdm

Hey @ddribeiro I'm unable to replicate based on the provided steps.

Do you know if they are also installing Fleetd via a Bootstrap package or similar as part of their "flow"?

Thanks

juan-fdz-hawa avatar May 05 '25 17:05 juan-fdz-hawa

Pushing back to the backlog due to taking MDM stuff instead.

sharon-fdm avatar May 28 '25 14:05 sharon-fdm

Just wanted to update this issue to say the customer mentioned they are still seeing this issue (happened 2 times last week).

@juan-fdz-hawa My understanding is they are installing the MDM profile first, then the Fleet agent is being delivered automatically as a bootstrap package (InstallEnterpriseApplication MDM command).

ddribeiro avatar May 28 '25 17:05 ddribeiro

The customer is asking if we could return focus to this issue, as they are still seeing it occur in their environment and it breaks a crucial part of their self-service enrollment flow when it happens.

@lukeheath Can I request a P2 for this issue, since a supported workflow is not working as intended 100% of the time?

ddribeiro avatar Jun 04 '25 15:06 ddribeiro

@ddribeiro You got it - upgraded to P2.

@sharon-fdm Please prioritize accordingly. Thanks!

lukeheath avatar Jun 04 '25 15:06 lukeheath

@lukeheath, will take it for next sprint.

sharon-fdm avatar Jun 13 '25 14:06 sharon-fdm

@ddribeiro This might be a silly question but why are they not using the fleet_enrolled event for their automation?

juan-fdz-hawa avatar Jun 25 '25 19:06 juan-fdz-hawa

@ddribeiro Please refer to this when you get a chance - I'm not sure what the fix for this should be.

juan-fdz-hawa avatar Jun 26 '25 15:06 juan-fdz-hawa

@juan-fdz-hawa

This might be a silly question but why are they not using the fleet_enrolled event for their automation?

This is a good question, and I think it might be because the automation was built before fleet_enrolled existed. Do you think using fleet_enrolled would handle this workflow better?

ddribeiro avatar Jun 27 '25 23:06 ddribeiro

@juan-fdz-hawa

This might be a silly question but why are they not using the fleet_enrolled event for their automation?

This is a good question, and I think it might be because the automation was built before fleet_enrolled existed. Do you think using fleet_enrolled would handle this workflow better?

I believe so, as a bonus the fleet_enrolled event includes the HostID in the payload.

juan-fdz-hawa avatar Jun 30 '25 10:06 juan-fdz-hawa

Customer is open to using the fleet_enrolled event and will discuss internally - closing for now.

juan-fdz-hawa avatar Jul 02 '25 18:07 juan-fdz-hawa

API call made, yet host lingers, In the realm of No Teams, it fingers. Transfer complete, brings ease.

fleet-release avatar Jul 02 '25 18:07 fleet-release

Hey @noahtalerman and @ddribeiro - I'm reopening this, as customer-deebradel confirmed that the fleet_enrolled webhook does not work for them. They found that it fires off too late, and some commands need to be sent during the exact moment of enrollment. I'll add a new Gong snippet shortly

pintomi1989 avatar Aug 13 '25 19:08 pintomi1989

I'm reopening this, as customer-deebradel confirmed that the fleet_enrolled webhook does not work for them. They found that it fires off too late, and some commands need to be sent during the exact moment of enrollment. I'll add a new Gong snippet shortly

FYI @marko-lisica we re-opened this P2 bug. I moved it from #g-mdm to #g-orchestration because the use case / workflow seems like it's closer to MDM.

noahtalerman avatar Aug 14 '25 21:08 noahtalerman

@marko-lisica @georgekarrv

We’ve confirmed with customer-deebradel that in some cases the host reverts back to No Team even after the second transfer API call (on fleet_enrolled).

Both transfer requests return 200 and a follow-up GET /hosts/:id shows the host in the correct team at that moment. However, shortly after the assignment is overwritten back to null.
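
To pin down when the overwrite happens, one option is to keep reading the host's team after the transfer instead of checking once. A minimal sketch, assuming a caller-supplied `get_team_id` function (hypothetical; it would wrap the GET /hosts/:id call and return the host's current team id, or None for No Team):

```python
import time
from typing import Callable, Optional


def first_revert(
    get_team_id: Callable[[], Optional[int]],
    expected_team_id: int,
    checks: int = 30,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll the host's team; return the index of the first check that differs.

    Returns None if the team matched expected_team_id on every check.
    """
    for i in range(checks):
        if get_team_id() != expected_team_id:
            return i
        if i < checks - 1:
            sleep(interval_s)  # wait before the next reading
    return None
```

Multiplying the returned index by `interval_s` gives a rough time-to-revert, which could then be lined up against the fleet_enrolled activity timestamp.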

This suggests it’s not only a race on mdm_enrolled but that something in the bootstrap / agent enrollment flow is actively resetting the team assignment.

AdamBaali avatar Aug 26 '25 15:08 AdamBaali

Hey @PezHub, we discussed this bug during standup. I'm assigning this one to you, so you can try to reproduce and understand it better before engineers start digging into this. I added reproduction steps from the Slack thread that is linked in the same section.

marko-lisica avatar Aug 26 '25 16:08 marko-lisica

Hi @ddribeiro do we know if the customer has updated from 4.63.1? Can you provide their current version before I try to repro please?

PezHub avatar Aug 26 '25 18:08 PezHub

@PezHub they are running 4.71.1, one version behind latest

AdamBaali avatar Aug 26 '25 18:08 AdamBaali

QA Notes:

I was able to reproduce what the customer is seeing using either mdm_enrolled or fleet_enrolled as the trigger. I used a script with curl commands and API calls to try to simulate their automation and am happy to share that with the engineer that grabs this ticket.

Steps:

  1. Start script which polls the activities API for fleet_enrolled (or mdm_enrolled)
  2. I manually kick off an enrollment - I tried ADE (with and without a custom fleetd bootstrap installer) and Manual enrollment
  3. Once the activity is created it grabs the host's serial and host id
  4. Transfers it to Team B
  5. I see the 200 OK
  6. Host enrolls successfully but remains on No Team

PezHub avatar Aug 27 '25 16:08 PezHub

Thanks @PezHub, I moved this to "ready to estimate".

cc @georgekarrv

marko-lisica avatar Aug 27 '25 17:08 marko-lisica

Putting a 2 for the timebox investigation to figure out how this is happening

georgekarrv avatar Sep 08 '25 17:09 georgekarrv

@PezHub please send that script my way.

I tried removing any relations to the device, and reset the device.

I set up a poller that transfers the host to a team based on the mdm_enrolled activity, which it does successfully; however, the device never "jumps back" to the old team.

MagnusHJensen avatar Sep 16 '25 11:09 MagnusHJensen

Asked @PezHub to reproduce again after finding a small issue in his script. I can't reproduce with Mac ADE or iPhone manual enrollment: the host always stays in the team I transferred it to upon seeing the mdm_enrolled activity, even after fleet_enrolled appears.

However, I can see in my fleetd configuration (only on Mac) that the enroll secret is that of the new team, so maybe the transfer happens "too fast" to reproduce? Will give it a try with a delay.

Still unable to reproduce when waiting 10 seconds before transferring the team after receiving mdm_enrolled.

Adding a sleep of 20 seconds after receiving mdm_enrolled and then transferring the team allows the fleetd profile to be delivered with the enroll secret of the default No Team, which overrides the transfer when fleetd enrolls. I verified this by checking the fleetd configuration profile and its enroll secret.

It gets overwritten here: https://github.com/fleetdm/fleet/blob/baedfd083a96d7ebfae91ac20ec4ed7b0abdff64/server/datastore/mysql/hosts.go#L2233 which is called when Orbit enrolls a host, could also happen with osquery only enrolling.

Could a desired solution be to check whether the device is already MDM enrolled? If so, avoid setting the team when osquery or Orbit enrolls.
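
As a rough illustration of that idea (in Python for brevity, not the Go datastore code linked above), the decision could look like the function below. The function name and parameters are hypothetical and only capture the rule being proposed:

```python
from typing import Optional


def team_after_agent_enroll(
    host_exists: bool,
    mdm_enrolled: bool,
    current_team_id: Optional[int],
    secret_team_id: Optional[int],
) -> Optional[int]:
    """Pick the team a host should have after an osquery/Orbit enrollment.

    current_team_id: team stored on the server (None means No Team).
    secret_team_id: team that the enroll secret used by the agent belongs to.
    """
    if host_exists and mdm_enrolled:
        # Proposed guard: an already MDM-enrolled host keeps its server-side team.
        return current_team_id
    # Otherwise the enroll secret decides the team.
    return secret_team_id
```

One trade-off to consider: with this guard, an MDM-enrolled host that legitimately re-enrolls with another team's secret would no longer move teams.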

cc @georgekarrv

@ddribeiro Is deebradel still delaying by 10 seconds? Based on what I can find, delaying should only make this more frequent if the server is sending the profile before those 10 seconds.

MagnusHJensen avatar Sep 17 '25 09:09 MagnusHJensen

@marko-lisica FYI, I moved this back to drafting as @georgekarrv mentioned yesterday.

Should I remove it from the MDM board?

MagnusHJensen avatar Sep 18 '25 07:09 MagnusHJensen

Just thinking out loud here. Could we make a transfer 'sticky' for enrollment?

eg: Host is mdm enrolled in team A, api call is run to transfer host to team B.

For the next 30m enrollment based team changes (eg lookup based on enrollment key) will be ignored and the host remains in the current team.

New api calls to transfer the host to team C even during the 30m window would still work though that then resets the 30m timer and makes it stick to team C.
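
A small sketch of how that rule might behave, again in Python and not tied to Fleet's actual code. `StickyTeams`, its method names, and the in-memory dictionaries are made up for illustration; the 30m window is the one suggested above.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

STICKY_WINDOW = timedelta(minutes=30)


class StickyTeams:
    """Toy in-memory model of 'sticky' API transfers."""

    def __init__(self) -> None:
        self.team: Dict[int, Optional[int]] = {}      # host id -> team id (None = No Team)
        self.sticky_until: Dict[int, datetime] = {}   # host id -> end of sticky window

    def api_transfer(self, host_id: int, team_id: Optional[int], now: datetime) -> None:
        # API transfers always win and restart the sticky window.
        self.team[host_id] = team_id
        self.sticky_until[host_id] = now + STICKY_WINDOW

    def agent_enroll(self, host_id: int, secret_team_id: Optional[int],
                     now: datetime) -> Optional[int]:
        # Enrollment-based team changes are ignored while the window is open.
        until = self.sticky_until.get(host_id)
        if until is None or now >= until:
            self.team[host_id] = secret_team_id
        return self.team.get(host_id)
```

For example, a transfer to team B at 12:00 followed by an agent enrollment with the No Team secret at 12:10 leaves the host in B; a transfer to C at 12:20 moves it and extends the window to 12:50.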

@MagnusHJensen @marko-lisica @noahtalerman @JordanMontgomery thoughts?

georgekarrv avatar Sep 19 '25 18:09 georgekarrv