Linux Install Script Queue
Gong snippet: none
Problem
customer-beethoven wants to queue multiple installations at once on their Linux hosts, but when they do, one succeeds and the rest fail because the package manager is busy. customer-beethoven can work around this by building a timeout into the install script. Is it possible for Fleet to queue the installs for the user?
What have you tried?
I have built a timeout into the script, but this results in an arbitrary wait time and an extra step in my scripts. This also requires me to edit the automatically generated install script for uploaded packages to include the timeout.
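One way to avoid a fixed, arbitrary wait is to retry only while the package manager reports that it is busy, and stop as soon as an attempt succeeds. The following is a minimal Python sketch of that idea, not Fleet's install script or any package manager's API: `PackageManagerBusy` and `install_with_retry` are hypothetical names, and it assumes the install step can signal "busy" separately from other failures.

```python
import time
from typing import Callable


class PackageManagerBusy(Exception):
    """Hypothetical error: the install step found the package manager lock held."""


def install_with_retry(
    install: Callable[[], str],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `install`, retrying only while the package manager is busy.

    Unlike a fixed up-front wait, this returns as soon as an attempt succeeds,
    and it gives up after `max_attempts` so a stuck lock cannot hang forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return install()
        except PackageManagerBusy:
            if attempt == max_attempts:
                raise  # still busy after the last attempt: surface the failure
            sleep(delay_seconds * attempt)  # linear backoff between attempts
    raise ValueError("max_attempts must be at least 1")
```

Passing a fake `sleep` (for example, `waits.append`) makes the backoff easy to check without actually waiting. This still has to be added to every install script, which is the extra step the customer wants Fleet to remove.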
Potential solutions
What is the expected workflow as a result of your proposal?
1: I trigger multiple installation scripts for a Linux host.
2: Fleet queues the installations, triggering the next script after the previous one either completes or otherwise exits.
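The requested workflow amounts to a per-host first-in, first-out queue in which the next script starts only after the previous one has exited, whatever its result. A minimal Python sketch of that behavior follows; `HostInstallQueue` is a hypothetical name and this is not how Fleet or orbit actually implement their queues. It assumes each script is a callable that returns an exit code.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class HostInstallQueue:
    """Hypothetical per-host queue: one install runs at a time, in FIFO order."""

    pending: Deque[Tuple[str, Callable[[], int]]] = field(default_factory=deque)
    results: List[Tuple[str, int]] = field(default_factory=list)

    def enqueue(self, name: str, script: Callable[[], int]) -> None:
        # Triggering an install only records it; nothing runs yet.
        self.pending.append((name, script))

    def run_all(self) -> List[Tuple[str, int]]:
        # Each script starts only after the previous one has returned.
        while self.pending:
            name, script = self.pending.popleft()
            try:
                exit_code = script()
            except Exception:
                exit_code = -1  # a crash counts as "otherwise exits"; keep going
            self.results.append((name, exit_code))
        return self.results
```

A failed or crashed script still releases the queue, so one bad package does not block the installs queued behind it.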
Added the requisite tags so this moves through triage; as mentioned in Slack, this is absolutely a bug. @mason-buettner, if you've reproduced this, you can drop that tag.
@mostlikelee, moved it to your team. TMWYT
Fleet version: TODO
@mason-buettner thanks for tracking this! What version of Fleet is customer-beethoven on? That will help us with debugging and get to a fix sooner.
We think this means that Fleet is trying to install the packages simultaneously.
This is definitely a bug. @lukeheath I'd argue this is a P2 that's worthy of disrupting the current sprint. We could be building on top of a shaky foundation.
@mostlikelee have we confirmed that this is only an issue on Linux hosts? Not macOS nor Windows?
cc @ksatter
FYI @georgekarrv @mna this looks like a unified queue bug.
Is this reproduced with the default .deb install script? I have a hard time believing this is a unified queue bug: even if UQ sends all of them to the script table to execute, the logic in orbit has always been to run one at a time, unless that has changed.
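The reasoning here is that a busy package manager should only show up if installs overlap. The Python sketch below illustrates that point under stated assumptions and does not model Fleet, orbit, or dpkg: a `threading.Lock` acquired without blocking stands in for the package manager lock, and `run_concurrently` and `run_one_at_a_time` are hypothetical helpers. The concurrent version forces every install to try the lock before the winner releases it, which is the worst case; partial overlap in practice would give varying counts, consistent with the inconsistent failure rates reported in this thread.

```python
import threading
from typing import List

BUSY = "failed: package manager busy"


def run_concurrently(n: int) -> List[str]:
    """Start n simulated installs at once; each tries the lock without waiting."""
    lock = threading.Lock()  # stand-in for the package manager lock
    start = threading.Barrier(n)  # all installs begin together
    attempted = threading.Barrier(n)  # winner holds the lock until all have tried
    outcomes: List[str] = [""] * n

    def install(i: int) -> None:
        start.wait()
        got = lock.acquire(blocking=False)
        attempted.wait()
        if got:
            lock.release()
        outcomes[i] = "installed" if got else BUSY

    threads = [threading.Thread(target=install, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


def run_one_at_a_time(n: int) -> List[str]:
    """Run n simulated installs sequentially, as orbit is expected to."""
    lock = threading.Lock()
    outcomes: List[str] = []
    for _ in range(n):
        got = lock.acquire(blocking=False)
        if got:
            lock.release()  # each install finishes before the next starts
        outcomes.append("installed" if got else BUSY)
    return outcomes
```

With three installs, the concurrent run yields exactly one success and two "busy" failures, while the sequential run installs all three.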
@mason-buettner Just want to make sure you've been able to reproduce this.
@noahtalerman I'll defer to @georgekarrv on the P2 designation for this one based on what we think the cause of the bug is.
@lukeheath Yes, we've been able to reproduce it. Here's a screenshot of the error we observed in the Fleet dashboard for an install that failed:
And the associated logs:
Just to note, we've seen inconsistent failure rates: sometimes 2 of 3 installs complete, sometimes only 1 of 3.
Hey team! Please add your planning poker estimate with Zenhub @iansltx @jahzielv @ksykulev
@noahtalerman @lukeheath just to be explicit, right now this is top priority for next sprint unless you all think differently. Bringing a P2 bug into the current sprint would be disruptive.
@mostlikelee Thanks for the clarity. Since the customer has a workaround for now, next sprint should be fine.
I haven't been able to reproduce this on the latest main, but Kathy and Mason are gathering data from when they reproduced it and asking the customer to try to reproduce it again.
@zayhanlon heads up, we're having trouble reproducing this issue. Waiting to hear back from the customer.
@mostlikelee the customer said he'll get back to us at the end of the week
@zayhanlon any updates here?
@mostlikelee I think we need to update the issue. @mason-buettner will address it when he's online today. Here's the customer feedback: https://fleetdm.slack.com/archives/C0867SDM4F8/p1747624232615499?thread_ts=1747158858.308319&cid=C0867SDM4F8
Hi folks! After some more research/discussion, we found that this issue was actually two separate bugs. Because of that, we're going to close here.
Once the new bugs are filed, we'll link them in the comments here.
cc @zayhanlon @mostlikelee @mason-buettner
Linux hosts in sync, One by one, installs take flight. Fleet brings calm from chaos.