actions-runner-controller

Reconsider aligning `actions-runner` software with GitHub-hosted software

Open scruplelesswizard opened this issue 2 years ago • 11 comments

What would you like added?

I would like the existing images without cloud-specific tooling to be given a suffix such as -slim or -cloudless, and for the default actions-runner images to have the same installed software as their corresponding actions/runner-images images.

Why is this needed?

Now that Actions Runner Controller has been officially adopted into the actions/ org I think it is worth reconsidering alignment of pre-installed software with actions/runner-images.

ARC's end users are often unaware that ARC is being used, and they expect consistent pre-installed software when developing their workflows. When ARC runners are provided to supplement or replace GitHub-hosted runners, errors are likely to occur due to the delta between GitHub-hosted and ARC-hosted runners.

While some end-user friction can be mitigated by an organization's ARC operators, they may not have insight into all the tooling used across the org, which increases friction for ARC adoption. Sourcing the same software as actions/runner-images reduces that friction significantly.

Additional context

Installed software for actions/runner-images:

  • ubuntu-latest or ubuntu-22.04
  • ubuntu-20.04

I'm happy to help with the lift on this, but I wanted to float it in the community first for feedback.

scruplelesswizard, Mar 10 '23 01:03

While I agree that the parity would be nice, just adding the az CLI for our runners increased the image size to over 2.1GB. While that is probably an outlier, I think that installing all of that tooling would lead to the image growing to an unsustainable and unusable size.

cwoodcox, Mar 15 '23 19:03

> While I agree that the parity would be nice, just adding the az CLI for our runners increased the image size to over 2.1GB. While that is probably an outlier, I think that installing all of that tooling would lead to the image growing to an unsustainable and unusable size.

I absolutely agree that offering a small, trimmed-down image that can be used or customized as needed is very important, as is keeping container images as small as possible.

At the same time, parity with the GitHub-hosted runners is a common expectation among many teams adopting ARC, and its absence is a high-friction point for adoption. Providing GitHub-hosted parity images as a default option, while calling out their disadvantages in our documentation, would be a great way for teams to get started with ARC easily and then optimize for their use cases.

As an alternative, we could prioritize keeping images small while offering some compatibility. For example, we could offer a matrixed set of images based on cloud and language. This strategy would likely require a fairly significant CI lift to implement, and would mean a much wider set of images to maintain. It would also require ARC users to create many different runner sets if they use multiple languages, which doesn't offer the low-friction adoption users expect.
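To make the maintenance cost of a matrixed set concrete, here is a rough Python sketch that enumerates one image per combination. The clouds, languages, OS versions, and the `actions-runner:<os>-<cloud>-<language>` tag scheme are all hypothetical, not an existing ARC naming convention.

```python
from itertools import product

# Hypothetical matrix axes, for illustration only.
CLOUDS = ["aws", "azure", "gcp"]
LANGUAGES = ["go", "java", "node", "python"]
OS_VERSIONS = ["ubuntu-20.04", "ubuntu-22.04"]


def image_tags(clouds: list[str], languages: list[str], os_versions: list[str]) -> list[str]:
    """Return one hypothetical image tag per (OS, cloud, language) combination."""
    return [
        f"actions-runner:{os_version}-{cloud}-{language}"
        for os_version, cloud, language in product(os_versions, clouds, languages)
    ]


tags = image_tags(CLOUDS, LANGUAGES, OS_VERSIONS)
print(len(tags))  # 2 * 3 * 4 = 24 images to build, scan, and publish
print(tags[0])    # actions-runner:ubuntu-20.04-aws-go
```

Each extra language adds one image per (OS, cloud) pair, six in this example, so the set grows multiplicatively rather than linearly.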

It's a matter of enabling ARC adopters to "make it work, then make it good".

scruplelesswizard, Mar 16 '23 19:03

Hey! Thanks for the detailed feedback. While I generally agree that the "make it work, then make it good" way of getting started with self-hosted runners is great for usability, I'm unsure if that's really what everyone wants if we implement it naively. I'd love some design discussion first.

My biggest concern is that, when I last checked, the full runner image could be larger than 10GB. Defaulting to an image of this size wouldn't be an option until Kubernetes and its cloud offerings have sane support for distributing, prepopulating, or warming up container images, so that runner pods can still come up in several seconds rather than a minute or two or more (depending on where your runner pod is hosted... a Raspberry Pi on a home network?).
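A back-of-the-envelope Python sketch of why image size dominates startup time: it computes only the raw transfer time of a cold pull. The image sizes come from this thread (2.1GB with the az CLI, 10GB for the full image); the 1 Gbps and 100 Mbps link speeds are assumptions, and decompression and layer extraction are ignored, so real pulls would be slower.

```python
def pull_seconds(image_gb: float, bandwidth_mbps: float) -> float:
    """Lower bound on the time to download an image onto a cold node.

    Assumes decimal units (1 GB = 8,000 megabits) and a link fully
    saturated by the pull; ignores decompression, layer extraction,
    and registry rate limits.
    """
    megabits = image_gb * 8 * 1000
    return megabits / bandwidth_mbps


# Image sizes mentioned in this thread; link speeds are assumptions.
for image_gb in (2.1, 10):
    for bandwidth_mbps in (1000, 100):
        seconds = pull_seconds(image_gb, bandwidth_mbps)
        print(f"{image_gb} GB at {bandwidth_mbps} Mbps: ~{seconds:.0f} s")
```

Even under these ideal assumptions, a 10GB image takes over a minute on a 1 Gbps link and over 13 minutes at 100 Mbps, while the 2.1GB image needs under 20 seconds on the faster link.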

mumoshu, Mar 30 '23 22:03

> At the same time, parity with the GitHub-hosted runners is a common expectation among many teams adopting ARC, and its absence is a high-friction point for adoption.

I've always assumed the full Docker images provided by the https://github.com/nektos/act project were an accurate representation of the size to expect for a runner container with software parity with GitHub's runners, for example https://hub.docker.com/layers/catthehacker/ubuntu/full-20.04/images/sha256-598b616a8c7ce86d98ee63871cec532f4ff645125b563a8798f2ae1c98928ec7?context=explore. Images of ~14GB are far, far, far too big to be a default image, IMO; you'll just be trading one friction point for another.

> As an alternative, we could prioritize keeping images small while offering some compatibility. For example, we could offer a matrixed set of images based on cloud and language. This strategy would likely require a fairly significant CI lift to implement, and would mean a much wider set of images to maintain.

It's really a question for GitHub to ask and answer internally. There's a middle ground between "as slim as possible" and parity with their virtual environments, but it comes with increased overhead in maintaining the runner images. Only GitHub can really say whether that is something they are willing to take on.

toast-gear, Mar 31 '23 11:03

tl;dr: It seems more likely that an ARC operator will be aware of our image sizes than that an end user will be aware of the runner implementation.

> Images of ~14GB are far, far, far too big to be a default image, IMO; you'll just be trading one friction point for another.

Image size is a valid concern. We offer a slim base runner image that can be filled with whatever tools are needed. That is a great option for many of our users.

Having a default runner image of ~15GiB would require significantly more network and storage resources, and would also require maintaining the pre-installed tools. The networking and storage for ARC distribution are already provided and managed by GitHub, with the exception of a few legacy sources. Much of the maintenance work already exists within the Actions ecosystem, but it would take some coordinated effort to incorporate it into ARC.
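To put the extra network and storage in perspective, here is a Python sketch of worst-case numbers for a ~15GiB default image. The node count and update frequency are made-up example values, and the sketch assumes no layer sharing between image versions, whereas real registries only re-send changed layers.

```python
def parity_image_costs(image_gib: float, nodes: int, updates_per_month: int) -> dict[str, float]:
    """Worst-case node disk and registry egress for a large default runner image.

    Assumes each node caches one copy of the image and re-pulls the whole
    image on every update (no layers shared between versions).
    """
    return {
        "node_disk_gib": image_gib * nodes,
        "monthly_egress_gib": image_gib * nodes * updates_per_month,
    }


# Hypothetical cluster: 50 runner nodes, image updated weekly.
costs = parity_image_costs(image_gib=15, nodes=50, updates_per_month=4)
print(costs)
```

In this example that is 750 GiB of node disk and 3,000 GiB of monthly registry egress for a single runner image, before counting any per-language or per-cloud variants.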

The biggest impact would be on ARC's users, due to the size of the container image. However, this would only affect new users adopting ARC. Existing ARC users have already adopted the slim runner image directly or use it as a base for their custom runner images.

It's worth considering that in some organizations, ARC's operators may be unaware of the tools the end users need. What could happen if an operator deploys ARC today expecting parity? What options does an operator have if they are unable to build and maintain container images and end-user packages?

> My biggest concern is ... that runner pods can still come up in several seconds

Might an end user prefer to have a runner come up slowly but behave as expected instead?

A workflow executed on an ARC runner is nearly indistinguishable from a workflow executed on a GitHub-hosted runner, other than the pre-installed tools. If a workflow's user is unaware of ARC's implementation and the pre-installed tools are missing, what might their experience be like? What if their organization allows the use of both GitHub-hosted and ARC-hosted runners?

A few things I can think of that might improve the day-0 experience:

  • Ensure adopters are aware of the runner tooling differences.
  • Offer parity runner images and document them as an option.
  • Default to parity images and document the large image sizes.
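The first option could be as simple as publishing a diff of tool inventories. Below is a Python sketch of that diff using hypothetical inventories; a real comparison would read the Installed Software list of the matching actions/runner-images image and the package list of the deployed runner image. `az`, `make`, and `xz` are tools reported missing elsewhere in this thread; the other entries are placeholders.

```python
# Hypothetical tool inventories, for illustration only.
GITHUB_HOSTED_TOOLS = {"aws", "az", "docker", "git", "make", "node", "xz"}
ARC_IMAGE_TOOLS = {"docker", "git", "node"}


def tooling_delta(expected: set[str], installed: set[str]) -> list[str]:
    """Return, sorted, the tools a workflow may expect but the image lacks."""
    return sorted(expected - installed)


missing = tooling_delta(GITHUB_HOSTED_TOOLS, ARC_IMAGE_TOOLS)
print("Not in the ARC image:", ", ".join(missing))
```

A set difference keeps the check one-directional: tools that exist only on the self-hosted image are not reported, since they cannot break a workflow written for GitHub-hosted runners.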

scruplelesswizard, Apr 03 '23 00:04

Thanks, @toast-gear and @chaosaffe! As of today, I'd agree with the first option @chaosaffe mentioned.

So I'd agree if we do this:

  • Ensure adopters are aware of the runner tooling differences.

and we don't do the following:

  • Offer parity runner images and document them as an option.
  • Default to parity images and document the large image sizes.

Providing and recommending 14GB container images would end up giving most users the wrong expectations. They would keep complaining that runners don't come up fast, and our answer might just be "make your network/storage faster, or introduce your own P2P image distribution mechanism or node warm-up solution", which isn't easy in practice.

mumoshu, Apr 03 '23 00:04

The goal of this project is to provide a functional model for our customers to scale their own self-hosted infrastructure that fits their needs, not to provide a self-hosted model that is at parity with what is offered by the GitHub Cloud.

With that in mind, we will only build and support a minimal runner image that customers can use as a base for their own needs.

chrispat, May 02 '23 15:05

@chrispat That's fair, but it should be communicated much more clearly to users. It is not obvious that the ubuntu-22.04 self-hosted runner images are not at parity with GitHub's ubuntu-22.04 runners. Our developers expect a smooth transition from managed to self-hosted runners. The current state of things does not make clear that self-hosted runner providers are implicitly expected to ensure tooling parity with managed runners.

vyrwu, Aug 28 '24 09:08

@chrispat

We were hit by this while migrating from myoung34/github-runner:ubuntu-noble to ARC. It would have helped us if it had been clearly stated that ARC is more minimalistic than the traditional GitHub-hosted runners.

As @vyrwu mentions, developers have expectations that are currently misaligned.

In our case, the root cause was a missing package (XZ Utils) used by a GitHub Action (https://github.com/mlugg/setup-zig).

In theory, I guess it should have been up to the action to verify that the package was installed?
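One way to fail fast, whether inside an action or in a workflow step, is a preflight check for required executables before the real work starts. This is a Python sketch using the standard library's `shutil.which`; it is not how setup-zig or Azure/login actually check their dependencies, and the tool list is hypothetical.

```python
import shutil
import sys

# Hypothetical requirements for a job; xz, az, and make are the tools
# reported missing in this thread.
REQUIRED_TOOLS = ["xz", "az", "make"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that cannot be resolved to an executable on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight(tools: list[str]) -> bool:
    """Report missing tools on stderr; return True only if all are present."""
    missing = missing_tools(tools)
    if missing:
        print(f"Missing required tools: {', '.join(missing)}", file=sys.stderr)
        return False
    return True
```

A job step could run `sys.exit(0 if preflight(REQUIRED_TOOLS) else 1)` so that the job fails with a clear list of missing tools rather than an error deep inside an action.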

nmwael, Jan 07 '25 09:01

The https://github.com/Azure/login action fails with this error: Error: Login failed with Error: Unable to locate executable file: az. Please verify either the file path exists or the file can be found within a directory specified by the PATH environment variable. Also check the file mode to verify the file is executable.. Double check if the 'auth-type' is correct. Refer to https://github.com/Azure/login#readme for more information.

Again, this works on the GitHub-hosted runners.

So it does seem that some basic packages that actions expect are not in place?

nmwael, Jan 08 '25 11:01

I'm not sure exactly how, but my jobs suddenly broke tonight because they didn't have make installed (my guess is that the latest image finally did a cutover and/or my image cache finally updated, but I'm unsure).

Regardless, it was a surprise to find out that make, of all things, was not considered essential enough to include in the default image, given that it's literally in the package called build-essential.

While I agree that 14GB would be too large for a default image, it would be nice to have that option for those who want it (with reasonable image caching, it could work for them). I also think a middle ground between "this image has almost nothing, build it yourself" and "this image has 7 kitchen sinks" should be reasonably achievable.

deefdragon, Oct 31 '25 04:10