
Helm uninstall of ARC is leaving CRD resources behind

Open David9902 opened this issue 2 years ago • 4 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.8.1

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. helm uninstall arc-scale-set-kubernetes --namespace arc-runners
2. wait
3. helm uninstall arc --namespace arc-systems
4. kubectl get crds -A

Describe the bug

After uninstalling runners and controller with the steps as described above, the CustomResourceDefinitions still remain on the cluster.

autoscalinglisteners.actions.github.com    2024-01-31T12:36:12Z
autoscalingrunnersets.actions.github.com   2024-01-31T12:36:13Z
ephemeralrunners.actions.github.com        2024-01-31T12:36:17Z
ephemeralrunnersets.actions.github.com     2024-01-31T12:36:18Z

Describe the expected behavior

I would expect that everything created by the two helm install commands is deleted. That the secret and the container-hook-role remain is expected and fine.

helm install arc --namespace arc-systems --set image.tag=0.8.1 -f controller/values.yaml oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller --version "0.8.1"

helm install arc-scale-set-kubernetes --namespace arc-runners -f runner-set/values.yaml oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set --version 0.8.1

kubectl apply -f runner-set/container-hook-role.yaml

Additional Context

controller values.yaml:
labels: {}

replicaCount: 1

image:
  repository: "ghcr.io/actions/gha-runner-scale-set-controller"
  pullPolicy: IfNotPresent
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
  - name: "HTTP_PROXY"
    value: ""
  - name: "HTTPS_PROXY"
    value: ""
  - name: "NO_PROXY"
    value: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podLabels: {}

podSecurityContext: {}

securityContext:
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

resources:
  limits:
    cpu: 100m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

nodeSelector: {}
tolerations: []
affinity: {}
priorityClassName: ""

flags:
  logLevel: "debug"
  logFormat: "text"
  updateStrategy: "immediate"



runner-set values.yaml:
## githubConfigUrl is the GitHub url for where you want to configure runners
## ex: https://github.com/myorg/myrepo or https://github.com/myorg
githubConfigUrl: "https://GH_ENTERPRISE"

## githubConfigSecret is the k8s secrets to use when auth with GitHub API.
## You can choose to use GitHub App or a PAT token
#githubConfigSecret:
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  #github_app_id: ""
  #github_app_installation_id: ""
  #github_app_private_key: |

  ### GitHub PAT Configuration
#  github_token: ""
## If you have a pre-define Kubernetes secret in the same namespace the gha-runner-scale-set is going to deploy,
## you can also reference it via `githubConfigSecret: pre-defined-secret`.
## You need to make sure your predefined secret has all the required secret data set properly.
##   For a pre-defined secret using GitHub PAT, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_token='ghp_your_pat'
##   For a pre-defined secret using GitHub App, the secret needs to be created like this:
##   > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_app_id=123456 --from-literal=github_app_installation_id=654321 --from-literal=github_app_private_key='-----BEGIN CERTIFICATE-----*******'
githubConfigSecret: pat-eks-arc-runners

## proxy can be used to define proxy settings that will be used by the
## controller, the listener and the runner of this scale set.
#
proxy:
  http:
    url: **
  https:
    url: **
  noProxy:
    - *

maxRunners: 10
minRunners: 1

runnerGroup: "arc"

runnerScaleSetName: "arc"
containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "encrypted-standard"
    resources:
      requests:
        storage: 1Gi

listenerTemplate:
  spec:
    containers:
    - name: listener
      securityContext:
        runAsUser: 1000
      resources:
        requests:
          memory: "200Mi"
          cpu: "250m"
        limits:
          memory: "400Mi"
          cpu: "500m"

template:
  spec:
    containers:
    - name: runner
      image: ghcr.io/actions/actions-runner:2.312.0
      imagePullPolicy: Always
      command: ["/home/runner/run.sh"]
      env:
      # https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/README.md  SET TO TRUE
        - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
          value: "false"
        - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
          value: "/home/runner/pod-template.yaml"
      securityContext:
        runAsUser: 1001
        runAsGroup: 123
        fsGroup: 123
      resources:
        # requests:
        #   memory: "1Gi"
        #   cpu: "900m"
        # limits:
        #   memory: "3Gi"
        #   cpu: "900m"
        requests:
          memory: "200Mi"
          cpu: "250m"
        limits:
          memory: "400Mi"
          cpu: "500m"
    imagePullSecrets:
      - name: artifactory

#https://github.com/actions/actions-runner-controller/issues/3043
# controllerServiceAccount:
#   namespace: <namespace of controller>
#   name: <release name of controller>-gha-rs-controller

controllerServiceAccount:
  namespace: arc-systems
  name: arc-gha-rs-controller

Controller Logs

Logs are not accessible after uninstalling the controller.

Runner Pod Logs

Logs are not accessible after uninstalling the runner-set.

David9902 avatar Feb 12 '24 14:02 David9902

Hey @David9902,

Unfortunately, we can't do anything about it since helm does not allow upgrading or deleting CRDs.
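If you want to remove the leftover CRDs yourself, a manual kubectl delete is the usual route. A sketch, using the CRD names from your output above (note that deleting a CRD also deletes any remaining custom resources of that kind, so make sure nothing is left that you still need):

# Remove the leftover ARC CRDs by hand after both charts are uninstalled.
kubectl delete crd \
  autoscalinglisteners.actions.github.com \
  autoscalingrunnersets.actions.github.com \
  ephemeralrunners.actions.github.com \
  ephemeralrunnersets.actions.github.com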

nikola-jokic avatar Feb 13 '24 08:02 nikola-jokic

Hi @nikola-jokic, thank you for your fast response! What would the recommended steps for uninstalling ARC look like, then?

I'm asking since it is required to uninstall before "upgrading" (reinstalling) to a newer version, as also described in the docs: https://docs.github.com/en/[email protected]/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#upgrading-arc

David9902 avatar Feb 13 '24 09:02 David9902

Right, we should document this process better :relaxed: Thanks for raising this!

nikola-jokic avatar Feb 14 '24 15:02 nikola-jokic

Thank you! I think this is an important one since it is the official way of doing the upgrade. Hope to read it soon in the docs.

David9902 avatar Feb 14 '24 16:02 David9902

+1 to improve documentation of how to uninstall.

Today I had to manually remove the finalizers from a few resources (autoscalingrunnersets.actions.github.com, rolebindings.rbac.authorization.k8s.io and roles.rbac.authorization.k8s.io) that were causing the arc-runners namespace to get stuck in Terminating when deleting (after having uninstalled both charts with helm).
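The removal was roughly along these lines (a sketch with placeholder resource names; clearing finalizers forcibly skips whatever cleanup they were guarding, so use with care):

# Clear the finalizers on a stuck AutoscalingRunnerSet so the namespace can finish terminating.
# <runner-set-name> is a placeholder; use the name of the resource that is actually stuck.
kubectl patch autoscalingrunnersets.actions.github.com <runner-set-name> \
  --namespace arc-runners --type merge --patch '{"metadata":{"finalizers":null}}'

# The same pattern works for the stuck Role and RoleBinding objects.
kubectl patch role <role-name> --namespace arc-runners \
  --type merge --patch '{"metadata":{"finalizers":null}}'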

Not sure what I did wrong during cleanup to end up in this situation.

hsuabina avatar Mar 08 '24 13:03 hsuabina

@hsuabina yes, that seems to be the way to go after helm uninstall arc -n arc-systems.

But I'm not 100% sure it is really the correct way to do it.

David9902 avatar Mar 08 '24 13:03 David9902

@nikola-jokic I just came across this and I'm wondering why GitHub can't just provide the CRDs outside of the helm chart, so that users can 1) update the CRDs manually using kubectl apply and then 2) update the helm chart. This seems simpler than asking all users to completely uninstall ARC before each upgrade.

Are the CRDs for gha-runner-scale-set available in the ARC repo?

EDIT: Looks like they are available here. Any reason I can't just update these manually and then update the Helm chart? Could also just pull the CRDs directly out of the release in actions-runner-controller.yaml.

joshuabaird avatar Mar 18 '24 13:03 joshuabaird

Docs are updated: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#upgrading-arc. Thank you again for raising this!

@joshuabaird we can't maintain them separately. Since we don't have a webhook, we cannot maintain multiple versions, so the controller must understand its own CRD version. That is why upgrading the CRDs is not necessary for non-breaking changes; but if anything changes, the controller should operate only on the CRDs published for its version.

Closing this one since the docs are updated :relaxed:

nikola-jokic avatar Mar 22 '24 15:03 nikola-jokic

@joshuabaird we can't maintain them separately. Since we don't have a webhook, we cannot maintain multiple versions, so the controller must understand its own CRD version. That is why upgrading the CRDs is not necessary for non-breaking changes; but if anything changes, the controller should operate only on the CRDs published for its version.

How does a user know if there are breaking changes? Just based on semver? I still don't quite understand why this process wouldn't work -- realizing that there may be brief downtime between the CRD update and the controller update due to the incompatibility that you mentioned:

  • Install new CRDs
  • Install new controllers

Basically -- take the actions-runner-controller.yaml and just apply it over an existing version.
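Something along these lines is what I have in mind. Just a sketch: it assumes a Helm version new enough to read CRDs out of an OCI chart with helm show crds, and "<new-version>" is a placeholder:

# 1) Apply the new CRDs straight from the chart, bypassing Helm's CRD handling.
#    Server-side apply avoids the last-applied-configuration size limit on large CRDs.
helm show crds oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version "<new-version>" | kubectl apply --server-side -f -

# 2) Upgrade the controller release in place (then the runner scale set release the same way).
helm upgrade arc --namespace arc-systems -f controller/values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version "<new-version>"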

Unfortunately, having to "uninstall" ARC completely doesn't really lend itself to modern deployment patterns like GitOps, etc.

joshuabaird avatar Mar 22 '24 15:03 joshuabaird

The problem is that old scale sets are based on old CRDs, so we would have to transform them based on the version they are in and maintain multiple versions. That is normally where a mutating webhook would come in, but to eliminate security concerns around webhooks, we decided not to have them. Whenever we introduce a breaking change, we will increment the minor version. But we don't always introduce breaking changes on minor versions, so the release notes are probably the best place to see whether we introduced a breaking change.
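If it helps, a rough way to check whether the CRDs themselves changed between what you have installed and a target chart version (a sketch; "<new-version>" is a placeholder, and it assumes a Helm version that supports helm show crds against an OCI chart):

# Diff the CRDs currently in the cluster against the ones shipped with the target chart version.
# kubectl diff exits non-zero when there are differences, hence the || true.
helm show crds oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version "<new-version>" | kubectl diff -f - || true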

I agree with you that the upgrade process is not so easy... But at least for now, we have to keep this limitation.

nikola-jokic avatar Mar 22 '24 16:03 nikola-jokic