Helm uninstall of ARC leaves CRD resources behind
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.8.1
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. helm uninstall arc-scale-set-kubernetes --namespace arc-runners
2. wait
3. helm uninstall arc --namespace arc-systems
4. kubectl get crds -A
Describe the bug
After uninstalling runners and controller with the steps as described above, the CustomResourceDefinitions still remain on the cluster.
```
autoscalinglisteners.actions.github.com    2024-01-31T12:36:12Z
autoscalingrunnersets.actions.github.com   2024-01-31T12:36:13Z
ephemeralrunners.actions.github.com        2024-01-31T12:36:17Z
ephemeralrunnersets.actions.github.com     2024-01-31T12:36:18Z
```
Describe the expected behavior
I would expect that everything created by the two helm install commands below is deleted. That the secret and the container-hook-role remain is expected and fine.
```bash
helm install arc --namespace arc-systems --set image.tag=0.8.1 -f controller/values.yaml oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller --version "0.8.1"
helm install arc-scale-set-kubernetes --namespace arc-runners -f runner-set/values.yaml oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set --version 0.8.1
kubectl apply -f runner-set/container-hook-role.yaml
```
Additional Context
controller values.yaml:
```yaml
labels: {}

replicaCount: 1

image:
  repository: "ghcr.io/actions/gha-runner-scale-set-controller"
  pullPolicy: IfNotPresent
  tag: ""

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

env:
  - name: "HTTP_PROXY"
    value: ""
  - name: "HTTPS_PROXY"
    value: ""
  - name: "NO_PROXY"
    value: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}
podLabels: {}
podSecurityContext: {}

securityContext:
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

resources:
  limits:
    cpu: 100m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi

nodeSelector: {}
tolerations: []
affinity: {}

priorityClassName: ""

flags:
  logLevel: "debug"
  logFormat: "text"
  updateStrategy: "immediate"
```
runner-set values.yaml:
```yaml
## githubConfigUrl is the GitHub url for where you want to configure runners
## ex: https://github.com/myorg/myrepo or https://github.com/myorg
githubConfigUrl: "https://GH_ENTERPRISE"

## githubConfigSecret is the k8s secrets to use when auth with GitHub API.
## You can choose to use GitHub App or a PAT token
#githubConfigSecret:
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  #github_app_id: ""
  #github_app_installation_id: ""
  #github_app_private_key: |

  ### GitHub PAT Configuration
  # github_token: ""
## If you have a pre-define Kubernetes secret in the same namespace the gha-runner-scale-set is going to deploy,
## you can also reference it via `githubConfigSecret: pre-defined-secret`.
## You need to make sure your predefined secret has all the required secret data set properly.
## For a pre-defined secret using GitHub PAT, the secret needs to be created like this:
## > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_token='ghp_your_pat'
## For a pre-defined secret using GitHub App, the secret needs to be created like this:
## > kubectl create secret generic pre-defined-secret --namespace=my_namespace --from-literal=github_app_id=123456 --from-literal=github_app_installation_id=654321 --from-literal=github_app_private_key='-----BEGIN CERTIFICATE-----*******'
githubConfigSecret: pat-eks-arc-runners

## proxy can be used to define proxy settings that will be used by the
## controller, the listener and the runner of this scale set.
#
proxy:
  http:
    url: **
  https:
    url: **
  noProxy:
    - *

maxRunners: 10
minRunners: 1

runnerGroup: "arc"
runnerScaleSetName: "arc"

containerMode:
  type: "kubernetes"
  kubernetesModeWorkVolumeClaim:
    accessModes: ["ReadWriteOnce"]
    # For local testing, use https://github.com/openebs/dynamic-localpv-provisioner/blob/develop/docs/quickstart.md to provide dynamic provision volume with storageClassName: openebs-hostpath
    storageClassName: "encrypted-standard"
    resources:
      requests:
        storage: 1Gi

listenerTemplate:
  spec:
    containers:
      - name: listener
        securityContext:
          runAsUser: 1000
        resources:
          requests:
            memory: "200Mi"
            cpu: "250m"
          limits:
            memory: "400Mi"
            cpu: "500m"

template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:2.312.0
        imagePullPolicy: Always
        command: ["/home/runner/run.sh"]
        env:
          # https://github.com/actions/runner-container-hooks/blob/main/packages/k8s/README.md SET TO TRUE
          - name: ACTIONS_RUNNER_REQUIRE_JOB_CONTAINER
            value: "false"
          - name: ACTIONS_RUNNER_CONTAINER_HOOK_TEMPLATE
            value: "/home/runner/pod-template.yaml"
        securityContext:
          runAsUser: 1001
          runAsGroup: 123
          fsGroup: 123
        resources:
          # requests:
          #   memory: "1Gi"
          #   cpu: "900m"
          # limits:
          #   memory: "3Gi"
          #   cpu: "900m"
          requests:
            memory: "200Mi"
            cpu: "250m"
          limits:
            memory: "400Mi"
            cpu: "500m"
    imagePullSecrets:
      - name: artifactory

# https://github.com/actions/actions-runner-controller/issues/3043
# controllerServiceAccount:
#   namespace: <namespace of controller>
#   name: <release name of controller>-gha-rs-controller
controllerServiceAccount:
  namespace: arc-systems
  name: arc-gha-rs-controller
```
Controller Logs
Logs are not accessible after uninstalling the controller.
Runner Pod Logs
Logs are not accessible after uninstalling the runner-set.
Hey @David9902,
Unfortunately, we can't do anything about it since helm does not allow upgrading or deleting CRDs.
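If the goal is to leave nothing behind, the leftover CRDs have to be removed manually with kubectl once both releases are uninstalled. A minimal sketch, assuming no other scale sets still depend on them (deleting a CRD also deletes any remaining custom resources of that kind):

```bash
# Manual cleanup sketch: run only after both Helm releases are uninstalled.
# WARNING: deleting a CRD also deletes all remaining resources of that kind.

# Confirm which ARC CRDs are still present
kubectl get crds | grep actions.github.com

# Remove them explicitly (names taken from the report above)
kubectl delete crd \
  autoscalinglisteners.actions.github.com \
  autoscalingrunnersets.actions.github.com \
  ephemeralrunners.actions.github.com \
  ephemeralrunnersets.actions.github.com
```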
Hi @nikola-jokic, thank you for your fast response! What would the recommended steps for uninstalling ARC then look like?
I'm asking because an uninstall is required before "upgrading" (reinstalling) to a newer version, as described in the docs: https://docs.github.com/en/[email protected]/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#upgrading-arc
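For reference, the full teardown I am currently doing mirrors the install commands above and looks roughly like this (it still leaves the CRDs behind, which is the point of this issue):

```bash
# Current teardown sequence (sketch); release and namespace names match the
# install commands earlier in this issue.
helm uninstall arc-scale-set-kubernetes --namespace arc-runners

# Wait until the listener and ephemeral runner pods in arc-runners are gone
kubectl get pods --namespace arc-runners --watch

helm uninstall arc --namespace arc-systems

# The ARC CRDs are still present at this point
kubectl get crds | grep actions.github.com
```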
Right, we should document this process better :relaxed: Thanks for raising this!
Thank you! I think this is an important one, since it is the official way of doing the upgrade. I hope to see it in the docs soon.
+1 to improve documentation of how to uninstall.
Today I had to manually remove the finalizers from a few resources (autoscalingrunnersets.actions.github.com, rolebindings.rbac.authorization.k8s.io and roles.rbac.authorization.k8s.io) that were causing the arc-runners namespace to get stuck in Terminating when deleting (after having uninstalled both charts with helm).
Not sure what I did wrong during cleanup to end up in this situation.
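In case it helps others hitting the same stuck namespace, the finalizers can be cleared with `kubectl patch`. A rough sketch, where the resource name is a placeholder; use whatever is still listed in your namespace:

```bash
# Rough sketch: find what is still stuck in the namespace, then clear its
# finalizers so the namespace can finish terminating.
kubectl get autoscalingrunnersets.actions.github.com,roles,rolebindings \
  --namespace arc-runners

# Replace NAME with the resource shown above (placeholder).
kubectl patch autoscalingrunnersets.actions.github.com NAME \
  --namespace arc-runners \
  --type merge \
  --patch '{"metadata":{"finalizers":null}}'
```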
@hsuabina Yes, that seems to be the way to go after `helm uninstall arc -n arc-systems`, but I'm not 100% sure it is really the correct way to do it.
@nikola-jokic I just came across this and I'm wondering why GitHub can't just provide the CRDs outside of the Helm chart so that users can 1) update the CRDs manually using kubectl apply and then 2) update the Helm chart. This seems simpler than asking all users to completely uninstall ARC before each upgrade.
Are the CRDs for gha-runner-scale-set available in the ARC repo?
EDIT: Looks like they are available here. Any reason I can't just update these manually and then update the Helm chart? Could also just pull the CRDs directly out of the release in actions-runner-controller.yaml.
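To spell out the flow I have in mind, it would be something like the sketch below. The CRD path and the target version are placeholders, and this is not an officially documented upgrade path:

```bash
NEW_VERSION="x.y.z"   # target chart/controller version (placeholder)

# 1) Apply the new CRDs manually from a checkout of the ARC repo at the
#    matching tag (path is illustrative):
kubectl apply --server-side -f \
  actions-runner-controller/charts/gha-runner-scale-set-controller/crds/

# 2) Then upgrade the controller chart in place:
helm upgrade arc --namespace arc-systems \
  -f controller/values.yaml \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version "$NEW_VERSION"
```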
Docs are updated: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/deploying-runner-scale-sets-with-actions-runner-controller#upgrading-arc. Thank you again for raising this!
@joshuabaird We can't maintain them separately. Since we don't have a webhook, we cannot maintain multiple versions, so the controller must understand its own CRD version. That is why upgrading the CRDs is not necessary for non-breaking changes; but if anything changes, the controller should operate only on the CRDs published for its version.
Closing this one since the docs are updated :relaxed:
> @joshuabaird We can't maintain them separately. Since we don't have a webhook, we cannot maintain multiple versions, so the controller must understand its own CRD version. That is why upgrading the CRDs is not necessary for non-breaking changes; but if anything changes, the controller should operate only on the CRDs published for its version.
How does a user know if there are breaking changes? Just based on semver? I still don't quite understand why this process wouldn't work -- realizing that there may be brief downtime between the CRD update and the controller update due to the incompatibility that you mentioned:
- Install new CRDs
- Install new controllers
Basically -- take the actions-runner-controller.yaml and just apply it over an existing version.
Unfortunately, having to "uninstall" ARC completely doesn't really lend itself to modern deployment patterns like GitOps, etc.
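As a concrete illustration of the two steps above, the CRDs could even be pulled straight out of the published chart, which keeps the whole thing declarative enough for a GitOps pipeline. This is only a sketch; it assumes a Helm version that supports `helm show crds`, and the target version is a placeholder:

```bash
NEW_VERSION="x.y.z"   # target version (placeholder)
CHART=oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# Step 1: install/refresh the CRDs shipped with the new chart version
helm show crds "$CHART" --version "$NEW_VERSION" | kubectl apply --server-side -f -

# Step 2: upgrade the controller itself without uninstalling it first
helm upgrade --install arc "$CHART" --namespace arc-systems \
  -f controller/values.yaml --version "$NEW_VERSION"
```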
The problem is that old scale sets are based on old CRDs, so we would have to transform them based on the version they are in and maintain multiple versions. That is what a mutating webhook would be for, but to eliminate the security concerns around webhooks, we decided not to have one. Whenever we introduce a breaking change, we increment the minor version; but we don't always introduce breaking changes on minor versions, so the release notes are probably the best place to see whether we introduced a breaking change.
I agree with you that the upgrade process is not so easy... but at least for now, we have to keep this limitation.