Listener pod failing after scale-set upgrade
### Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
### Controller Version
2.318.0
### Deployment Method
Helm
### Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
### To Reproduce
1. Upgrade `gha-runner-scale-set` from one version to another, for example 2.317.0 -> 2.318.0
2. Check the logs of the listener pod:

```shell
kubectl logs -f self-hosted-hide-7ff847bf-listener
```
Logs:

```text
2024-08-28T09:43:33Z INFO listener-app.listener Current runner scale set statistics. {"statistics": "{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":1,\"totalAssignedJobs\":1,\"totalRunningJobs\":0,\"totalRegisteredRunners\":0,\"totalBusyRunners\":0,\"totalIdleRunners\":0}"}
2024-08-28T09:43:33Z INFO listener-app.worker.kubernetesworker Calculated target runner count {"assigned job": 1, "decision": 1, "min": 0, "max": 5, "currentRunnerCount": 1, "jobsCompleted": 0}
2024-08-28T09:43:33Z INFO listener-app.worker.kubernetesworker Compare {"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-08-28T09:43:33Z INFO listener-app.worker.kubernetesworker Preparing EphemeralRunnerSet update {"json": "{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-08-28T09:43:33Z INFO listener-app.listener Deleting message session
2024/08/28 09:43:34 Application returned an error: handling initial message failed: could not patch ephemeral runner set , patch JSON: {"spec":{"patchID":0,"replicas":1}}, error: ephemeralrunnersets.actions.github.com "self-hosted-hide-rhtjx" not found
```
### Describe the bug
It looks like the listener is looking for an `ephemeralrunnersets` resource that does not exist. Checking the properties of the `autoscalinglisteners` CRD, I could confirm that this resource is tied to `ephemeralrunnersets.actions.github.com "self-hosted-hide-rhtjx"`:

```shell
kubectl describe autoscalinglisteners self-hosted-hide-7ff847bf-listener -n github-self-hosted-runners
```

```text
Name:       self-hosted-hide-7ff847bf-listener
Namespace:  github-self-hosted-runners
Labels:     actions.github.com/organization=hidehide
            actions.github.com/scale-set-name=self-hosted-hide
            actions.github.com/scale-set-namespace=github-self-hosted-scale-set
            app.kubernetes.io/component=runner-scale-set-listener
            app.kubernetes.io/instance=self-hosted-hide
            app.kubernetes.io/managed-by=Helm
            app.kubernetes.io/name=self-hosted-hide
            app.kubernetes.io/part-of=gha-runner-scale-set
            app.kubernetes.io/version=0.9.3
            helm.sh/chart=gha-rs-0.9.3
...
Ephemeral Runner Set Name: self-hosted-hide-rhtjx
```
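For anyone hitting the same mismatch, a quick way to confirm it is to compare what the listener references against what actually exists (the `.spec.ephemeralRunnerSetName` jsonpath is my assumption, inferred from the `Ephemeral Runner Set Name` field in the describe output; adjust if your CRD version differs):

```shell
# List the EphemeralRunnerSets that actually exist in the scale-set namespace
kubectl get ephemeralrunnersets -n github-self-hosted-scale-set

# Print the EphemeralRunnerSet name the listener still references
kubectl get autoscalinglisteners self-hosted-hide-7ff847bf-listener \
  -n github-self-hosted-runners \
  -o jsonpath='{.spec.ephemeralRunnerSetName}'
```

If the second command prints a name that is absent from the first listing, the listener is stale and will crash-loop exactly as in the logs above.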
Currently, to fix the issue I have to delete the `autoscalinglisteners` resource every time I upgrade a version:

```shell
kubectl delete autoscalinglisteners self-hosted-appsupport-7ff847bf-listener
```
### Describe the expected behavior
The listener should not fail after a version upgrade of the scale set.
### Additional Context
```yaml
n/a
```
### Controller Logs

```text
2024-08-28T09:14:16Z INFO AutoscalingListener Listener pod is terminated {"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
2024-08-28T09:14:17Z INFO AutoscalingListener Listener pod is terminated {"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
2024-08-28T09:14:18Z INFO AutoscalingListener Listener pod is terminated {"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
```
### Runner Pod Logs

No runner pods are started because the listener keeps crashing.
Some additional context for my case:
- We are using Kustomize with the Helm chart inflator to install the Helm charts. This is effectively applied as raw YAML manifests via Argo CD.
- We update the Helm charts for both the controller and the runner scale set at the same time, in the same PR.
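For context, our setup renders both charts from one Kustomization, roughly like the sketch below (chart versions, release names, and the repo URL are illustrative placeholders, not our exact config):

```yaml
# kustomization.yaml (illustrative sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
helmCharts:
  - name: gha-runner-scale-set-controller
    repo: oci://ghcr.io/actions/actions-runner-controller-charts
    version: 0.9.3
    releaseName: arc-controller
    namespace: arc-systems
  - name: gha-runner-scale-set
    repo: oci://ghcr.io/actions/actions-runner-controller-charts
    version: 0.9.3
    releaseName: self-hosted-hide
    namespace: github-self-hosted-scale-set
```

Because both charts live in the same Kustomization, a single version bump upgrades controller and scale set in one Argo CD sync, which is exactly the simultaneous upgrade scenario described above.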
Hey everyone,
Please follow the upgrade procedure we documented here. Right now an upgrade is essentially an uninstall of the current version followed by an install of the new one, so you cannot upgrade both resources simultaneously.
I will close this one since it is working as expected, but we are planning to fix the upgrade process in the future.
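For reference, the "uninstall then install" upgrade described above looks roughly like this with plain Helm (release names and namespaces are taken from this issue, chart versions are illustrative; treat this as a sketch of the documented procedure, not a verbatim copy of it):

```shell
# 1. Uninstall the runner scale set first, so the old controller can clean up
#    its AutoscalingListener and EphemeralRunnerSet resources
helm uninstall self-hosted-hide -n github-self-hosted-scale-set

# 2. Upgrade the controller chart
helm upgrade --install arc \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller \
  --version 0.9.3 -n arc-systems

# 3. Reinstall the scale set against the upgraded controller
helm install self-hosted-hide \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set \
  --version 0.9.3 -n github-self-hosted-scale-set
```

Doing the two releases in sequence like this avoids the stale `ephemeralrunnersets` reference that a simultaneous in-place upgrade can leave behind.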