actions-runner-controller

Listener pod failing after scale-set upgrade

Open x-504 opened this issue 1 year ago • 2 comments

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

2.318.0

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Upgrade `gha-runner-scale-set` from one version to another, for example 2.317.0 -> 2.318.0.
2. Check the logs of the listener pod, for example:

kubectl logs -f self-hosted-hide-7ff847bf-listener

Logs:

2024-08-28T09:43:33Z	INFO	listener-app.listener	Current runner scale set statistics.	{"statistics": "{\"totalAvailableJobs\":0,\"totalAcquiredJobs\":1,\"totalAssignedJobs\":1,\"totalRunningJobs\":0,\"totalRegisteredRunners\":0,\"totalBusyRunners\":0,\"totalIdleRunners\":0}"}
2024-08-28T09:43:33Z	INFO	listener-app.worker.kubernetesworker	Calculated target runner count	{"assigned job": 1, "decision": 1, "min": 0, "max": 5, "currentRunnerCount": 1, "jobsCompleted": 0}
2024-08-28T09:43:33Z	INFO	listener-app.worker.kubernetesworker	Compare	{"original": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":-1,\"patchID\":-1,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}", "patch": "{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"replicas\":1,\"patchID\":0,\"ephemeralRunnerSpec\":{\"metadata\":{\"creationTimestamp\":null},\"spec\":{\"containers\":null}}},\"status\":{\"currentReplicas\":0,\"pendingEphemeralRunners\":0,\"runningEphemeralRunners\":0,\"failedEphemeralRunners\":0}}"}
2024-08-28T09:43:33Z	INFO	listener-app.worker.kubernetesworker	Preparing EphemeralRunnerSet update	{"json": "{\"spec\":{\"patchID\":0,\"replicas\":1}}"}
2024-08-28T09:43:33Z	INFO	listener-app.listener	Deleting message session
2024/08/28 09:43:34 Application returned an error: handling initial message failed: could not patch ephemeral runner set , patch JSON: {"spec":{"patchID":0,"replicas":1}}, error: ephemeralrunnersets.actions.github.com "self-hosted-hide-rhtjx" not found
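
To pull the missing resource name out of a crash log like the one above, a small filter may be enough. This is only a sketch in POSIX shell: the function name `missing_runner_set` is made up for illustration, and it assumes the error keeps the `ephemeralrunnersets.actions.github.com "<name>" not found` wording shown in the last log line.

```shell
#!/bin/sh
# Hypothetical helper: read listener logs on stdin and print the name of each
# EphemeralRunnerSet that the listener reported as "not found".
# Assumes the error text matches the log line quoted in this issue.
missing_runner_set() {
  sed -n 's/.*ephemeralrunnersets\.actions\.github\.com "\([^"]*\)" not found.*/\1/p'
}
```

It could be fed from the command in step 2, for instance `kubectl logs self-hosted-hide-7ff847bf-listener | missing_runner_set`. Lines without the error produce no output.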


### Describe the bug

It looks like the listener is looking for an `ephemeralrunnersets` resource that does not exist. Inspecting the `autoscalinglisteners` resource confirms that it is still tied to `ephemeralrunnersets.actions.github.com "self-hosted-hide-rhtjx"`:

kubectl describe autoscalinglisteners self-hosted-hide-7ff847bf-listener -n github-self-hosted-runners

Name:         self-hosted-hide-7ff847bf-listener
Namespace:    github-self-hosted-runners
Labels:       actions.github.com/organization=hidehide
              actions.github.com/scale-set-name=self-hosted-hide
              actions.github.com/scale-set-namespace=github-self-hosted-scale-set
              app.kubernetes.io/component=runner-scale-set-listener
              app.kubernetes.io/instance=self-hosted-hide
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=self-hosted-hide
              app.kubernetes.io/part-of=gha-runner-scale-set
              app.kubernetes.io/version=0.9.3
              helm.sh/chart=gha-rs-0.9.3
...

Ephemeral Runner Set Name: self-hosted-hide-rhtjx


Currently, to work around the issue, I have to delete the `autoscalinglisteners` resource every time I upgrade to a new version:

kubectl delete autoscalinglisteners self-hosted-hide-7ff847bf-listener
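
The manual workaround above could be wrapped in a check-then-delete helper, so the listener is only removed when the set it points at is really gone. This is a sketch under assumptions that this issue does not confirm: the function name `delete_listener_if_stale` is hypothetical, and the field path `.spec.ephemeralRunnerSetName` is inferred from the "Ephemeral Runner Set Name" entry in the `kubectl describe` output.

```shell
#!/bin/sh
# Sketch of the manual workaround as a check-then-delete helper.
# Assumptions: the AutoscalingListener exposes the referenced set as
# .spec.ephemeralRunnerSetName, and the set lives in the namespace passed
# as the third argument. The function name is hypothetical.
delete_listener_if_stale() {
  listener="$1"      # e.g. self-hosted-hide-7ff847bf-listener
  listener_ns="$2"   # namespace of the AutoscalingListener
  set_ns="$3"        # namespace of the EphemeralRunnerSet

  ers=$(kubectl get autoscalinglisteners "$listener" -n "$listener_ns" \
    -o jsonpath='{.spec.ephemeralRunnerSetName}') || return 1
  [ -n "$ers" ] || { echo "no ephemeral runner set name on $listener" >&2; return 1; }

  # If the referenced EphemeralRunnerSet still exists, leave the listener alone.
  if err=$(kubectl get ephemeralrunnersets "$ers" -n "$set_ns" 2>&1 >/dev/null); then
    echo "ok: $ers exists"
    return 0
  fi

  # Only treat a NotFound error as "stale"; bail out on anything else.
  case "$err" in
    *NotFound*) ;;
    *) echo "unexpected error: $err" >&2; return 1 ;;
  esac

  # Delete the listener, as the reporter does by hand after each upgrade.
  echo "stale: $ers not found, deleting $listener"
  kubectl delete autoscalinglisteners "$listener" -n "$listener_ns"
}
```

Taking the namespaces from the labels shown above, a call might look like `delete_listener_if_stale self-hosted-hide-7ff847bf-listener github-self-hosted-runners github-self-hosted-scale-set`. Errors other than NotFound abort without deleting, so a permissions or connectivity problem does not remove a healthy listener.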




### Describe the expected behavior

The listener should not fail after the scale set is upgraded to a new version.

### Additional Context

```yaml
n/a
```

### Controller Logs

2024-08-28T09:14:16Z	INFO	AutoscalingListener	Listener pod is terminated	{"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
2024-08-28T09:14:17Z	INFO	AutoscalingListener	Listener pod is terminated	{"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}
2024-08-28T09:14:18Z	INFO	AutoscalingListener	Listener pod is terminated	{"version": "0.9.3", "autoscalinglistener": {"name":"self-hosted-hide-7ff847bf-listener","namespace":"github-self-hosted-runners"}, "namespace": "github-self-hosted-runners", "name": "self-hosted-hide-7ff847bf-listener", "reason": "Error", "message": ""}

### Runner Pod Logs

No runner pods are started, because the listener crashes first.

x-504 avatar Aug 28 '24 11:08 x-504

Hello! Thank you for filing an issue.

The maintainers will triage your issue shortly.

In the meantime, please take a look at the troubleshooting guide for bug reports.

If this is a feature request, please review our contribution guidelines.

github-actions[bot] avatar Aug 28 '24 11:08 github-actions[bot]

Some additional context for my case:

  • We are using Kustomize and its Helm chart inflator to install the Helm charts, so they are effectively applied as raw YAML manifests via Argo CD.
  • We update the Helm charts for both the controller and the runner scale set at the same time, in the same PR.

hawkesn avatar Dec 19 '24 21:12 hawkesn

Hey everyone,

Please follow the upgrade procedure we documented here. Right now, an upgrade basically means uninstalling the current version and installing the new version. Therefore, you cannot upgrade both resources simultaneously.
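
As a rough illustration of that ordering, the sketch below only prints the `helm` commands for an uninstall-then-install upgrade instead of running them. The function name, the controller release name `arc`, the namespaces, and `values.yaml` are placeholders rather than values from this issue, and CRD handling is left out, so the documented procedure remains the reference.

```shell
#!/bin/sh
# Sketch: print the helm commands for an uninstall-then-install upgrade, in order.
# Release names, namespaces, and the values file are placeholders to adapt.
# CRD handling is left out; defer to the documented upgrade procedure for that.
CHARTS=oci://ghcr.io/actions/actions-runner-controller-charts

print_upgrade_plan() {
  version="$1"         # target chart version
  controller_ns="$2"   # namespace of the controller release
  scale_set_ns="$3"    # namespace of the scale set release
  scale_set="$4"       # scale set release name

  # 1. Remove the scale set first, then the controller.
  echo "helm uninstall $scale_set -n $scale_set_ns --wait"
  echo "helm uninstall arc -n $controller_ns --wait"
  # 2. Install the new controller, then the scale set against it.
  echo "helm install arc $CHARTS/gha-runner-scale-set-controller -n $controller_ns --version $version"
  echo "helm install $scale_set $CHARTS/gha-runner-scale-set -n $scale_set_ns --version $version -f values.yaml"
}
```

Printing the plan first makes it easy to review the order before running anything; the two phases are kept separate on purpose, since the controller and the scale set cannot be upgraded in a single step.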

I will close this one since it is working as expected, but we are planning to fix the upgrade process in the future.

nikola-jokic avatar Mar 18 '25 12:03 nikola-jokic