
Self-hosted runner stuck on "Waiting for a runner to pick up this job..." in multi-step jobs

Open • joelnwalkley opened this issue 1 year ago • 110 comments

Describe the bug Using a self-hosted runner. When a GitHub Actions workflow has multiple steps, the first runs successfully, but the subsequent step is stuck in queued status with the log message "Waiting for a runner to pick up this job..."

I am able to get the stuck jobs to start either by cancelling and re-running the job via the GitHub UI or by restarting the GitHub Runner service on our EC2 instance. In both cases the job immediately picks up and runs successfully. (A command-line sketch of both workarounds follows this report.)

To Reproduce Steps to reproduce the behavior:

  1. Use self-hosted GitHub Runner
  2. Use a multi-step action
  3. Initiate the action
  4. Observe that the 2nd step is stuck in queued status.

Expected behavior The runner should pick up the 2nd (and following) steps.

Runner Version and Platform

v2.321.0, Windows x64

We noticed that our machine auto-updated to this version on November 27, and our CI runs started having this problem the following week.

OS of the machine running the runner? Windows

What's not working?

"Waiting for a runner to pick up this job..." for up to hours.

Job Log Output

N/A; the runner never produces job output because the job is stuck in the queue.

joelnwalkley avatar Dec 04 '24 18:12 joelnwalkley
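
Both workarounds from the report above can be scripted. This is only a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated for the repository, and that the Linux variant of the runner was installed as a service with its bundled svc.sh helper; the run ID and install path are placeholders. On a Windows host like the reporter's, the runner service would instead be restarted from the Services console or with PowerShell's Restart-Service.

```shell
# Workaround 1: cancel the stuck run and re-run it (same effect as the UI buttons).
gh run list --limit 5          # find the ID of the stuck run
gh run cancel <run-id>         # <run-id> is a placeholder
gh run rerun <run-id>

# Workaround 2 (Linux host): restart the runner service with the bundled helper.
cd ~/actions-runner            # assumed install path
sudo ./svc.sh stop
sudo ./svc.sh start
```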

I am also affected by this. I tried everything, including adding disableUpdate, to no avail.

daniel-v-nanok avatar Dec 04 '24 18:12 daniel-v-nanok

Same here. Manually stopping and starting the service makes it move on to the next job. Obviously this is not ideal.

v2.321.0, Linux ARM64

richardgetz avatar Dec 04 '24 19:12 richardgetz

Also affected by this issue on multiple runners running v2.321.0 on Debian 12 / amd64. We are able to work around this issue by rebooting the runners.

TylerWilliamson avatar Dec 04 '24 21:12 TylerWilliamson

I tested downgrading to v2.320.0 and encountered the same issue.

richardgetz avatar Dec 04 '24 22:12 richardgetz

Same. I tried everything I could. Nothing changed.

canmustu avatar Dec 05 '24 03:12 canmustu

Same here. Is there any workaround for this issue?

amalhanaja avatar Dec 05 '24 03:12 amalhanaja

I've tried almost everything I could, but it's not working out well.

connor-27 avatar Dec 05 '24 04:12 connor-27

I'm also facing the same issue; I have to stop and start the service again before the next stage starts.

rohitkhatri avatar Dec 05 '24 06:12 rohitkhatri

It's not working out well.

AkshayDhadwal26 avatar Dec 05 '24 06:12 AkshayDhadwal26

I'm also facing the same issue; I have to stop and start the service again before the next stage starts.

HoaiTTT-Kozocom avatar Dec 05 '24 10:12 HoaiTTT-Kozocom

Windows and Linux both have the same issue.

kemalgoekhan avatar Dec 05 '24 10:12 kemalgoekhan

Same issue here, and my workaround is to rerun & immediately cancel some old action. This "revives" stuck jobs, but new jobs end up with the same problem again.

enginelesscc avatar Dec 05 '24 10:12 enginelesscc

Same issue; only fixed by restarting the runner after each step :( systemctl restart actions.runner......

dexpert avatar Dec 05 '24 12:12 dexpert
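
For runners installed as a systemd service, the unit name in the comment above is truncated; the service installer typically names the unit following an actions.runner.<scope>.<runner-name>.service pattern, so the exact name below is a placeholder. A minimal sketch of finding and restarting the stuck unit:

```shell
# List runner units to discover the exact (truncated above) service name.
systemctl list-units 'actions.runner.*' --all

# Restart the stuck runner service; substitute the real unit name.
sudo systemctl restart 'actions.runner.<scope>.<runner-name>.service'
```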

For those using the machulav/ec2-github-runner action, which does not use a systemd service: you need to kill the /actions-runner/run-helper.sh process and start it again from /actions-runner/bin with ./run-helper.sh run.

NekiHrvoje avatar Dec 05 '24 12:12 NekiHrvoje
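
A minimal sketch of that restart, assuming the paths and command named in the comment above and a runner that is not managed by any service manager:

```shell
# Stop the running helper process (there is no systemd unit in this setup).
pkill -f run-helper.sh

# Start the runner again and detach it from the current shell.
cd /actions-runner/bin
nohup ./run-helper.sh run >/tmp/run-helper.log 2>&1 &
```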

Same issue here. Tried reinstalling and downgrading the runner without success.

netaviator avatar Dec 05 '24 13:12 netaviator

Could you provide the URL of a run which got stuck waiting for a runner? That'll help us debug.

lokesh755 avatar Dec 05 '24 14:12 lokesh755

Could you provide the URL of a run which got stuck waiting for a runner? That'll help us debug.

Unfortunately this is on a private repo; is there another way I can get you additional information? I could possibly inquire with my organization about giving you temporary read-only access.

joelnwalkley avatar Dec 05 '24 14:12 joelnwalkley

Unfortunately this is on a private repo; is there another way I can get you additional information? I could possibly inquire with my organization about giving you temporary read-only access.

A private repo run URL is fine too.

lokesh755 avatar Dec 05 '24 14:12 lokesh755

I'm having the same problem; this run got stuck for a whole day: https://github.com/Dasharo/meta-dts/actions/runs/12163220462. I had to restart the runner so the workflow would continue. Weirdly, this one, https://github.com/Dasharo/meta-dts/actions/runs/12180581391, got stuck waiting on "Run DTS tests", but cleanup started normally after the job failed.

m-iwanicki avatar Dec 05 '24 14:12 m-iwanicki

We've been seeing the same thing here for 3 days now. First 2 jobs run, then waiting...

We are seeing this in multiple repositories, though some of them work fine. Nothing in our workflows has changed in 6 months.

Ubuntu 22.04 EC2 instances in AWS, using machulav/ec2-github-runner@v2. Also tried machulav/[email protected], but no change to the issue.

jgrasett avatar Dec 05 '24 14:12 jgrasett

@lokesh755 same here, on self-hosted runners for private repos.

Edit: The run URL has been deleted from this post.

canmustu avatar Dec 05 '24 15:12 canmustu

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

lokesh755 avatar Dec 05 '24 15:12 lokesh755

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Perfect. Thank you. It is working fine for now.

canmustu avatar Dec 05 '24 15:12 canmustu

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Has anyone tried it and got it working normally again? I ask because I worked around it by combining multiple jobs into a single job in my workflow.

Meigara-Juma avatar Dec 05 '24 22:12 Meigara-Juma

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Has anyone tried it and got it working normally again? I ask because I worked around it by combining multiple jobs into a single job in my workflow.

It is fixed in my workflows. No issues at all for me since @lokesh755's last message.

canmustu avatar Dec 05 '24 22:12 canmustu

Can confirm! 👍🏿

netaviator avatar Dec 06 '24 05:12 netaviator

I'm still facing this sometimes

AkshayDhadwal26 avatar Dec 06 '24 06:12 AkshayDhadwal26

Fixed for me as well

@SoftradixAD can you try stopping and starting the service again? That worked for me.

rohitkhatri avatar Dec 06 '24 06:12 rohitkhatri

Okay let me try, @rohitkhatri sir

AkshayDhadwal26 avatar Dec 06 '24 06:12 AkshayDhadwal26

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Thank you, it is working correctly

daniel-v-nanok avatar Dec 08 '24 08:12 daniel-v-nanok