
Self-hosted runner stuck on "Waiting for a runner to pick up this job..." in multi-step jobs

Open • joelnwalkley opened this issue 1 year ago • 110 comments

Describe the bug Using a self-hosted runner. When a GitHub Actions workflow has multiple steps, the first runs successfully, but the subsequent step is stuck in queued status with the log message "Waiting for a runner to pick up this job..."

I am able to get the stuck jobs to start either by cancelling and re-running the job via the GitHub UI or by restarting the GitHub Runner service on our EC2 instance. In both cases the job immediately picks up and runs successfully. (A command-line sketch of both workarounds follows this report.)

To Reproduce Steps to reproduce the behavior:

  1. Use self-hosted GitHub Runner
  2. Use a multi-step action
  3. Initiate the action
  4. Observe that the 2nd step is stuck in queued status.

Expected behavior The runner should pick up the 2nd (and following) steps.

Runner Version and Platform

v2.321.0, Windows x64

We noticed that our machine auto-updated to this version on November 27, and our CI runs started having this problem the following week.

OS of the machine running the runner? Windows

What's not working?

"Waiting for a runner to pick up this job..." for up to hours.

Job Log Output

N/A; the runner never produces job output because the job is stuck in the queue.

joelnwalkley avatar Dec 04 '24 18:12 joelnwalkley
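
Both workarounds from the report above can be scripted. This is only a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated for the repository, and that the Linux variant of the runner was installed as a service with its bundled svc.sh helper; the run ID and install path are placeholders. On a Windows host like the reporter's, the runner service would instead be restarted from the Services console or with PowerShell's Restart-Service.

```shell
# Workaround 1: cancel the stuck run and re-run it (same effect as the UI buttons).
gh run list --limit 5          # find the ID of the stuck run
gh run cancel <run-id>         # <run-id> is a placeholder
gh run rerun <run-id>

# Workaround 2 (Linux host): restart the runner service with the bundled helper.
cd ~/actions-runner            # assumed install path
sudo ./svc.sh stop
sudo ./svc.sh start
```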

I am also affected by this. I tried everything, including adding disableUpdate, to no avail.

daniel-v-nanok avatar Dec 04 '24 18:12 daniel-v-nanok

Same here. Manually stopping and starting the service makes it move on to the next job. Obviously this is not ideal.

v2.321.0, Linux ARM64

richardgetz avatar Dec 04 '24 19:12 richardgetz

Also affected by this issue on multiple runners running v2.321.0 on Debian 12 / amd64. We are able to work around this issue by rebooting the runners.

TylerWilliamson avatar Dec 04 '24 21:12 TylerWilliamson

I tested downgrading to v2.320.0 and encountered the same issue.

richardgetz avatar Dec 04 '24 22:12 richardgetz

Same. I tried everything I could. Nothing changed.

canmustu avatar Dec 05 '24 03:12 canmustu

Same here. Is there any workaround for this issue?

amalhanaja avatar Dec 05 '24 03:12 amalhanaja

I've tried almost everything I could, but it's not working out well.

connor-27 avatar Dec 05 '24 04:12 connor-27

I'm also facing the same issue; I have to stop and start the service again before the next stage starts.

rohitkhatri avatar Dec 05 '24 06:12 rohitkhatri

It's not working out well.

AkshayDhadwal26 avatar Dec 05 '24 06:12 AkshayDhadwal26

I'm also facing the same issue; I have to stop and start the service again before the next stage starts.

HoaiTTT-Kozocom avatar Dec 05 '24 10:12 HoaiTTT-Kozocom

Windows and Linux both have the same issue.

kemalgoekhan avatar Dec 05 '24 10:12 kemalgoekhan

Same issue here, and my workaround is to rerun & immediately cancel some old action. This "revives" stuck jobs, but new jobs end up with the same problem again.

enginelesscc avatar Dec 05 '24 10:12 enginelesscc

Same issue; only fixed by restarting the runner after each step :( systemctl restart actions.runner......

dexpert avatar Dec 05 '24 12:12 dexpert
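
For runners installed as a systemd service, the unit name in the comment above is truncated; the service installer typically names the unit following an actions.runner.<scope>.<runner-name>.service pattern, so the exact name below is a placeholder. A minimal sketch of finding and restarting the stuck unit:

```shell
# List runner units to discover the exact (truncated above) service name.
systemctl list-units 'actions.runner.*' --all

# Restart the stuck runner service; substitute the real unit name.
sudo systemctl restart 'actions.runner.<scope>.<runner-name>.service'
```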

For those using the machulav/ec2-github-runner action, which does not use a systemd service: you need to kill the /actions-runner/run-helper.sh process and start it again from /actions-runner/bin with ./run-helper.sh run.

NekiHrvoje avatar Dec 05 '24 12:12 NekiHrvoje
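
A minimal sketch of that restart, assuming the paths and command named in the comment above and a runner that is not managed by any service manager:

```shell
# Stop the running helper process (there is no systemd unit in this setup).
pkill -f run-helper.sh

# Start the runner again and detach it from the current shell.
cd /actions-runner/bin
nohup ./run-helper.sh run >/tmp/run-helper.log 2>&1 &
```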

Same issue here. Tried reinstalling and downgrading the runner without success.

netaviator avatar Dec 05 '24 13:12 netaviator

Could you provide the URL of a run which got stuck waiting for a runner? That'll help us debug.

lokesh755 avatar Dec 05 '24 14:12 lokesh755

Could you provide the URL of a run which got stuck waiting for a runner? That'll help us debug.

Unfortunately this is on a private repo; is there another way I can get you additional information? I could possibly inquire with my organization about giving you temporary read-only access.

joelnwalkley avatar Dec 05 '24 14:12 joelnwalkley

Unfortunately this is on a private repo; is there another way I can get you additional information? I could possibly inquire with my organization about giving you temporary read-only access.

A private repo run URL is fine too.

lokesh755 avatar Dec 05 '24 14:12 lokesh755

I'm having the same problem; this run got stuck for a whole day: https://github.com/Dasharo/meta-dts/actions/runs/12163220462. I had to restart the runner so the workflow would continue. Weirdly, this one, https://github.com/Dasharo/meta-dts/actions/runs/12180581391, got stuck waiting on "Run DTS tests", but cleanup started normally after the job failed.

m-iwanicki avatar Dec 05 '24 14:12 m-iwanicki

We've been seeing the same thing here for 3 days now. First 2 jobs run, then waiting...

We are seeing this in multiple repositories, though some of them work fine. Nothing in our workflows has changed in 6 months.

Ubuntu 22.04 EC2 instances in AWS, using machulav/ec2-github-runner@v2. Also tried machulav/[email protected], but no change to the issue.

jgrasett avatar Dec 05 '24 14:12 jgrasett

@lokesh755 same here, on self-hosted runners for private repos.

Edit: The run URL has been deleted from this post.

canmustu avatar Dec 05 '24 15:12 canmustu

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

lokesh755 avatar Dec 05 '24 15:12 lokesh755

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Perfect. Thank you. It is working fine for now.

canmustu avatar Dec 05 '24 15:12 canmustu

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Has anyone tried it and got it working normally again? I ask because I worked around it by combining multiple jobs into a single job in my workflow.

Meigara-Juma avatar Dec 05 '24 22:12 Meigara-Juma

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Has anyone tried it and got it working normally again? I ask because I worked around it by combining multiple jobs into a single job in my workflow.

It is fixed in my workflows. No issues at all for me since @lokesh755's last message.

canmustu avatar Dec 05 '24 22:12 canmustu

Can confirm! 👍🏿

netaviator avatar Dec 06 '24 05:12 netaviator

I'm still facing this sometimes

AkshayDhadwal26 avatar Dec 06 '24 06:12 AkshayDhadwal26

Fixed for me as well

@SoftradixAD can you try stopping and starting the service again? That worked for me.

rohitkhatri avatar Dec 06 '24 06:12 rohitkhatri

Okay let me try, @rohitkhatri sir

AkshayDhadwal26 avatar Dec 06 '24 06:12 AkshayDhadwal26

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Thank you, it is working correctly

daniel-v-nanok avatar Dec 08 '24 08:12 daniel-v-nanok