actions-runner-controller

Metrics for acquired and available jobs are missing

Open • zetaab opened this issue 2 years ago • 1 comment

Checks

  • [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
  • [X] I am using charts that are officially provided

Controller Version

0.7.0

Deployment Method

Helm

Checks

  • [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
  • [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes

To Reproduce

1. Enable listener metrics.
2. Look for the acquired or available job metrics at the listener's metrics path.
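
The check in step 2 can be scripted. The following is a minimal sketch, not part of the original report: it parses Prometheus text-format output and lists which expected metrics are absent. The function names (metric_names, missing_metrics, fetch) are hypothetical, and because the exporter may add a prefix to the names quoted in this issue, the sketch matches by suffix, which is an assumption.

```python
import urllib.request

# Metric names as written in this issue. The exporter may prepend a prefix,
# so names are matched by suffix (an assumption, not confirmed by the source).
EXPECTED = ("available_jobs", "acquired_jobs")


def metric_names(exposition):
    """Collect metric names from Prometheus text-format output."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


def missing_metrics(exposition, expected=EXPECTED):
    """Return the expected names that no exposed metric ends with."""
    names = metric_names(exposition)
    return [e for e in expected if not any(n.endswith(e) for n in names)]


def fetch(url, timeout=5.0):
    """Download the metrics page; the URL is supplied by the caller."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the listener port forwarded locally (for example via kubectl port-forward), missing_metrics(fetch("http://localhost:8080/metrics")) would return the names that are not exposed. The host is an assumption; the port and path follow the values under Additional Context.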

Describe the bug

The metrics ADR at https://github.com/actions/actions-runner-controller/blob/master/docs/adrs/2023-05-08-exposing-metrics.md?plain=1#L175 says that the available_jobs and acquired_jobs metrics should be exposed. However, they appear to be disabled in the code, and it is not clear why: https://github.com/actions/actions-runner-controller/blob/3e4201ac5f6c6d172a19b580154eaf5abf24a2ca/cmd/githubrunnerscalesetlistener/metrics.go#L54-L71

Describe the expected behavior

The expected behaviour is that I can see the values of these metrics in Prometheus.

Currently these values appear only in the pod logs, which makes it difficult to build alerts on them.

Additional Context

replicaCount: 2

flags:
  logLevel: "debug"

metrics:
  controllerManagerAddr: ":8080"
  listenerAddr: ":8080"
  listenerEndpoint: "/metrics"

Controller Logs

not relevant

Runner Pod Logs

not relevant

zetaab, Dec 14 '23 16:12

I'm considering filing a separate issue, but job_queue_duration_seconds is also missing and I'm not sure why. It is quite relevant to a situation we had where the listener stopped responding and jobs were no longer picked up. We currently have no good way to detect when jobs are no longer being picked up.

jameshounshell, Dec 21 '23 20:12