Scale Set Metrics - Missing Repo Name and Workflow Name
Checks
- [X] I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- [X] I am using charts that are officially provided
Controller Version
0.7.0
Deployment Method
Helm
Checks
- [X] This isn't a question or user support case (For Q&A and community support, go to Discussions).
- [X] I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
1. Install both the ARC and the Scale Set Helm charts on a Kubernetes cluster.
2. Run a job on a self-hosted runner.
Describe the bug
With the old self-hosted runners (before the Scale Set), Prometheus metrics like github_workflow_job_failures_total carried the labels repository, repository_full_name, workflow_name, and job_name.
Example
github_workflow_job_failures_total{container="actions-metrics-server", endpoint="metrics-port", exit_code="1", failed_step="10", instance="10.26.153.77:8080", job="actions-runner-controller-actions-metrics-server", job_name="client-codeql / client-codeql", namespace="github-actions", owner="orgname", pod="actions-runner-controller-actions-metrics-server-598bb5fb99lhl2", repository="reponame", repository_full_name="orgname/reponame", runs_on="self-hosted", service="actions-runner-controller-actions-metrics-server", workflow_name="build"}
I found this very helpful because I could monitor for when jobs began to fail, or spot repos where developers were running very long or costly CI.
Now, with the new metrics, we only have job_workflow_ref and job_name. job_workflow_ref only points to the org/repo/relative_path of the workflow. This means that with reusable workflows it only shows where the reusable workflow lives (in our case a shared CI repo), not the dev's repo that called it.
Example:
gha_completed_jobs_total{container="autoscaler", endpoint="http-metrics", event_name="pull_request", instance="10.26.139.124:8080", job="self-hosted-listener", job_name=" / Build Test Publish", job_result="canceled", job_workflow_ref="myorg/github-actions-shared/.github/workflows/python-library.yaml@refs/heads/v1", namespace="github-actions", organization="myorg", pod="self-hosted-56cc585c-listener", service="self-hosted-listener"}
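To make the limitation concrete, here is a minimal Python sketch that splits the job_workflow_ref value from the example above. The helper name parse_job_workflow_ref is hypothetical (it is not part of ARC), and the owner/repo/path@ref layout is an assumption based only on the sample value shown here. Every field it can recover describes the shared workflow repo; nothing identifies the calling repo.

```python
from typing import NamedTuple


class WorkflowRef(NamedTuple):
    owner: str
    repo: str
    path: str
    ref: str


def parse_job_workflow_ref(value: str) -> WorkflowRef:
    """Split '<owner>/<repo>/<path>@<ref>' into its parts.

    Hypothetical helper; the format is assumed from the sample label value.
    """
    # Everything after the first '@' is the git ref.
    location, _, ref = value.partition("@")
    # The first two path segments are owner and repo; the rest is the file path.
    owner, repo, path = location.split("/", 2)
    return WorkflowRef(owner, repo, path, ref)


# Label value copied from the gha_completed_jobs_total example.
ref = parse_job_workflow_ref(
    "myorg/github-actions-shared/.github/workflows/python-library.yaml@refs/heads/v1"
)
# The only repo we can extract is the shared CI repo, not the caller.
print(ref.owner, ref.repo, ref.path, ref.ref)
```

In other words, no amount of relabeling on job_workflow_ref can recover the originating repo; it has to be exposed as its own label.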
Describe the expected behavior
I expect the Prometheus metrics for the new Scale Set listener to include the labels workflow_name and repository in addition to job_workflow_ref, since those labels existed in the prior self-hosted runner paradigm.
I know that this will increase metric cardinality, but it did not seem to be a problem when we ran the old self-hosted runners.
I know our case is somewhat unusual in that we rely on the reusable workflows feature, but without these labels it is almost impossible to track which repos are initiating the workflows/jobs.
Thanks in advance 🙏
Additional Context
Not Relevant
Controller Logs
Not Relevant
Runner Pod Logs
Not Relevant