Chains should expose some more detailed metrics
Feature request
Hello and thank you for Chains development! It would be great if Chains would expose some more detailed metrics besides Knative controller default metrics. Maybe something like "how many artefacts are there still to be processed".
Use case
I'm trying to measure performance of Chains. Mine approach is to generate as many sign-able artefacts as possible and measure difference between when pod that created artefacts finished and when "chains.tekton.dev/signed": "true" annotation appeared.
This is hard to do, because TaskRun yaml does not seem to contain any indication of when the annotation was added, so I have to iterate frequently to get that number.
If there is a metric that could be used to observe the count of remaining images to sign, that would be great.
@jhutar, the immediate alternative that I can think of is through some clever parsing of the controller logs.
Adding the suggested metric seems reasonable to me. I just have no idea how to do so.
@wlynch, @chitrangpatel, are you guys aware of any existing mechanism that could help @jhutar?
Could we add another annotation "chains.tekton.dev/signedAt": "YYYY-mm-dd HH:MM:SS"?
Another possible way:
The Taskrun status also has the following fields:
status:
completionTime: "2024-01-08T19:16:46Z"
conditions:
- lastTransitionTime: "2024-01-08T19:16:46Z"
message: All Steps have completed executing
reason: Succeeded
status: "True"
type: Succeeded
startTime: "2024-01-08T19:16:37Z"
I'm not sure if Chains could update the completionTime (or add field to the status) to when it signs the artifacts of that Task?
For tracking when an object was processed, +1 for using a Condition though we should create a new condition for chains, not reuse an existing pipelines condition. This would also give us a place to return debug information to address https://github.com/tektoncd/chains/issues/984. If you're looking to generate periodic reports with scripts, this would be the minimum needed.
If real-time metrics are needed, custom metrics can be added with otel. See https://github.com/tektoncd/pipeline/blob/main/pkg/taskrunmetrics/metrics.go for an example for how this is done in Pipelines.
My only concern with using a condition is having to grant access to modify TaskRunStatus resources to the ServiceAccount used to run the Chains controller.
We already have update permissions in order to add the current annotation. (and even though we have both the normal object + status permissions, idk what happens if you try to update the status via the CRD object - one may be a subset of the other anyway 🤔 ) https://github.com/tektoncd/chains/blob/48a5ded43c3453dec05cba114ecdda9e8307f2ae/config/100-rolebinding.yaml#L48-L55
We should probably look into reducing these - we don't need some of these update/delete permissions (but out of scope of this issue).
I think for telemetry as well as for performance testing, having otel metrics is sufficient. We can have a counter here.
I discussed this with @sudhishmk on how to do that. He will pick this up.
/assign @sudhishmk
@khrm: GitHub didn't allow me to assign the following users: sudhishmk.
Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide
In response to this:
I discussed this with @sudhishmk on how to do that. He will pick this up.
/assign @sudhishmk
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
Related PR
I we can close this. @jhutar I think https://github.com/tektoncd/chains/pull/1034 resolves what you wanted.
Closing based on previous comment.