Artifacts are created multiple times
Expected Behavior
Tekton Chains should only ever create one instance of an artifact type for a given TaskRun resource.
Actual Behavior
Tekton Chains creates the same artifact multiple times for any given TaskRun. So far, I've been able to verify that this occurs for OCI image signatures and for transparency log entries uploaded to Rekor.
Steps to Reproduce the Problem
The steps below are meant to be used on a "dedicated" cluster where no other TaskRun resources are being created.
- Create a TaskRun resource, either directly or via a PipelineRun.
- Wait for the TaskRun to complete successfully.
- Inspect the logs of the tekton-chains-controller. Notice that there is more than one log message about uploading to the transparency log. In my case, I see two entries:
2022-03-23T18:35:27.682Z info Uploaded entry to https://rekor.sigstore.dev with index 1775727
2022-03-23T18:35:28.094Z info Uploaded entry to https://rekor.sigstore.dev with index 1775728
- Inspect the logs of the tekton-chains-controller again. Notice that there is more than one log message about uploading the image signature to the registry. In my case, I see two entries:
2022-03-23T19:05:27.149Z info Successfully uploaded signature for quay.io/lucarval/tekton-test@sha256:73ee4bff9c6b74cd09948e29e578b807c56eaf17257f21706053eb7619f58356
2022-03-23T19:05:29.318Z info Successfully uploaded signature for quay.io/lucarval/tekton-test@sha256:73ee4bff9c6b74cd09948e29e578b807c56eaf17257f21706053eb7619f58356
The repository usage logs in quay.io also report that the signature tag was first created, then immediately updated to point to a different image signature. A scripted version of the steps above is sketched below.
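For reference, here's a rough scripted version of the steps above. It assumes a default Chains install (controller deployment tekton-chains-controller in the tekton-chains namespace); the TaskRun manifest and name are placeholders, and the image is the one from my environment:

$ kubectl create -f taskrun.yaml
$ tkn taskrun logs --follow <taskrun-name>
$ kubectl logs -n tekton-chains deployment/tekton-chains-controller | grep -E 'Uploaded entry|Successfully uploaded signature'
$ cosign triangulate quay.io/lucarval/tekton-test@sha256:73ee4bff9c6b74cd09948e29e578b807c56eaf17257f21706053eb7619f58356

cosign triangulate prints the signature tag for the image, which makes it easier to correlate with the quay.io repository usage logs.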
Additional Info
Most of the payload information is the same between the two example entries mentioned above:
$ diff <(curl -s 'https://rekor.sigstore.dev/api/v1/log/entries?logIndex=1775727' | jq '.') <(curl -s 'https://rekor.sigstore.dev/api/v1/log/entries?logIndex=1775728' | jq '.')
2c2
< "79e45e89b83634776c7a4adad3ff84da1f0ebefd07133410b575772d69d58075": {
---
> "0677b20f548c264c181ecfbbe51f04ee5fda9b3b3037a6ad606813fbb5c2dfa5": {
6c6
< "body": "eyJhcGlWZXJzaW9uIjoiMC4wLjEiLCJraW5kIjoiaW50b3RvIiwic3BlYyI6eyJjb250ZW50Ijp7Imhhc2giOnsiYWxnb3JpdGhtIjoic2hhMjU2IiwidmFsdWUiOiIxZGZlNzgwMTk2YzgxM2Q1NzQ3YWFmMWRkZDk4N2I1NGE4NTlhNGRlMzlmMmY5Mjc4MTc3NzIxM2U5NzNmMzkxIn19LCJwdWJsaWNLZXkiOiJMUzB0TFMxQ1JVZEpUaUJRVlVKTVNVTWdTMFZaTFMwdExTMEtUVVpyZDBWM1dVaExiMXBKZW1vd1EwRlJXVWxMYjFwSmVtb3dSRUZSWTBSUlowRkZSWFpIYWtVNWNUWkJZVW8yVFhCTGFUZHNXVUZuY1hwTllrdDZNZ3BNY1hsRE5VMUhNR3haZVVkUlkwWkZXVEpHUzIxM1RWcFFZVGd5Vm5SVWFtaHFRbTF4VUdwWWFIazFlVWQzV2xnck1XNTJVbnBJTkhSQlBUMEtMUzB0TFMxRlRrUWdVRlZDVEVsRElFdEZXUzB0TFMwdENnPT0ifX0=",
---
> "body": "eyJhcGlWZXJzaW9uIjoiMC4wLjEiLCJraW5kIjoiaW50b3RvIiwic3BlYyI6eyJjb250ZW50Ijp7Imhhc2giOnsiYWxnb3JpdGhtIjoic2hhMjU2IiwidmFsdWUiOiJkYzM0OWZjZTYzNzg2ZTIyZWM0YjM2YmIyNGUxMzQwNDIwZmZhMGFjMzNjNTQ1NDRiNTA4Y2U0MjQ0NGU5YmJmIn19LCJwdWJsaWNLZXkiOiJMUzB0TFMxQ1JVZEpUaUJRVlVKTVNVTWdTMFZaTFMwdExTMEtUVVpyZDBWM1dVaExiMXBKZW1vd1EwRlJXVWxMYjFwSmVtb3dSRUZSWTBSUlowRkZSWFpIYWtVNWNUWkJZVW8yVFhCTGFUZHNXVUZuY1hwTllrdDZNZ3BNY1hsRE5VMUhNR3haZVVkUlkwWkZXVEpHUzIxM1RWcFFZVGd5Vm5SVWFtaHFRbTF4VUdwWWFIazFlVWQzV2xnck1XNTJVbnBJTkhSQlBUMEtMUzB0TFMxRlRrUWdVRlZDVEVsRElFdEZXUzB0TFMwdENnPT0ifX0=",
9c9
< "logIndex": 1775727,
---
> "logIndex": 1775728,
13,17c13,17
< "4979bda9afddf14ec64086e1656bda947e630c6a3d4346ce44ddcfbcb300428d",
< "a53e860326fc625e1224d9f609c11828083dd61e76e03ae3a9ea4daf06088cce",
< "b38d9b3af81f70c0d52e0f4468d8b3f5ba729b8e3f47a31d2e60facbe2e4ccc1",
< "6a0d8314142892b27e0a297452f820b0370502530b39d39b24e44ce02278781a",
< "f70d6ba20245a731631fdda969e7e1cd48803fe6767ca543fabcce5e114e9124",
---
> "10b5c420b9ed5702affce91c0d1d2868087e70e7d9693422d69b7110853b1129",
> "9afe194d71a7206bd0e72d894e56629aeb6c0511f8478941a0d97dc43102defb",
> "7d6c25c2c6940fdb5bba86743942ae136e2d8d1287bc70e99e6bc5474e1235a0",
> "94cb9d2268a61ba4908f10cd9df8272d46fbbc03fb15582c6943504264a4fd5c",
> "82467d2986af5ea22c3aef508c7c447161be2e410471bb09a32d431390489122",
29c29
< "logIndex": 1775727,
---
> "logIndex": 1775728,
33c33
< "signedEntryTimestamp": "MEUCIFycoCGa94fIfh1swAXXsyJCDv7LILWg+8M941teAllZAiEAq94HRyO7o/9TuE7VbOU+ZIBV5E/xS7A49QIyhR39blc="
---
> "signedEntryTimestamp": "MEUCIQCdMnhWp/2YnUs7YZt8+1j+MDt2D74/UWDtf5mQMyWPIAIgKJilxSumjX1szEsxlajhCUORrpRDYqNrKLGWKekM2ho="
A closer look at each entry's base64-decoded body attribute shows that the only difference is the hash value:
$ diff <(curl -s 'https://rekor.sigstore.dev/api/v1/log/entries?logIndex=1775727' | jq '.[].body | @base64d | fromjson') <(curl -s 'https://rekor.sigstore.dev/api/v1/log/entries?logIndex=1775728' | jq '.[].body | @base64d | fromjson')
8c8
< "value": "1dfe780196c813d5747aaf1ddd987b54a859a4de39f2f92781777213e973f391"
---
> "value": "dc349fce63786e22ec4b36bb24e1340420ffa0ac33c54544b508ce42444e9bbf"
- Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.0", GitCommit:"cb303e613a121a29364f75cc67d3d580833a7479", GitTreeState:"archive", BuildDate:"2021-07-22T00:00:00Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3+b93fd35", GitCommit:"3a0f2c90b43e6cffd07f57b5b78dd9f083e47ee2", GitTreeState:"clean", BuildDate:"2022-02-11T05:26:59Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
This is an OpenShift cluster:
$ oc version
Client Version: 4.8.0-202201210133.p0.g88e7eba.assembly.stream-88e7eba
Server Version: 4.9.23
Kubernetes Version: v1.22.3+b93fd35
- Tekton Pipeline version:
$ tkn version
Client version: 0.21.0
Pipeline version: v0.28.3
Triggers version: v0.16.1
One possible explanation for this behavior is that while the controller is reconciling a given version (A) of the TaskRun, a new version (B) of the TaskRun is created and added to the reconciliation queue. Once the controller is done with A (and has recorded its work on the live object), it picks up B, which by then contains stale information, causing the controller to redo the work already done for A.
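If that's what's happening, the TaskRun's resourceVersion should be observed changing more than once after the run completes, since each write by the controller produces a new version and each queued version triggers another reconcile. A quick way to watch for this (the TaskRun name is a placeholder):

$ kubectl get taskrun <taskrun-name> -w -o custom-columns=NAME:.metadata.name,VERSION:.metadata.resourceVersion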
I tried the reproducer steps with transparency.enabled set to "true" and to "manual" (with the corresponding changes to make manual mode work). In both cases, I see the exact same behavior. I initially thought the problem only affected transparency log uploads and that the annotation check may not be working as expected, but that code does look right, at least for "manual" mode. The stale-resource theory explains it: the check can never succeed because the controller is inspecting a stale version of the TaskRun resource, one that does not yet carry the annotation.
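For what it's worth, the annotation can be checked directly on the live object. If I'm reading the code correctly, Chains marks processed TaskRuns with the chains.tekton.dev/signed annotation (TaskRun name again a placeholder):

$ kubectl get taskrun <taskrun-name> -o jsonpath="{.metadata.annotations['chains\.tekton\.dev/signed']}"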
There's another possibility here. I found an article stating that a controller should stick to One Custom Resource Modification at a Time. Digging through the code, it looks like in most cases the TaskRun resource is only modified once per reconciliation. However, if artifacts.taskrun.storage is set to tekton, the TaskRun is also modified a second time, because that backend stores the signed payload back on the TaskRun itself.
Sure enough, if I change artifacts.taskrun.storage to oci, the issue goes away completely. I do see two entries in Rekor, but that's expected: one is for the image signature and the other is for the attestation.
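For anyone wanting to try the same workaround: with a default install, the storage backend is configured via the chains-config ConfigMap in the tekton-chains namespace, e.g.:

$ kubectl patch configmap chains-config -n tekton-chains -p '{"data": {"artifacts.taskrun.storage": "oci"}}'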
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle stale
Send feedback to tektoncd/plumbing.
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/lifecycle rotten
Send feedback to tektoncd/plumbing.
Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.
/close
Send feedback to tektoncd/plumbing.
@tekton-robot: Closing this issue.
In response to this:
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen with a justification. Mark the issue as fresh with /remove-lifecycle rotten with a justification. If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification. /close
Send feedback to tektoncd/plumbing.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.