Provenance: add optional URI and digest for build logs
Currently, the spec for predicate metadata includes:
metadata.buildInvocationId string, optional
Identifies this particular build invocation, which can be useful for finding associated logs or other ad-hoc analysis. The exact meaning and format is defined by builder.id; by default it is treated as opaque and case-sensitive. The value SHOULD be globally unique.
It seems that logs are useful not just for debugging, but also for verifying that the actions laid out elsewhere in the attestation were actually those taken. I'd like to suggest that logs get special treatment in the metadata section, such that each entry includes a URI and a digest, similar to other referenced objects, e.g.:
"logs": [
  {
    "uri": "scheme://some/path",
    "digest": {
      "sha256": "eb158a15263554e65cab9dfd6f2a640b434d17e7e6af8f40435707662f88234a"
    }
  }
]
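For what it's worth, producing such an entry should be cheap for a builder. A minimal sketch in Python, assuming the log is available as bytes at attestation time; `make_log_entry` and the sample log content are hypothetical, and `logs` is the field proposed here, not something in the current spec:

```python
import hashlib
import json


def make_log_entry(uri: str, log_bytes: bytes) -> dict:
    """Build one entry for the proposed (not yet specified) "logs" list."""
    return {
        "uri": uri,
        "digest": {"sha256": hashlib.sha256(log_bytes).hexdigest()},
    }


# Hypothetical log content and location, for illustration only.
log_bytes = b"step 1: fetch sources\nstep 2: compile\n"
entry = make_log_entry("scheme://some/path", log_bytes)
print(json.dumps({"logs": [entry]}, indent=2))
```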
Great discussion topic. AIUI the provenance specification does attempt to ensure that enough information is captured to verify the actions taken. This might be implicit, through invocation.configSource and the repository it describes, or explicit, through buildConfig.
Do you think that logs would provide additional information for verification on top of the above? Would this verification be more about being able to check the build service executed the invocation as expected?
Aside: one thing the spec does allow for, which is easy to miss, is extensibility, both through several fields defined as an "arbitrary JSON object with a schema defined by buildType" (invocation.parameters, invocation.environment and buildConfig) and through the parsing rules, which explicitly define how to add extension fields:
Producers MAY add extension fields using field names that are URIs.
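As an illustration of that rule, a producer could carry log references today under a URI-named field. This is only a sketch: the field name `https://example.com/provenance-extensions/logs` is made up, and placing it at the top level of the predicate is an assumption, not something the spec prescribes:

```python
import json
from urllib.parse import urlparse

# Hypothetical extension field name; a real producer would use a URI it controls.
LOGS_EXTENSION = "https://example.com/provenance-extensions/logs"

predicate = {
    "metadata": {"buildInvocationId": "example-invocation-id"},
    # Extension field: the key is a URI, the value shape is up to the producer.
    LOGS_EXTENSION: [{"uri": "scheme://some/path"}],
}


def extension_fields(obj: dict) -> dict:
    """Return the fields whose names look like absolute URIs (scheme present)."""
    return {k: v for k, v in obj.items() if urlparse(k).scheme}


print(json.dumps(extension_fields(predicate), indent=2))
```

A consumer that does not know the extension would simply skip it, which is also the main drawback: nothing obliges verifiers to look at the logs.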
I believe the goal is to aid in debugging and auditing. I've heard from others that it is valuable to link to "evidence" of the attestation, where logs would be perhaps the most common form of evidence. This is similar to the original in-toto link format's byproducts: other outputs that are not the main output of the build.
I'm open to adding it as an optional field.
Other alternatives to adding it to the provenance predicate:
- Add it to the statement (most likely as evidence or similar).
- Generalize the notion of relationships: https://github.com/in-toto/attestation/issues/6.
Do you think that logs would provide additional information for verification on top of the above? Would this verification be more about being able to check the build service executed the invocation as expected?
Exactly. This is useful when doing either a preventative or a diagnostic forensic investigation of the claims made in the attestation. It's even more useful if the logs are non-falsifiable, hence the suggestion of adding a digest for them.
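A rough sketch of what that check could look like on the consumer side, assuming the log has already been fetched from its URI and that the digest uses the same DigestSet shape as elsewhere in the spec (algorithm name mapped to a hex value). The function name and the set of supported algorithms are illustrative assumptions, not part of any spec:

```python
import hashlib
import hmac

# Assumption: only these algorithms are checked; others in the set are ignored.
SUPPORTED = ("sha256", "sha512")


def log_matches_digest(log_bytes: bytes, digest_set: dict) -> bool:
    """True if at least one supported digest is present and all present ones match."""
    checked = False
    for algorithm in SUPPORTED:
        expected = digest_set.get(algorithm)
        if expected is None:
            continue
        actual = hashlib.new(algorithm, log_bytes).hexdigest()
        # Constant-time comparison; hex digests are compared case-insensitively.
        if not hmac.compare_digest(actual, expected.lower()):
            return False
        checked = True
    return checked


# Hypothetical log content standing in for the fetched log.
log_bytes = b"step 1: fetch sources\nstep 2: compile\n"
digest_set = {"sha256": hashlib.sha256(log_bytes).hexdigest()}
print(log_matches_digest(log_bytes, digest_set))
print(log_matches_digest(log_bytes + b"edited later\n", digest_set))
```

Note that a matching digest only shows the log has not changed since the attestation was produced; it says nothing about whether the builder recorded it faithfully.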
one thing the spec does allow for, which is easy to miss, is extensibility
The metadata field seems not to be one of the explicitly extensible fields.
Producers MAY add extension fields using field names that are URIs.
I'd really like this to be a field that contains both the URI and a DigestSet. Maybe we need another field type to describe external resources like this, containing a URI and an optional DigestSet. A number of fields already fall into this category (materials, subject(?), configSource), and it might also apply to others (logs, policies in VSAs, etc.).
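To make the shape concrete, here is one possible sketch of such a type in Python. The name `ExternalResource` and its `to_json` helper are hypothetical and only illustrate the URI-plus-optional-DigestSet idea; nothing here is defined by the spec:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExternalResource:
    """Hypothetical shared type: a URI plus an optional DigestSet."""

    uri: str
    digest: Optional[Dict[str, str]] = None  # e.g. {"sha256": "<hex>"}

    def to_json(self) -> dict:
        """Serialize, omitting the digest when none was recorded."""
        out: dict = {"uri": self.uri}
        if self.digest:
            out["digest"] = dict(self.digest)
        return out


# The same type could back a log reference with a digest or one without.
log_ref = ExternalResource(
    uri="scheme://some/path",
    digest={"sha256": "eb158a15263554e65cab9dfd6f2a640b434d17e7e6af8f40435707662f88234a"},
)
bare_ref = ExternalResource(uri="scheme://some/other/path")
print(log_ref.to_json())
print(bare_ref.to_json())
```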
Add it to the statement (most likely as evidence or similar).
I really like this idea, but I think it needs more discussion, especially regarding what else should be included in this field and how it relates to the intent of the other content of the metadata field.
Thanks both for articulating the need. Thinking of some of the other predicate types we've observed in the wild (see https://github.com/in-toto/attestation/issues/98), I can picture this being in the statement layer.
For example, a vulnerability attestation might include only a high-level report, with the evidence linking to the full scan results.
I've filed https://github.com/in-toto/attestation/issues/114 against the in-toto/attestation repo to discuss the idea of including an evidence field in the statement.
This is now present as the byproducts field.