Standardize `externalParameters` for CI/CD builds
There is a desire to standardize the `externalParameters` for CI/CD buildTypes. Right now every CI/CD system defines its own `buildType`, but we're starting to see a common pattern. Every CI/CD system seems to have the following model, and it would be nice if they had a common schema so that consumers can handle them the same way.
Strawman schema
Based on GitHub Actions and Google Cloud Build (GCB):
- `buildConfigSource` / `workflow` = Reference to the top-level build configuration, for cases when the build platform resolved and fetched the build configuration from a source repository. Consists of (not necessarily separate fields):
  - `type` = Type of source repository: git, hg, oci, etc.
  - `repository` = URL of the source repository.
  - `ref` / `label` / `version` = Label or reference within the repository to resolve to a specific artifact (commit, image, etc.). Could be a mutable label or an immutable digest, whichever the tenant specified.
  - `path` / `entryPoint` / `target` = The file or target label within the resolved artifact to find the top-level build configuration.
- `buildConfig` = Inlined top-level build configuration, for cases when it is provided directly by the caller. (Mutually exclusive with `buildConfigSource`.)
- `sourceToBuild` = Source artifact to be initially checked out, for cases when it is an independent input (and different) from `buildConfigSource` / `buildConfig`. This might be unique to Google Cloud Build; I'm unaware of other platforms that do this. But it is an important input (more important than `parameters`) which is why I call it out. Consists of:
  - `repository`
  - `ref` / `label` / `version`
- `parameters` = Additional independent parameters beyond those above. Examples:
  - `inputs` (GitHub Actions) / `params` (Tekton) / `substitutions` (Google Cloud Build) = parameters provided by the user via some UI / CLI / API
  - `vars` (GitHub Actions) = variables passed in via the repository or organization
  - `directory` (Google Cloud Build) = initial working directory in which to start the build
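
To make the strawman concrete, here is a minimal Python sketch of what `externalParameters` might look like for a GitHub Actions style build, together with a check for the one hard constraint stated above (`buildConfigSource` and `buildConfig` are mutually exclusive). The field names follow the strawman and are not final. The repository URL, ref, path, and parameter values are invented for illustration, and `check_strawman` is a hypothetical helper, not part of any SLSA tooling.

```python
# Hypothetical externalParameters using the strawman field names.
# Every value below is made up for illustration.
example_external_parameters = {
    "buildConfigSource": {
        "type": "git",
        "repository": "https://github.com/example/project",
        "ref": "refs/heads/main",
        "path": ".github/workflows/release.yml",
    },
    "parameters": {
        "inputs": {"channel": "stable"},
        "vars": {"REGION": "us"},
    },
}


def check_strawman(external_parameters: dict) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    has_source = "buildConfigSource" in external_parameters
    has_inline = "buildConfig" in external_parameters
    # The strawman says these two fields are mutually exclusive.
    if has_source and has_inline:
        problems.append("buildConfigSource and buildConfig are mutually exclusive")
    return problems


print(check_strawman(example_external_parameters))
```

A platform that takes its build configuration inline would instead set `buildConfig` and omit `buildConfigSource`.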
Some design considerations
- How should we indicate that the provenance conforms to this common schema, while still indicating how to interpret it (i.e. is it a GitHub Actions YAML vs Google Cloud Build YAML vs ...)?
  - Separate `buildType` per platform (status quo) and use duck typing to indicate that it fits the common schema. (Seems fragile?)
  - Separate `buildType` per platform (status quo) + new `buildTypeCategory` = `cicd` (we could define others in the future; field name and value TBD). I'm leaning towards this.
  - One common `buildType` + new `buildSubtype` per platform.
- How many fields should we use for `buildConfigSource`? We could merge some or all of the fields together into a single URI, but that comes with some trade-offs, including:
  - Extensibility
  - Alignment with `resolvedDependencies`
  - Ease of construction and parsing
  - Ambiguity
- What field names should we use that make sense for a wide array of CI/CD systems?
- Does this fit most CI/CD systems well? We'll need to do a broad survey to make sure.
- Should `parameters` be standardized or type-specific? Maybe define standard names but allow other ones?
- Where do we define this?
  - Within the existing Provenance spec (my inclination)
  - As a separate page under slsa.dev
  - As a separate git repo
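
The difference between the first two options in the first bullet can be sketched from the consumer's side. This is only an illustration: the `buildTypeCategory` field name and the `cicd` value are TBD per the list above, placing the field next to `buildType` in the build definition is an assumption, and both function names are hypothetical.

```python
# Top-level field names from the strawman schema.
STRAWMAN_FIELDS = {"buildConfigSource", "buildConfig", "sourceToBuild", "parameters"}


def is_cicd_by_duck_typing(build_definition: dict) -> bool:
    """Duck typing: guess from the shape of externalParameters."""
    params = build_definition.get("externalParameters", {})
    # Fragile: an unrelated buildType that reuses these names also matches.
    return bool(params) and set(params) <= STRAWMAN_FIELDS


def is_cicd_by_category(build_definition: dict) -> bool:
    """Explicit marker: assumes a buildTypeCategory field next to buildType."""
    return build_definition.get("buildTypeCategory") == "cicd"


# An unrelated build type that happens to use a field called "parameters".
unrelated = {
    "buildType": "https://example.com/hypothetical-build-type/v1",
    "externalParameters": {"parameters": {"flag": "on"}},
}
print(is_cicd_by_duck_typing(unrelated), is_cicd_by_category(unrelated))
```

The unrelated build type passes the duck-typing check but not the explicit one, which is the fragility the first option is worried about.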
I support this proposal, especially if we do a broad survey to make sure it fits most CI/CD systems. I hope we can easily answer that question for Tekton and Jenkins, where some level of SLSA provenance generation is already available in Tekton Chains and the (sadly inactive) slsa-jenkins-generator.
Addressing some of the comments:
- IIUC `sourceToBuild` also fits Concourse CI.
- I don't like the idea of duck-typing to indicate that provenance conforms to the common schema. Using `buildTypeCategory` as an interface definition feels like it would work.
- On fields for `buildConfigSource`, is there a reason we wouldn't want to use `ResourceDescriptor`?
- Field names could be informed by the broad survey of CI/CD systems; we can ask/assess whether the strawperson schema makes sense.
- I think we should define standard names for `parameters`. I do like the idea of allowing others, but does that make it a separate `buildTypeCategory`?
- Where to define? I think a common schema/interface should live alongside the spec and current provenance schema, maybe as a separate page if that helps readability/layout, but within the existing spec seems fine.
> On fields for `buildConfigSource`, is there a reason we wouldn't want to use `ResourceDescriptor`?
I'm hesitant to allow most of those fields, particularly name, digest, annotations, and downloadLocation unless they actually come from the user. We had decided to split externalParameters from resolvedDependencies to make that distinction more clear. Otherwise it's unclear if, say, the user actually provided the digest or download location (and thus a policy needs to check it) vs it was what the build platform actually resolved to (and thus it's OK to ignore).
On the bright side, it might make some things more clean:
- `buildConfig` → `buildConfigSource.content` (though we could always do this even without using `ResourceDescriptor`)
- `buildType` → `buildConfigSource.mediaType` (though I'm not sure that's a good fit)
We'd also have to split out path to a separate field, but that seems acceptable to me.
I support this proposal, but I think I'm missing some historical context. IIUC, buildConfigSource and sourceToBuild are roughly equivalent to configSource and materials from provenance v0.2. Why did we remove them from v1.0? Is it worth considering adding fields directly to the provenance spec rather than deal with buildSubType or buildCategory? It's hard to assess whether it's worth adding complexity by expanding the type system without understanding why the simple solution didn't work in the past.
> IIUC, `buildConfigSource` and `sourceToBuild` are roughly equivalent to `configSource` and `materials` from provenance v0.2. Why did we remove them from v1.0?
Great question.
First, buildConfigSource is indeed equivalent to configSource in v0.2 (or definedInMaterial in v0.1), but sourceToBuild did not exist in earlier versions.
There were two major changes for v1.0:
- Cleanly separate `externalParameters`, `internalParameters`, and `resolvedDependencies`. Previously they were all mixed together: `configSource` contained both the actual external parameter (`uri`) as well as the resolved `digest`, the `environment` was at the same level as `configSource` and `parameters`, and `materials` was kind of its own thing. This led to misunderstanding and a lack of clarity on how the provenance was expected to be consumed.
- Generically inform builders to record all `externalParameters` rather than specifically `configSource` (with `entryPoint`) and `parameters`. Across the board, almost no one interpreted v0.2 as intended. `entryPoint` was particularly confusing, while GCB had the concept of `sourceToBuild` which didn't fit at all. Even GitHub Actions, which nominally did fit OK, was confused by the naming. To solve this, we simplified the model.
Between these two changes, things seemed to "click" for implementers. They seemed to understand it better and implement it with fewer mistakes.
The big differences between v0.2 `configSource` and the proposed `buildConfigSource` are:
- Do not include the resolved `digest`.
- Use better terminology that resonates with implementers and consumers, e.g. `repository` + `ref` + `path` instead of `uri` + `entryPoint`.
- Better define what the model is and what it means. (This was lacking in v0.2.)
- Make it optional for builders that don't fit the CI/CD model.
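
As a rough illustration of these differences, the sketch below splits a v0.2 `configSource` (`uri`, `digest`, `entryPoint`) into a strawman `buildConfigSource` and a separate resolved dependency entry. It assumes the ref, if present, follows the last `@` in the URI, which will not hold for every URI scheme. The function name and the sample values are hypothetical.

```python
def split_v02_config_source(config_source: dict) -> tuple[dict, dict]:
    """Split a v0.2 configSource into (buildConfigSource, resolved dependency)."""
    uri = config_source["uri"]
    # Assumption: "<repository>@<ref>", e.g. "git+https://host/repo@refs/heads/main".
    repository, sep, ref = uri.rpartition("@")
    if not sep:
        # No "@" found: treat the whole URI as the repository.
        repository, ref = uri, None

    # What the user asked for: no digest here.
    build_config_source = {
        "repository": repository,
        "path": config_source.get("entryPoint"),
    }
    if ref is not None:
        build_config_source["ref"] = ref

    # What the platform resolved it to: this is where the digest belongs.
    resolved_dependency = {"uri": uri, "digest": config_source.get("digest", {})}
    return build_config_source, resolved_dependency


v02 = {
    "uri": "git+https://github.com/example/project@refs/heads/main",
    "digest": {"sha1": "abc"},  # placeholder, not a real hash
    "entryPoint": ".github/workflows/release.yml",
}
source, dependency = split_v02_config_source(v02)
print(source)
print(dependency)
```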
> Is it worth considering adding fields directly to the provenance spec rather than deal with `buildSubType` or `buildCategory`?
Yes, I think that is worth considering. However, the challenge is where to stick it without doing a major version bump. We need it to go in externalParameters, and v1 says that this is determined by the buildType. So our options are somewhat limited. I'm definitely open to more ideas though!
We could do a v2 with more invasive changes, but I don't think there's an appetite for that. My inclination is to stick to v1 and work around its quirks, and queue up a longer wish list before doing a v2.
I think this is a great idea!
The Securing Repos OpenSSF Working Group is encouraging all package registries to provide build provenance. Not all build properties have to be standardized, but having some that are consistently defined would make it easier for registries (like npm) to render information from multiple cloud CI/CD systems.
Overall I'm supportive of this. At least for GitHub Actions, it's sometimes been hard to understand what should go where even in v1.0 (`externalParameters` vs `internalParameters`, though I think we've gotten it mostly right). We've also seen some very confused implementations, so I think giving folks implementing SLSA a roadmap or guide for how to generate it with a semi-standard format is a good idea.
A couple of general comments:
- > Separate `buildType` per platform (status quo) + new `buildTypeCategory` = `cicd` (we could define others in the future; field name and value TBD). I'm leaning towards this.

  While I understand the need for this, the cat's mostly out of the bag wrt `buildType`; determining how to interpret the `externalParameters` was exactly what `buildType` was supposed to do. I wish we could come up with something rather than adding yet another field we need to look at to understand how to interpret the provenance. I suppose that's the "One common `buildType` + new `buildSubtype` per platform" option. Other ideas:
  - Maybe a `?_type=cicd` query parameter type of thing?
  - Some specs have a special field that indicates a type. Maybe something like `_buildTypeCategory: cicd` inside the `externalParameters` that implementations can look for?

  I realize I'm grasping at straws a bit but I wish we could have something a bit cleaner.
- There seem to be a lot of fields that are used depending on the provider and some that seem fairly provider specific.
  - `ref` / `label` / `version`
  - `path` / `entryPoint` / `target`
  - `inputs`, `vars` (GitHub Actions) / `params` (Tekton) / `substitutions` (Google Cloud Build)

  Are just certain fields (like `parameters` or `sourceToBuild`) going to be provider specific? Just looking at this, it feels like we have defined some "common" fields, but actual provenance generated in practice will look very different for each provider, so I wonder if we've actually made our lives easier. I can see @steiza's point that UIs could be implemented more easily, but I'm not sure this actually makes `slsa-verifier`'s life that much easier.
Maybe I'd just like to see some full examples of what each provider's provenance would look like (at least the ones we know about) before fully endorsing this (I could maybe do a GHA one if we want to split up the work). Maybe if we had a proposals/ or experimental/ directory or something we could make some PRs that could be reviewed in there without actually committing to spec or needing a separate repo.
/cc @laurentsimon
Yes, I think we need real-world examples before deciding on everything.
We do have https://github.com/slsa-framework/slsa-proposals. What about creating a proposal there? That would allow us to iterate on the design and add example files.
Sent out https://github.com/slsa-framework/slsa-proposals/pull/16 that is just a copy of the first comment. @ianlewis would that help us iterate?
> Sent out slsa-framework/slsa-proposals#16 that is just a copy of the first comment. @ianlewis would that help us iterate?
Yeah, I think so. Thanks.
This is a great proposal! I believe this generic schema will make life much easier for both provenance producers and consumers.
I have a question about the platform-specific variables. They are normally referenced directly in the build config, but their values are not specified in the build config. One good example is GitHub context variables. Users can reference those variables directly inside their config (example), and GitHub will replace those variables with their actual values under the hood while executing the workflow.
How should we capture those variables in the provenance? Two options come to mind:
- Option 1: Only capture the user-provided raw build config in the `externalParameters`. It will contain only the references to those context variables, not their actual values. `internalParameters` captures those context variables and their values.
  - One downside is that `internalParameters` is for debugging purposes only per the SLSA spec. What if people want to verify those context variables later on?
- Option 2: `externalParameters` captures the "resolved" build config, in which the context variables are replaced with their actual values.
  - Con 1: This does not work for the remote build config case because `buildConfigSource` and `buildConfig` are mutually exclusive.
  - Con 2: If we put the resolved config into `externalParameters`, we lose some information about the raw user input.
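
The two options can be compared side by side with a small Python sketch. It is only an illustration: the one-line config, the context value, and the `resolve` helper are hypothetical, and the regular expression handles only simple `${{ name }}` references, not the full GitHub expression syntax.

```python
import re

# Raw config as written by the user; it references a GitHub context variable.
RAW_CONFIG = "run: echo ${{ github.ref }}"
# Value supplied by the platform at run time (chosen for illustration).
CONTEXT = {"github.ref": "refs/heads/main"}


def resolve(raw: str, context: dict) -> str:
    """Replace simple ${{ name }} references with values from the context."""
    pattern = r"\$\{\{\s*([\w.]+)\s*\}\}"
    return re.sub(pattern, lambda match: context[match.group(1)], raw)


# Option 1: raw config is external, platform-provided values are internal.
option_1 = {
    "externalParameters": {"buildConfig": RAW_CONFIG},
    "internalParameters": dict(CONTEXT),
}

# Option 2: the resolved config is recorded; the raw reference is lost.
option_2 = {
    "externalParameters": {"buildConfig": resolve(RAW_CONFIG, CONTEXT)},
}

print(option_1["externalParameters"]["buildConfig"])
print(option_2["externalParameters"]["buildConfig"])
```

With Option 2 a consumer can no longer tell whether the user wrote the literal value or a reference, which is Con 2 above.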
Looking for more thoughts and feedback. Thanks!! cc @lbernick
> `buildConfigSource.ref` / `label` / `version`: Could be a mutable label or an immutable digest, whichever the tenant specified.
I am wondering if a mutable revision here will be a concern for the downstream consumers of the provenance. cc @AdamZWu
> How should we capture those variables in the provenance? Two options are coming to my mind:
Perhaps I'm misunderstanding the question, but the provenance spec says:
- `externalParameters` = all variables provided by an external entity / user
- `internalParameters` = all variables provided by the build platform itself, represented by `builder.id`
So on GitHub Actions, for example, externalParameters.parameters contains vars and inputs (the two types of user-defined variables), while internalParameters contains other things needed for reproducibility, e.g. github.event_name.
Does that help?
> `buildConfigSource.ref` / `label` / `version`: Could be a mutable label or an immutable digest, whichever the tenant specified.
>
> I am wondering if a mutable revision here will be a concern for the downstream consumers of the provenance. cc @AdamZWu
The idea is that the externalParameters record precisely what the parameters were, while resolvedDependencies optionally records what those parameters resolved to. So if a build requested mutable label "main", and that resolved to hash abcd1234, then "main" should go in externalParameters and "abcd1234" should go in resolvedDependencies. The idea is that the externalParameters are the things under direct attacker control and thus SHOULD be validated against expectations. Whether to do validation of resolved dependencies is less clear cut.
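
A minimal sketch of that split, using the "main" and "abcd1234" placeholders from the paragraph above. The repository URL, the `build.yaml` path, and the `verify_external_parameters` helper are assumptions for illustration; a real policy engine would likely be more flexible than an exact comparison.

```python
def verify_external_parameters(build_definition: dict, expected: dict) -> bool:
    """Return True only if externalParameters exactly match the expectation.

    resolvedDependencies is deliberately not consulted here.
    """
    return build_definition.get("externalParameters") == expected


build_definition = {
    # What was requested: the mutable label "main".
    "externalParameters": {
        "buildConfigSource": {
            "repository": "https://github.com/example/project",
            "ref": "main",
            "path": "build.yaml",
        },
    },
    # What "main" resolved to at build time (placeholder hash from the text).
    "resolvedDependencies": [
        {
            "uri": "https://github.com/example/project",
            "digest": {"gitCommit": "abcd1234"},
        },
    ],
}

expected = {
    "buildConfigSource": {
        "repository": "https://github.com/example/project",
        "ref": "main",
        "path": "build.yaml",
    },
}

print(verify_external_parameters(build_definition, expected))
```

The check passes regardless of which commit "main" resolved to. A policy that also cares about the resolved commit would have to inspect `resolvedDependencies` separately.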
From @MarkLodato
> Should `parameters` be standardized or type-specific? Maybe define standard names but allow other ones?

From @joshuagl
> I think we should define standard names for `parameters`. I do like the idea of allowing others, but does that make it a separate `buildTypeCategory`?

From @ianlewis
> Are just certain fields (like `parameters` or `sourceToBuild`) going to be provider specific? Just looking at this it feels like we have defined some "common" fields but actual provenance generated in practice will look very different for each provider so I wonder if we've actually made our lives easier.

I don't know how much benefit we would get from standardizing the parameter names as they will be specific to a particular CI system. Artifacts built from common CI/build systems should have common parameter names and these can be documented by those systems. I see the benefit of parameters as an available envelope for platforms to consistently place additional parameters which would be needed for re-triggering a build. These would likely not be critical for third parties ingesting the provenance.
While this is not as much of a concern for some systems (GitHub Actions? GCB?), with Tekton, the parameters can affect the build by enabling/disabling certain Tasks and/or by changing the behavior of Tasks. For example, if a Tekton Pipeline is configured such that a parameter can enable/disable hermetic builds without changing the `builder.id`, then this would have a large effect on the overall SLSA levels (separate issue on this). These types of parameters might benefit from some form of "standardization" due to their ultimate effect.