RFC: cargo-sbom
This RFC adds an option to Cargo that emits a Software Bill of Materials (SBOM) alongside compiled artifacts. Similar to how Cargo emits split debug info or "dep-info" (.d) files, this change emits an SBOM in a Cargo-specific format alongside outputs in the target directory. External tooling or Cargo subcommands can consume this Cargo SBOM file and transform it into other SBOM formats such as SPDX or CycloneDX.
Originally posted on internals as a pre-RFC, now moved to an RFC.
I am wondering what this is worth at all. If there is the slightest chance to fake this, it's less meaningful. E.g. build.rs calls cargo differently and writes the file itself. Or a test alters it, after cargo build finishes. Detectable by code review, of course. But if downstream trusts the SBOM, that might take time to be discovered.
Say there's an exploitable dependency and it takes a newer version to fix that. But an artifact wants to provide that exploitability while pretending not to have it. Easy, if it fakes the SBOM to declare the newer version, while Cargo.toml uses the older one.
OTOH, if I am wrong, and SBOM were ok: Features matter! The used crates can be vulnerable or not, depending on which features are activated.
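To make the point about features concrete, here is a minimal sketch of a feature-aware advisory check. The `SbomEntry` and `Advisory` types and the `is_affected` function are hypothetical; they do not reflect the SBOM format proposed in the RFC or any real advisory database, and versions are compared as exact strings purely for simplicity.

```rust
use std::collections::BTreeSet;

/// Hypothetical SBOM entry: a crate, its resolved version, and the features
/// that were active in the build. Not the format proposed in the RFC.
#[derive(Debug, Clone)]
pub struct SbomEntry {
    pub name: String,
    pub version: String,
    pub features: BTreeSet<String>,
}

/// Hypothetical advisory: the vulnerable code is only compiled in when
/// `required_feature` is active (`None` means every build is affected).
#[derive(Debug, Clone)]
pub struct Advisory {
    pub name: String,
    pub vulnerable_version: String,
    pub required_feature: Option<String>,
}

/// Returns true if the entry matches the advisory's crate and version and,
/// where the advisory names a feature, that feature is active in the entry.
pub fn is_affected(entry: &SbomEntry, advisory: &Advisory) -> bool {
    if entry.name != advisory.name || entry.version != advisory.vulnerable_version {
        return false;
    }
    match &advisory.required_feature {
        Some(feature) => entry.features.contains(feature),
        None => true,
    }
}
```

With a model like this, the same crate at the same version is reported as affected only when the SBOM records the gating feature as active, which is why a bare name-and-version list may not be enough.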
At some point, you have to trust something. You also have to deal with the chain of trust after this file is written.
For tests, I at least assume people are most likely to read this file in dedicated "production build" jobs which, in my experience, do not run tests.
For build.rs, this file is written at least after all of your dependencies' build scripts have run. I'm not sure whether it is written before your own build.rs runs but, if you can't trust that, then you can't trust people working with the file after it's written.
The question is whether this is only for validating one’s own artifacts’ dependencies with tools like Blackduck. In that case it might help by giving developers and maintainers an overview of vulnerabilities. That still leaves room for “the enemy within” attacks.
Or is it to accompany binaries on the internet, to prove something about them? How could it?
Either way, a chain of proof is hard to establish. If the SBOM production doesn’t have hard guarantees, it’s the weak link…
If what you are looking for is a guarantee that a generated dependency list is the one generated for the binary, from inception to your system, then this is not that feature. #2801 isn't even that feature (atm).
> I am wondering what this is worth at all. If there is the slightest chance to fake this, it's less meaningful. E.g. build.rs calls cargo differently and writes the file itself. Or a test alters it, after cargo build finishes. Detectable by code review, of course. But if downstream trusts the SBOM, that might take time to be discovered.
This RFC is intended to expose accurate dependency information for other tools to consume. It's not intended to guard against malicious crates or build scripts. SBOMs are only part of the solution to software supply chain security.
Is the premise here that cargo metadata or similar does not give enough information for this to be implemented fully as a crate (such as for cargo-bom)? Or that it should be maintained by the rust project itself?
It just seems like something that should be an installable cargo command (like cargo install cargo-sbom which would give cargo-sbom).
I feel like this is covered in Alternatives, particularly:
Unfortunately it's difficult to extract accurate SBOM information with existing options. Using the Cargo.lock file or cargo metadata overincludes dependencies. Additionally, since Cargo has many different commands that produce compiled artifacts (build, test, bench, etc.) and each of these commands take arguments that can affect the dependency list it's difficult to ensure that the correct dependency list is used.
This is basically a dump of cargo's unit graph at the end of the build so other people can build their own tools on top of this. There is no other way to get information like this at this time.
This also opens the door for build.rs to inject data so information about non-Rust dependencies can be included. There is currently no mechanism for other tools to collect side channel information from build.rs.
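The over-inclusion point can be illustrated with a small sketch. It assumes a toy model in which each dependency edge may be gated on a feature: a lockfile-style listing contains every crate in the graph, while the crates actually built are only those reachable from the root with the active features. `Edge`, `built_crates`, and the flat set of feature names are hypothetical simplifications; this is not Cargo's resolver or its unit graph.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical dependency edge. `feature` names the feature that must be
/// enabled for this edge to be followed (`None` means the edge is unconditional).
#[derive(Debug, Clone)]
pub struct Edge {
    pub to: String,
    pub feature: Option<String>,
}

/// Walks the graph from `root`, following only edges whose gating feature is
/// in `enabled`, and returns the set of crates that would be built.
/// Assumption: features are modelled as one flat set, not per crate.
pub fn built_crates(
    graph: &BTreeMap<String, Vec<Edge>>,
    root: &str,
    enabled: &BTreeSet<String>,
) -> BTreeSet<String> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root.to_string()];
    while let Some(name) = stack.pop() {
        // Skip crates that were already visited.
        if !seen.insert(name.clone()) {
            continue;
        }
        // Crates without an entry in the graph have no dependencies.
        for edge in graph.get(&name).into_iter().flatten() {
            let active = match &edge.feature {
                Some(feature) => enabled.contains(feature),
                None => true,
            };
            if active {
                stack.push(edge.to.clone());
            }
        }
    }
    seen
}
```

For example, if a hypothetical `app` crate depends on `tls-lib` only behind a `tls` feature, calling `built_crates` with an empty feature set leaves out `tls-lib` and everything beneath it, even though a lockfile-style listing would still contain them.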
I think some of the questions around possible adversarial manipulation of SBOM data are basically asking for a threat model. Do we expect packages to be malicious, do we do anything to protect against malicious action by them? It's probably worth writing down explicitly in the RFC (I'm interested in helping if help is desired), at least so there's clarity.
We could start with something like:
Cargo's SBOM provides an accurate report of the components and dependencies used by cargo to build a software artefact. These components and dependencies are trusted to:
- report transitive dependencies accurately,
- report components they use accurately, and
- accurately modify the SBOM precursor written by cargo, or preserve it without modification.
cargo does not defend against malicious components or dependencies changing the SBOM, or accidentally or maliciously concealing themselves from the SBOM. In particular, cargo may not include components or dependencies added by build scripts or external tools. Ideally, tools should provide their own SBOMs, and build scripts should modify the SBOM via supported cargo interfaces.
In the Pre-RFC I saw this referred to as an "SBOM fragment". Would using that language help with some of the worries and confusion between regulatory SBOM, existing SBOM formats, and SBOM data that is easiest to access from Cargo?