Describe multiple SBOM scan targets

Open wagoodman opened this issue 4 years ago • 5 comments

What would you like to be added: Be able to specify multiple scan targets from which one or more SBOMs are created. Take the following examples for illustrative purposes:

# syft.yaml

inputs:
- type: image
  id: my-image-sbom
  value: docker.io/me/my-image:latest
  format: spdxjson
  
- type: directory
  id: my-source-sbom
  value: ./src
  format: spdx

This would allow scanning both an artifact and its source to produce two different SBOMs, so that the CI invocation would simply be:

# syft.yaml is automatically assumed...
syft
# ...output "my-image-sbom.json" and "my-source-sbom.spdx" files
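As a rough sketch of how the output names above could be derived from the config: each input's id plus an extension chosen by its format. The `EXTENSIONS` table and `output_filename` helper are hypothetical and only cover the two formats shown in the example; nothing here is existing syft behavior.

```python
# Hypothetical mapping from the format names in the example config to file
# extensions. Only the two pairs implied by the example are listed.
EXTENSIONS = {"spdxjson": "json", "spdx": "spdx"}

# In-memory form of the "inputs" list from the example syft.yaml.
inputs = [
    {"type": "image", "id": "my-image-sbom",
     "value": "docker.io/me/my-image:latest", "format": "spdxjson"},
    {"type": "directory", "id": "my-source-sbom",
     "value": "./src", "format": "spdx"},
]


def output_filename(entry):
    """Build '<id>.<extension>' for one input entry."""
    return f"{entry['id']}.{EXTENSIONS[entry['format']]}"


filenames = [output_filename(entry) for entry in inputs]
print(filenames)
```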

You could combine the output from multiple cataloging efforts into the same SBOM by using the same id for each input:

# syft.yaml

inputs:
- type: image
  id: my-sbom
  root-package: container
  value: docker.io/me/my-image:latest
  format: spdxjson
  
- type: directory
  root-package: source
  id: my-sbom
  value: ./src

Where the result would be a single my-sbom.json in the spdxjson output. Additionally, anything found in the container will have a relationship tied to a phantom "container" package and anything in the source scanning would have a relationship to a phantom "source" package.

I'm not 100% in love with the proposed format above, as it would be easy to abuse when it comes to combining incompatible formats, but it suffices for illustrative purposes.
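One possible guard against the incompatible-format concern, sketched under assumptions: group inputs by id, let a group inherit the one format any of its members declares, and reject a group whose members declare different formats. The `plan_outputs` function and the shape of its result are hypothetical, not part of syft.

```python
from collections import defaultdict

# In-memory form of the second example config (two inputs sharing one id).
inputs = [
    {"type": "image", "id": "my-sbom", "root-package": "container",
     "value": "docker.io/me/my-image:latest", "format": "spdxjson"},
    {"type": "directory", "id": "my-sbom", "root-package": "source",
     "value": "./src"},
]


def plan_outputs(entries):
    """Group inputs by id; each group would become one SBOM document."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["id"]].append(entry)

    plans = {}
    for sbom_id, members in groups.items():
        # Collect every format explicitly declared within the group.
        formats = {m["format"] for m in members if "format" in m}
        if len(formats) > 1:
            raise ValueError(
                f"inputs sharing id {sbom_id!r} declare conflicting "
                f"formats: {sorted(formats)}"
            )
        plans[sbom_id] = {
            "format": next(iter(formats), None),
            # Phantom root packages that found packages would be tied to.
            "roots": [m.get("root-package") for m in members],
            "targets": [m["value"] for m in members],
        }
    return plans


plans = plan_outputs(inputs)
print(plans["my-sbom"])
```

Failing early like this keeps the "same id means same document" rule simple, at the cost of not allowing one scan to be written in several formats.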

We could surface a small set of this functionality via the CLI by allowing for multiple scan targets:

syft dir:./ image:docker.io/me/my-image:latest -o spdxjson
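A minimal sketch of how several scheme-prefixed arguments like the ones above could be split into (scheme, value) pairs. The `KNOWN_SCHEMES` set only contains the two prefixes used in this example and `parse_target` is a hypothetical helper; this is not how syft parses its arguments today.

```python
# Only the two prefixes that appear in the example command; an assumption,
# not a list of schemes that syft actually accepts.
KNOWN_SCHEMES = {"dir", "image"}


def parse_target(arg):
    """Split 'scheme:value' on the first colon only.

    Image references contain colons themselves (for example a tag), so an
    argument whose prefix is not a known scheme is returned unchanged with
    a scheme of None.
    """
    scheme, sep, value = arg.partition(":")
    if sep and value and scheme in KNOWN_SCHEMES:
        return scheme, value
    return None, arg


def parse_targets(args):
    return [parse_target(arg) for arg in args]


targets = parse_targets(["dir:./", "image:docker.io/me/my-image:latest"])
print(targets)
```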

Why is this needed: For more complicated workflows it would be ideal to encode what needs to be cataloged into a description instead of relying on the consumer to orchestrate multiple syft calls with bash.

Additionally there is no way to deal with "multiple" SBOMs with syft, or grouping related items with relationships, which could be a powerful pattern.

wagoodman avatar Oct 16 '21 19:10 wagoodman

Another approach to the output here would be to allow syft to take multiple images or a multi-arch image as input and stream multiple SBOM documents to the file in question. How this could work for each format:

  • table: simply output multiple tables, with an additional header to list which image is being processed
  • json, spdx-json, cyclonedx-json: require single-line output, treat the document as JSONL
  • spdx-tag-value: not supported
  • cyclonedx-xml: XML already supports multiple embedded tags in a single doc
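For the JSON-based formats, the JSONL idea could look roughly like this: one compact document per line on write, one parsed document per non-empty line on read. The documents below are placeholders rather than real SBOM schemas, and `write_jsonl` / `read_jsonl` are hypothetical helper names.

```python
import io
import json


def write_jsonl(documents, stream):
    """Write each document as exactly one line of compact JSON."""
    for doc in documents:
        # json.dumps escapes newlines inside strings, so each document
        # always stays on a single line.
        stream.write(json.dumps(doc, separators=(",", ":")) + "\n")


def read_jsonl(stream):
    """Parse one document per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Placeholder documents standing in for one SBOM per image.
docs = [
    {"source": "image-a", "artifacts": ["pkg-1"]},
    {"source": "image-b", "artifacts": ["pkg-2", "multi\nline note"]},
]

buffer = io.StringIO()
write_jsonl(docs, buffer)
buffer.seek(0)
round_tripped = read_jsonl(buffer)
print(len(round_tripped))
```

A consumer such as the proposed merge step could then read the file line by line without needing to know in advance how many documents it holds.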

This dodges the problem of needing to solve how multiple sources are handled in a single SBOM, and instead this can be handled in something that intentionally takes multiple SBOMs for merging (for example syft merge sbom1.json sbom2.json).

This impacts #617 #3562 #562

wagoodman avatar Mar 05 '25 16:03 wagoodman

Another possibility is to do the following:

  1. Make the "source" part of the SBOM be the source of the many images, e.g. a kubernetes manifest or a multi-image OCI manifest
  2. Make artifacts in the SBOM with purl type pkg:oci for each image we found
  3. Use relationships and nesting to show which packages come from which image.

The advantage of this is that it can be done in all the SBOM specs.
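A sketch of the three steps above with plain dicts, assuming the purl form `pkg:oci/<name>@<percent-encoded digest>?repository_url=<url>`. The document layout, the `contains` relationship name, and the helpers `oci_purl` and `build_document` are all hypothetical, and the digest is an obvious placeholder rather than a real image digest.

```python
from urllib.parse import quote


def oci_purl(name, digest, repository_url):
    """Build a pkg:oci purl; the digest is percent-encoded as the version."""
    return f"pkg:oci/{name}@{quote(digest, safe='')}?repository_url={repository_url}"


def build_document(source_name, images):
    """Step 1: one source. Step 2: one pkg:oci package per image.
    Step 3: relationships from each image package to what it contains."""
    packages, relationships = [], []
    for name, info in images.items():
        image_purl = oci_purl(name, info["digest"], info["repository_url"])
        packages.append({"name": name, "purl": image_purl})
        for pkg in info["packages"]:
            packages.append({"name": pkg, "purl": None})
            relationships.append(
                {"parent": image_purl, "type": "contains", "child": pkg}
            )
    return {
        "source": source_name,
        "packages": packages,
        "relationships": relationships,
    }


# Placeholder input: a made-up manifest name and an all-zero digest.
doc = build_document(
    "deployment.yaml",
    {
        "my-image": {
            "digest": "sha256:" + "0" * 64,
            "repository_url": "docker.io/me/my-image",
            "packages": ["openssl"],
        }
    },
)
print(doc["packages"][0]["purl"])
```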

willmurphyscode avatar Mar 12 '25 14:03 willmurphyscode

To move this forward, I think the easiest path is to first add the CLI approach (as args, e.g. syft dir:./ image:docker.io/me/my-image:latest -o spdxjson), then in the future maybe optionally add a more complicated input configuration that allows for selection, additional relationships, and other operations.

wagoodman avatar Mar 12 '25 14:03 wagoodman

I was talking to @popey at KubeCon about a multi-language project I need SBOMs produced for. The project has a Go backend and a Vue.js frontend (with npm). Currently the backend's SBOMs are produced with ko.build's generator. I'm having issues producing an SBOM with Syft against a container image.

I tried with a package.json shipped inside the image via this PR and built with ./hack/publish.sh --local. The package shows up but not the dependencies.

BobyMCbobs avatar Apr 02 '25 10:04 BobyMCbobs

This came up in a community discussion with @joonas due to a desire to use Syft to catalog Zarf packages. It was my understanding that a Zarf package is basically an archive that has subdirectories containing multiple OCI directories, and maybe some other types of files. This is a problem today because Syft only has one location for a single Source.

However, in postulating about solutions, and in light of a number of different open issues related to this, one of the biggest challenges that keeps coming up is: where do we put all the data? In other words: how can we describe multiple sources in a meaningful way?

I think a solution could be to allow sources-as-packages (or possibly SBOMs-as-packages): when we find nested sources, we construct new source objects, use them to populate the package metadata, and send them back through the same cataloging procedures. This could be done by a specific SourceCataloger that identifies nested sources and surfaces SourcePackages carrying the source information, along with CONTAINS relationships to the packages found in those sub-scans.
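To make the sources-as-packages idea a bit more concrete, here is a toy sketch where a source is a dict with its directly found packages and any nested sources. Each nested source is surfaced as a package of kind `source-package` and then sent back through the same `catalog` function. The dict layout, the `kind` field, and the function itself are assumptions for illustration, not Syft's types.

```python
def catalog(source):
    """Return (packages, relationships) for a source and everything nested in it.

    source: {"name": str, "packages": [str, ...], "nested": [source, ...]}
    """
    packages = []
    relationships = []

    # Packages found directly in this source.
    for name in source.get("packages", []):
        packages.append({"name": name, "kind": "package"})
        relationships.append((source["name"], "CONTAINS", name))

    # Nested sources become packages, then go through the same procedure.
    for nested in source.get("nested", []):
        packages.append({"name": nested["name"], "kind": "source-package"})
        relationships.append((source["name"], "CONTAINS", nested["name"]))
        sub_packages, sub_relationships = catalog(nested)
        packages.extend(sub_packages)
        relationships.extend(sub_relationships)

    return packages, relationships


# Made-up stand-in for an archive holding one OCI directory.
archive = {
    "name": "package.tar",
    "packages": [],
    "nested": [
        {"name": "images/app", "packages": ["busybox"], "nested": []},
    ],
}

found_packages, found_relationships = catalog(archive)
print(found_relationships)
```

The recursion keeps a single cataloging path for every level, which is the appeal of the approach; where all the per-source metadata ends up in each output format is still the open question.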

Just to reiterate some of the known challenges:

... assuredly there are other things I'm missing, but that's the gist of our conversation.

kzantow avatar Dec 09 '25 15:12 kzantow