--exclude-pkgs option

Open rchincha opened this issue 3 years ago • 12 comments

What would you like to be added:

Once packages are discovered using the cataloger, can I specify a list of packages to be excluded?

Why is this needed:

Generate an SBOM only for a subset of packages.

Additional context:

rchincha avatar Sep 27 '22 21:09 rchincha

Thanks for the issue @rchincha!

Can you walk us through the reasoning for excluding packages? We want syft to be as close to the truth as possible when generating an SBOM.

Allowing users to exclude or omit packages that are present and cataloged seems a little outside of that goal.

Definitely happy to talk through how you would use it!

spiffcs avatar Sep 28 '22 15:09 spiffcs

We've also added this to the agenda for tomorrow's community meeting for syft and grype. Feel free to join; we'll get other feedback from the community there as well!

https://twitter.com/GrypeProject/status/1574431163799801856?cxt=HHwWgMC8maizwNkrAAAA

spiffcs avatar Sep 28 '22 15:09 spiffcs

Thanks @spiffcs.

We have a situation where we build a chroot without a package database of any sort. To use syft's SBOM capability, we set up a separate environment with a base distro install, install the required packages on top of it, and would now like to exclude the packages from the base install where appropriate. Alternatively, instead of a blacklist (--exclude), perhaps a whitelist (--include) would work better. Hope the problem statement is clear.
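As a rough illustration of the request above, the desired result is the set of packages found in the full environment minus those already present in the base install. The sketch below is not a Syft feature; the `Pkg` type, the `added_packages` helper, and the sample package lists are hypothetical names and data used only for illustration.

```python
from typing import Iterable, NamedTuple


class Pkg(NamedTuple):
    """Hypothetical minimal package record: name plus version."""
    name: str
    version: str


def added_packages(base: Iterable[Pkg], full: Iterable[Pkg]) -> list[Pkg]:
    """Return packages present in `full` but not in `base`, sorted by name."""
    base_set = set(base)
    return sorted(p for p in set(full) if p not in base_set)


# Illustrative data only: a base install and the same install plus two packages.
base = [Pkg("bash", "5.1"), Pkg("coreutils", "8.32")]
full = base + [Pkg("curl", "7.81.0"), Pkg("zlib", "1.2.11")]
result = added_packages(base, full)
```

Note that a package upgraded on top of the base (same name, different version) would also show up as "added" under this comparison, which may or may not be what is wanted.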

rchincha avatar Sep 28 '22 17:09 rchincha

Any additional thoughts/updates on this?

rchincha avatar Oct 05 '22 22:10 rchincha

Hi @rchincha, we discussed this at the community meeting last week (see the notes here). If I understand the use case you're describing, it's less about excluding packages and more about including only user-defined packages (while excluding the base image packages). Is this correct? If so, this is something that has been asked for before and something we'd like to do. We have the concept of scopes, but currently only squashed and all-layers. We would add another scope, something like user-layers, which I suspect would be easier for you to use than an explicit exclude list of packages. What do you think?

kzantow avatar Oct 06 '22 14:10 kzantow

@kzantow, yes, that is spot-on for our requirement.

Your suggestion about user-layers could also work. I assume you will work out how one would determine which layers are the user layers. For us, given a base set of layers, we can install all our packages in a new layer; scanning and reporting from that new layer alone could then work.

https://github.com/anchore/syft#sbom

^ Also, could you expand a bit more on squashed and all-layers? What is the difference? Perhaps an example or two, for our understanding.

rchincha avatar Oct 06 '22 16:10 rchincha

@rchincha the idea is we would just exclude the layers from the base image, so any layers you add from your own Dockerfile would be included. I'm not sure we've worked out every detail here, but that's the gist.
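To make the gist concrete, here is a minimal sketch of how "user layers" could be derived, assuming an image's layers are an ordered list of digests and the base image's layers form a prefix of that list. The user-layers scope is only a proposal in this thread; the `user_layers` function and the digests below are hypothetical.

```python
def user_layers(image_layers: list[str], base_layers: list[str]) -> list[str]:
    """Return the layers of `image_layers` that follow the shared base prefix.

    Raises ValueError if the image does not start with the base layers.
    """
    n = len(base_layers)
    if image_layers[:n] != base_layers:
        raise ValueError("image is not built on the given base layers")
    return image_layers[n:]


# Placeholder digests, for illustration only.
base_image = ["sha256:aaa", "sha256:bbb"]
image = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
mine = user_layers(image, base_image)
```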

As for squashed vs all-layers:

  • squashed: scans only the final filesystem, i.e. the result of applying all layers
  • all-layers: scans each layer in the image individually

The difference here is all-layers would find things that were present at one point, but removed before the final filesystem.
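The difference can be sketched with a toy model in which each layer maps a path to its content, and `None` marks a deletion. This is only an illustration of the idea, not how Syft or stereoscope represent layers; the `Layer` alias and both functions are hypothetical.

```python
from typing import Optional

# Toy model: path -> content; None means the layer deletes that path.
Layer = dict[str, Optional[str]]


def squashed(layers: list[Layer]) -> set[str]:
    """Paths visible in the final filesystem after applying layers in order."""
    fs: dict[str, str] = {}
    for layer in layers:
        for path, content in layer.items():
            if content is None:
                fs.pop(path, None)  # deletion removes the path if present
            else:
                fs[path] = content
    return set(fs)


def all_layers(layers: list[Layer]) -> set[str]:
    """Paths that exist in at least one layer, even if deleted later."""
    return {
        path
        for layer in layers
        for path, content in layer.items()
        if content is not None
    }


layers = [
    {"/usr/bin/curl": "v1", "/usr/bin/wget": "v1"},
    {"/usr/bin/wget": None},  # second layer deletes wget
]
```

Here `squashed(layers)` contains only `/usr/bin/curl`, while `all_layers(layers)` also reports `/usr/bin/wget`, which existed in the first layer before being removed.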

kzantow avatar Oct 06 '22 17:10 kzantow

@kzantow thanks for the clarification, it is the deletions that make the two options different.

About user-layers, is there an ETA? We don't mind pitching in if it helps expedite things.

rchincha avatar Oct 06 '22 17:10 rchincha

@rchincha we do not currently have an ETA for this, but of course PRs are welcome! FYI - I believe this change would probably need to be done predominantly in the stereoscope library, which Syft relies on for processing images.

kzantow avatar Oct 06 '22 17:10 kzantow

@kzantow after thinking about this some more, I'm also wondering whether an --offline option is feasible.

Most package managers, given a package name/version, can also list files included in the package and files to be installed.

$ dpkg -l curl
ii  curl           7.81.0-1ubuntu1.4 amd64        command line tool for transferring data with URL syntax

$ dpkg-query -L curl
/.
/usr
/usr/bin
/usr/bin/curl
...
etc

So the question is: can one simply pass the package name/version and its constituent list of files and generate an SPDX document? This would, of course, be orthogonal to grokking container images.
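A minimal sketch of that idea follows: build a dictionary loosely modeled on SPDX 2.3 JSON field names from a package name, version, and file list. It is not a valid SPDX document and is not validated against the spec: it omits required document fields (for example the data license, namespace, and creation info) and the per-file checksums that real SPDX file entries need. The `spdx_sketch` function and the SPDXID naming scheme are assumptions made for illustration.

```python
import json


def spdx_sketch(name: str, version: str, files: list[str]) -> dict:
    """Build an incomplete, SPDX-like dict for one package and its files."""
    pkg_id = f"SPDXRef-Package-{name}"
    file_entries = [
        {"SPDXID": f"SPDXRef-File-{i}", "fileName": path}
        for i, path in enumerate(files)
    ]
    # One CONTAINS relationship from the package to each of its files.
    relationships = [
        {
            "spdxElementId": pkg_id,
            "relationshipType": "CONTAINS",
            "relatedSpdxElement": entry["SPDXID"],
        }
        for entry in file_entries
    ]
    return {
        "spdxVersion": "SPDX-2.3",
        "SPDXID": "SPDXRef-DOCUMENT",
        "name": f"{name}-{version}",
        "packages": [
            {
                "SPDXID": pkg_id,
                "name": name,
                "versionInfo": version,
                "downloadLocation": "NOASSERTION",
            }
        ],
        "files": file_entries,
        "relationships": relationships,
    }


doc = spdx_sketch("curl", "7.81.0-1ubuntu1.4", ["/usr/bin/curl"])
text = json.dumps(doc, indent=2)
```

The directory entries that `dpkg-query -L` prints (such as `/.` and `/usr`) would need to be filtered out before passing the list in.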

rchincha avatar Oct 10 '22 20:10 rchincha

@rchincha I don't quite follow --offline in this context, but if you're thinking about providing Syft with a list of packages and/or files, this might be a feasible way to do things. We are working on a way to catalog SBOMs we find on the file system, and we could potentially add a "simple" SBOM format that's like a CSV or text file.
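For illustration, a "simple" CSV format of this kind could be read along the following lines. No such format is defined in this thread; the `name,version` header and the `parse_simple_sbom` function are assumptions.

```python
import csv
import io


def parse_simple_sbom(text: str) -> list[dict[str, str]]:
    """Parse CSV text with a `name,version` header into package records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"name": row["name"].strip(), "version": row["version"].strip()}
        for row in reader
    ]


sample = "name,version\ncurl,7.81.0-1ubuntu1.4\n"
packages = parse_simple_sbom(sample)
```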

kzantow avatar Oct 11 '22 16:10 kzantow

"but if you're thinking about providing Syft with a list of packages and/or files, this might be feasible way to do things." exactly this ^

rchincha avatar Oct 11 '22 20:10 rchincha

Notes from a quick discussion:

  • Generally, Syft surfaces the best information it can; excluding certain packages seems contrary to this
  • Syft does provide a way of excluding files. However, that is meant to keep Syft from sifting through directories that are known to contain nothing useful, so it's different from excluding packages that were already found
  • Currently, you can turn off catalogers and exclude files. However, when a package manager keeps all of its packages at the same path (some package manager database), there's currently no way to exclude individual packages. For example, I can tell Syft to exclude the path to the RPM DB, which will cause Syft to miss all RPMs, or to exclude the RPM cataloger, which will also cause Syft to miss all RPMs, but I can't tell Syft, "here's a list of 5 RPMs I don't care to hear about." So there is a bit of a gap in the configuration possibilities.
  • Unwanted packages could be removed in post-processing, but this tampers with the SBOM after the fact, and is error prone (e.g. the removed package might own files, leaving dangling relationships)
  • If "exclude-packages" was a config option, it would be captured in the syft config block in the SBOM
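The dangling-relationship concern in the post-processing bullet can be sketched as follows. This uses a made-up, simplified SBOM shape (`packages` with `id` and `name`, `relationships` with `from` and `to`), not Syft's actual JSON schema, and `exclude_packages` is a hypothetical helper.

```python
def exclude_packages(sbom: dict, excluded: set[str]) -> dict:
    """Drop packages by name and any relationship that touches them."""
    removed_ids = {p["id"] for p in sbom["packages"] if p["name"] in excluded}
    kept = [p for p in sbom["packages"] if p["id"] not in removed_ids]
    # Without this step, relationships would point at packages that no longer exist.
    relationships = [
        r
        for r in sbom["relationships"]
        if r["from"] not in removed_ids and r["to"] not in removed_ids
    ]
    return {"packages": kept, "relationships": relationships}


sbom = {
    "packages": [
        {"id": "pkg-1", "name": "curl"},
        {"id": "pkg-2", "name": "bash"},
    ],
    "relationships": [
        {"from": "pkg-1", "to": "file-1"},
        {"from": "pkg-2", "to": "file-2"},
    ],
}
filtered = exclude_packages(sbom, {"bash"})
```

Even with the relationships cleaned up, the files that the removed package owned may remain in the document with no owner, which is part of why post-processing is error prone.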

@rchincha can you help me understand why you only want a subset of packages?

willmurphyscode avatar Dec 18 '24 15:12 willmurphyscode