distroless

Keep lists of package-installed files inside a built image

Open. pombredanne opened this issue 4 years ago • 22 comments

Since distroless images are primarily built with Bazel, I filed this issue https://github.com/bazelbuild/rules_docker/issues/1876, which I am reposting here... but I reckon this may need to be tracked here instead:

🚀 feature request

Relevant Rules

When a package is installed, only its metadata is kept; the list of installed files is lost and not saved with the package metadata.

I have a concern with what happens here: https://github.com/bazelbuild/rules_docker/blob/d18033b7eb3429a55dc4a579b5c19af57ab25e5f/container/build_tar.py#L224

Description

In a distroless container image, the installed .deb packages are not saved with their file lists and md5sums, which on a regular Debian install would live in /var/lib/dpkg/info. As a result, it is not possible to relate an installed package in a distroless image or layer to the set of files that were installed with this package.

This data can be important for software composition analysis and its security and license compliance tracking applications.

Describe the solution you'd like

Each installed package should include a listing of its installed files, possibly as a per-package file in the status.d/ directory. On a standard Debian system this is /var/lib/dpkg/info/<package name>.list.

This would make distroless images more readily introspectable and observable; otherwise there is no intrinsic way to relate a package (in status.d) to the set of its installed files.
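For context, on a regular Debian install the path-to-package relation comes from the per-package .list files under /var/lib/dpkg/info; dpkg -S <path> searches these file lists. Distroless images ship no dpkg, so a tool has to read such files itself. A minimal sketch of that lookup, assuming a standard dpkg database layout (the function name index_files_by_package is illustrative only):

```python
from pathlib import Path
from typing import Dict, List


def index_files_by_package(info_dir: str = "/var/lib/dpkg/info") -> Dict[str, List[str]]:
    """Map each installed path to the package(s) whose <package>.list names it.

    Multi-arch packages use <package>:<arch>.list, so the key keeps the
    ":<arch>" suffix. Directories such as "/usr" appear in many lists,
    which is why each path maps to a list of packages.
    """
    index: Dict[str, List[str]] = {}
    for list_file in sorted(Path(info_dir).glob("*.list")):
        package = list_file.stem  # "tzdata.list" -> "tzdata"
        for line in list_file.read_text(encoding="utf-8").splitlines():
            if line:  # one installed path per non-empty line
                index.setdefault(line, []).append(package)
    return index
```

Without an equivalent of these .list files inside the image, there is nothing for such an index to be built from, which is the gap this issue describes.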

@tejal29 you committed this originally with @dlorenc ... any insight to share there?

Describe alternatives you've considered

I cannot fathom an in-container alternative to keep track of each package-installed file. Tracking this outside the image would mean maintaining some external database, which does not seem practical.

pombredanne, May 28 '21

Gentle ping :)

pombredanne, Aug 26 '21

We have the list of installed packages in /var/lib/dpkg/status, so you're requesting a list of installed files, mapped back to the packages?

cc @loosebazooka

dlorenc, Aug 26 '21

@dlorenc

you're requesting a list of installed files, mapped back to the packages?

yes

pombredanne, Aug 26 '21

Yeah, I think this needs to be solved in rules_docker. Can you point to the Debian docs for this? That would be helpful.

loosebazooka, Aug 26 '21

@loosebazooka See

  • https://www.debian.org/doc/manuals/debian-reference/ch02.en.html#_verification_of_installed_package_files
  • https://www.debian.org/doc/manuals/debian-reference/ch02.en.html#thenotablefilescreatedbydpkg

Now, since you have already departed from the standard dpkg Debian layout with the status.d/ layout, feel free to use whatever you like. IMHO the simplest would be a per-package .list file, like /var/lib/dpkg/info/package_name.list, listing the files and directories installed by the package and stored side-by-side with the status file. For example, given /var/lib/dpkg/status.d/tzdata, which contains the package status for tzdata, /var/lib/dpkg/status.d/tzdata.list would list the paths installed by tzdata, one line per path. It would be nice to also document this, of course (including the actual use of status.d/ and the corresponding copyright files that are already there).
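To make the suggested layout concrete, here is a minimal sketch of a reader for it. It assumes the hypothetical convention from this comment (status.d/<package> holding the control data, with an optional sibling status.d/<package>.list holding one installed path per line); the function name read_status_d and the list of sidecar suffixes are illustrative, not an existing convention or API.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical sidecar files next to each status.d/<package> control file;
# any other regular file is treated as a package status file.
SIDECAR_SUFFIXES = (".list", ".md5sums")


def read_status_d(status_dir: str = "/var/lib/dpkg/status.d") -> Dict[str, List[str]]:
    """Return {package: [installed paths]}; packages without a .list map to []."""
    root = Path(status_dir)
    packages: Dict[str, List[str]] = {}
    for entry in sorted(root.iterdir()):
        # Skip directories and the sidecars themselves (checked by suffix,
        # so dotted package names such as "python3.11" still count).
        if not entry.is_file() or entry.name.endswith(SIDECAR_SUFFIXES):
            continue
        list_file = root / (entry.name + ".list")
        paths: List[str] = []
        if list_file.is_file():
            lines = list_file.read_text(encoding="utf-8").splitlines()
            paths = [line for line in lines if line.strip()]
        packages[entry.name] = paths
    return packages
```

Keeping the .list file beside the status file means a scanner only needs to look in one directory, and images built before the change simply yield empty path lists.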

pombredanne, Aug 26 '21

For reference I also entered https://github.com/bazelbuild/rules_docker/issues/1876 way back when (some would say this is a double post... but I was not sure where to post what ;) )

pombredanne, Aug 26 '21

Yeah, I'm not exactly sure about the history of this change, so I'll have to do some reading, but thanks for the link.

loosebazooka, Aug 27 '21

@loosebazooka

I'm not exactly sure about the history of this change.

I am not sure what you mean by this... but if you mean when the status.d files were introduced and what was there before, the history looks simple from what I can see.

A single commit, https://github.com/bazelbuild/rules_docker/commit/f5432b813e0a11491cf2bf83ff1a923706b36420, introduced keeping some metadata: it essentially takes the control file and dumps it under status.d/.

Before that, no metadata was kept: https://github.com/bazelbuild/rules_docker/blob/3caf72f166f8b6b0e529442477a74871ad4d35e9/container/build_tar.py#L181
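Since each status.d/ entry is just a dumped control file, it is a single Debian control stanza of "Field: value" lines, with continuation lines starting with whitespace. The sketch below parses such a stanza and writes it to status.d/<Package>. Naming the file after the Package field is an assumption for illustration, not necessarily what the rules_docker commit does, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_control(text: str) -> Dict[str, str]:
    """Parse one Debian control stanza into {field: value}.

    Lines starting with a space or tab continue the previous field (e.g. a
    multi-line Description); they are stripped and joined with newlines.
    """
    fields: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the stanza
            continue  # skip leading blank lines
        if line[0] in " \t":
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            # Split on the first colon only, so "Version: 1:2.3-1" keeps its epoch.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"malformed control line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


def write_status_entry(control_text: str, status_dir: str) -> Path:
    """Write the raw control text to <status_dir>/<Package> (hypothetical naming)."""
    package = parse_control(control_text)["Package"]
    target = Path(status_dir) / package
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(control_text, encoding="utf-8")
    return target
```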

I can provide a patch to rules_docker that would have either one of these effects in https://github.com/bazelbuild/rules_docker/blob/e5368f9c425854ddb5af31624f0a6b99a0d3f1fb/container/build_tar.py#L224:

  • also extract any list and md5sums files present in the metadata tarball of the .deb package under status.d/, named with this convention: and
  • alternatively, create a .list file listing all the files extracted from the deb data tarball
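The second option only needs the member names of the data tarball. A .deb is an ar archive containing debian-binary, a control.tar.* and a data.tar.*, so a rough standard-library sketch could look like the following. The helper names are hypothetical, GNU/BSD long-name ar tables are not handled, and zstd-compressed data.tar.zst members may not be readable depending on the Python version.

```python
import io
import posixpath
import tarfile
from typing import Dict, List


def read_ar_members(deb_bytes: bytes) -> Dict[str, bytes]:
    """Split a .deb (a Unix ar archive) into {member name: member bytes}."""
    if not deb_bytes.startswith(b"!<arch>\n"):
        raise ValueError("not an ar archive")
    members: Dict[str, bytes] = {}
    offset = 8  # length of the global "!<arch>\n" magic
    while offset + 60 <= len(deb_bytes):
        header = deb_bytes[offset:offset + 60]  # fixed 60-byte member header
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())  # decimal size field
        start = offset + 60
        members[name] = deb_bytes[start:start + size]
        offset = start + size + (size % 2)  # member data is padded to an even offset
    return members


def list_deb_paths(deb_bytes: bytes) -> List[str]:
    """Return the absolute paths shipped in the package's data.tar.* member."""
    members = read_ar_members(deb_bytes)
    data_name = next((n for n in members if n.startswith("data.tar")), None)
    if data_name is None:
        raise ValueError("no data.tar.* member found")
    paths: List[str] = []
    # Mode "r:*" auto-detects gzip, bzip2, xz or an uncompressed tar.
    with tarfile.open(fileobj=io.BytesIO(members[data_name]), mode="r:*") as tar:
        for info in tar.getmembers():
            # "./usr/bin/" -> "/usr/bin"; the archive root "./" becomes "/".
            path = posixpath.normpath("/" + info.name.lstrip("/"))
            if path != "/":  # skip the root directory entry
                paths.append(path)
    return paths
```

The returned list could then be written one path per line to a status.d/<package>.list file, following the layout suggested earlier in the thread.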

Do you want such a patch?

pombredanne, Aug 27 '21

@loosebazooka gentle ping... do you want a patch here or at https://github.com/bazelbuild/rules_docker/issues/1876?

pombredanne, Sep 13 '21

Oh sorry, yeah, I meant that I don't know why this form of metadata was chosen. Anyway, it seems like the correct place to inject the metadata is in rules_docker. Please provide a patch there.

loosebazooka, Sep 13 '21

@pombredanne gentle ping, any news on the patch?

fedemengo, Dec 18 '21

@pombredanne gentle ping, any news on the patch?

I have not attacked this yet. Do you want to chip in and help?

pombredanne, Dec 20 '21

Let's continue the discussion over at bazelbuild/rules_docker#1876

fedemengo, Dec 23 '21

@loosebazooka FYI, I pushed a fix in https://github.com/bazelbuild/rules_docker/pull/2065 and your review is mucho welcome there.

pombredanne, Apr 23 '22

Since the fix was provided by @pombredanne and released in rules_docker v0.25.0, what's left to see the change reflected in new images?

Is it enough to bump rules_docker here?

fedemengo, Dec 13 '22

@thesayyn are these covered in the new rules_oci?

loosebazooka, Dec 13 '22

@thesayyn are these covered in the new rules_oci?

Yes, it is.

NOTE: some packages don't have an md5sums file; in that case, it is absent.
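Where an md5sums file is present, it uses dpkg's usual format: one "<md5 hex digest>  <path relative to />" line per regular file. The sketch below checks installed files against it, in the spirit of debsums or dpkg --verify on a full Debian system, and treats a missing md5sums file as "nothing to check", per the note above. Where the md5sums file lives inside an image, and the names parse_md5sums and verify_package, are assumptions for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Optional


def parse_md5sums(text: str) -> Dict[str, str]:
    """Parse md5sums content into {absolute path: lowercase md5 hex digest}."""
    sums: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Digest and path are separated by two spaces; the path may contain spaces.
        digest, sep, rel_path = line.partition("  ")
        if not sep:
            raise ValueError(f"malformed md5sums line: {line!r}")
        sums["/" + rel_path.lstrip("/")] = digest.lower()
    return sums


def verify_package(md5sums_file: Path, root: str = "/") -> Optional[List[str]]:
    """Return paths that are missing or whose MD5 differs; None if no md5sums file."""
    if not md5sums_file.is_file():
        return None  # the package shipped no md5sums: nothing to verify against
    bad: List[str] = []
    sums = parse_md5sums(md5sums_file.read_text(encoding="utf-8"))
    for path, expected in sums.items():
        target = Path(root) / path.lstrip("/")  # resolve relative to the image root
        if not target.is_file():
            bad.append(path)
        elif hashlib.md5(target.read_bytes()).hexdigest() != expected:
            bad.append(path)
    return bad
```

Returning None rather than an empty list keeps "no checksums available" distinguishable from "all files verified".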

thesayyn, Dec 13 '22

@thesayyn are these covered in the new rules_oci?

Yes, it is.

Does this mean we should already see it reflected in new images?

NOTE: some packages don't have an md5sums file; in that case, it is absent.

At least for the packages that have md5sums files?

fedemengo, Dec 14 '22

@fedemengo not yet. We're in the middle of a larger transition to rules_oci, and when that is complete, you will begin to see this metadata.

loosebazooka, Dec 14 '22

Awesome, thanks for the update.

fedemengo, Dec 14 '22

@loosebazooka you wrote:

We're in the middle of a larger transition to rules_oci, and when that is complete, you will begin to see this metadata.

Hey! Is the transition done?

pombredanne, Jun 01 '23

Looks like after https://github.com/GoogleContainerTools/distroless/pull/1367, the new images contain the expected metadata.

fedemengo, Jan 30 '24