Keep package-installed files listing for Debian packages installed in a distroless image
🚀 feature request
Relevant Rules
When a package is installed, only its metadata is kept; the list of installed files is not saved alongside the package metadata.
I have a concern with what happens here: https://github.com/bazelbuild/rules_docker/blob/d18033b7eb3429a55dc4a579b5c19af57ab25e5f/container/build_tar.py#L224
Description
In a distroless container image, installed .deb packages are recorded without the file lists and md5sums that a regular Debian install keeps under /var/lib/dpkg/info. As a result, it is not possible to relate a package installed in a distroless image or layer to the set of files that were installed with it.
This data can be important for software composition analysis and its security and license compliance tracking applications.
Describe the solution you'd like
Each installed package should come with a listing of its installed files, possibly stored as an additional per-package file in the status.d/ directory. On a regular Debian system, this is standard and lives under /var/lib/dpkg/info/<package name>.
This would make distroless images more readily introspectable; otherwise, there is no intrinsic way to relate a package (in status.d) to the set of its installed files.
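To illustrate the introspection this would enable, here is a minimal sketch that builds a "which package owns this file" index. It assumes a hypothetical layout in which each package's md5sums is kept as var/lib/dpkg/status.d/<package name>.md5sums inside an image filesystem already extracted to a local directory; distroless does not ship such files today, and `build_file_owner_index` is a made-up helper name.

```python
from __future__ import annotations

from pathlib import Path


def build_file_owner_index(root: str | Path) -> dict[str, str]:
    """Map each absolute installed path to the package that installed it.

    Assumes (hypothetically) one <package>.md5sums file per package in
    var/lib/dpkg/status.d/ under `root`.
    """
    status_d = Path(root) / "var/lib/dpkg/status.d"
    owners: dict[str, str] = {}
    for md5_file in sorted(status_d.glob("*.md5sums")):
        package = md5_file.name[: -len(".md5sums")]
        for line in md5_file.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            # Each line is "<md5 hex><whitespace><path relative to />".
            _digest, path = line.split(maxsplit=1)
            owners["/" + path.lstrip("/")] = package
    return owners
```

A software composition analysis tool could then look up, for example, `build_file_owner_index(rootfs)["/usr/bin/jq"]` to find the owning package.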
@tejal29 you committed this originally with @dlorenc ... any insight to share there?
Describe alternatives you've considered
I cannot think of an in-container alternative for keeping track of each package-installed file. Tracking them outside the image would mean maintaining some external database, which does not seem practical.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!
@pombredanne I would love to help, do you already have something I can start with?
@fedemengo sorry for the late reply. I do not have anything done yet, but I would likely either:
- continue using the "non-Debian-like" /var/lib/dpkg/status.d/<package name> layout, and add /var/lib/dpkg/status.d/<package name>.md5sums (originally under /var/lib/dpkg/info/<package name>.md5sums) and/or create a /var/lib/dpkg/status.d/<package name>.list file for installed files (also originally under /var/lib/dpkg/info/<package name>.list), OR
- just keep and copy over the /var/lib/dpkg/info/ directory as is, which will contain all the original parts of the packages.
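A rough sketch of the first option, written against Python's standard tarfile module directly rather than the rules_docker tar-writing code (which it does not attempt to mirror). The helper names, the var/lib/dpkg/status.d path prefix inside the layer, and deriving the .list from md5sums are all assumptions. Note that a .list derived this way only covers files listed in md5sums: dpkg's own .list also records directories, and conffiles are usually left out of md5sums.

```python
from __future__ import annotations

import io
import tarfile

# Assumed location of per-package entries inside the layer tarball.
STATUS_DIR = "var/lib/dpkg/status.d"


def _add_bytes(tar: tarfile.TarFile, path: str, content: bytes) -> None:
    """Add an in-memory regular file to an open tar archive."""
    info = tarfile.TarInfo(path)
    info.size = len(content)
    info.mode = 0o644
    tar.addfile(info, io.BytesIO(content))


def list_from_md5sums(md5sums: bytes) -> bytes:
    """Derive a dpkg-style .list body (one absolute path per line) from md5sums."""
    lines = []
    for line in md5sums.splitlines():
        if not line.strip():
            continue
        _digest, path = line.split(maxsplit=1)
        lines.append(b"/" + path.lstrip(b"/") + b"\n")
    return b"".join(lines)


def add_status_entries(tar: tarfile.TarFile, package: str, control: bytes,
                       md5sums: bytes | None = None) -> None:
    """Hypothetical helper: write control plus optional md5sums/.list entries."""
    _add_bytes(tar, f"{STATUS_DIR}/{package}", control)
    if md5sums is not None:
        _add_bytes(tar, f"{STATUS_DIR}/{package}.md5sums", md5sums)
        _add_bytes(tar, f"{STATUS_DIR}/{package}.list", list_from_md5sums(md5sums))
```

Keeping md5sums verbatim preserves the checksums, while the .list file gives tools that only expect dpkg's file-list format something to read.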
After a brief investigation, the metadata file passed to add_pkg_metadata still seems to contain the package files. I tested with two randomly chosen .deb packages:
jq_1.6-2.1_amd64.deb
Version: 1.6-2.1
Architecture: amd64
Maintainer: ChangZhuo Chen (陳昌倬) <[email protected]>
Installed-Size: 110
Depends: libjq1 (= 1.6-2.1), libc6 (>= 2.4)
Section: utils
Priority: optional
Multi-Arch: foreign
Homepage: https://github.com/stedolan/jq
Description: lightweight and flexible command-line JSON processor
jq is like sed for JSON data – you can use it to slice
and filter and map and transform structured data with
the same ease that sed, awk, grep and friends let you
play with text.
.
It is written in portable C, and it has minimal runtime
dependencies.
.
jq can mangle the data format that you have into the
one that you want with very little effort, and the
program to do so is often shorter and simpler than
you’d expect.
./md5sums:
4805bfbf88146bbb434b248c8548ba9a usr/bin/jq
5563b04c49c62365021c85daa51b2ea9 usr/share/doc/jq/AUTHORS.gz
c364f0eca2f62a00bdba467b6dcec0c6 usr/share/doc/jq/README
7bbac574353d0a7b979154962e609e9e usr/share/doc/jq/changelog.Debian.gz
71ebdef08d6145814339da04bbb38ee7 usr/share/doc/jq/changelog.gz
1745dfca81b4c36132c52fa5e972d6cd usr/share/doc/jq/copyright
f7f27caeb55e22fb67c74a10a03053a1 usr/share/man/man1/jq.1.gz
and pppoe_3.12-1.2_amd64.deb
Source: rp-pppoe
Version: 3.12-1.2
Architecture: amd64
Maintainer: Andreas Barth <[email protected]>
Installed-Size: 239
Depends: libc6 (>= 2.14), ppp (>= 2.3.10-1)
Section: net
Priority: optional
Description: PPP over Ethernet driver
PPP over Ethernet (PPPoE) is a protocol used by
many ADSL Internet service providers. This package allows
you to connect to those PPPoE service providers.
./md5sums:
49f16f269e495ac63284930ddb35819d usr/sbin/pppoe
fecf103f0643fc47f6e2b6ab189ba836 usr/sbin/pppoe-connect
f0c57b276c5f71c1bea0e68d1ed05cc4 usr/sbin/pppoe-relay
fc48502b12c572f651db2c830e1e5023 usr/sbin/pppoe-server
05ca70beff7548aa62f7338526c4de7f usr/sbin/pppoe-sniff
6fad7c0d267557577956b34d9a8e5ab2 usr/sbin/pppoe-start
b60a26c2098b31466c597b69086481b8 usr/sbin/pppoe-status
08ad72bf1a79f3d5c726216bdd3c7be0 usr/sbin/pppoe-stop
7135c95b9cd1de83278a6ed59967f15e usr/share/doc/pppoe/README.Debian.gz
273f383c93571467392442a65efb59d3 usr/share/doc/pppoe/changelog.Debian.gz
588f951008f3c8832342e32afbbce587 usr/share/doc/pppoe/changelog.gz
e6d8e774d4c0b4a71cd7c0b407ee51a2 usr/share/doc/pppoe/copyright
3989cc121f314dd71ba995bce0d7cc7a usr/share/lintian/overrides/pppoe
0f56b077433fa2ae3061b81734c0c3ab usr/share/man/man5/pppoe.conf.5.gz
00f42cc119e815b6f67a06b66f6bc98e usr/share/man/man8/pppoe-connect.8.gz
3043945ca5df6fd72ca21175e7363f4c usr/share/man/man8/pppoe-relay.8.gz
0bc3f0deffb7e56e88629ac4306a7c50 usr/share/man/man8/pppoe-server.8.gz
6ca2fde4e0c47ba3f31b29ebd557e3ad usr/share/man/man8/pppoe-setup.8.gz
af7d2547aead5aedd657e6dffc9238a1 usr/share/man/man8/pppoe-sniff.8.gz
6737faadbf8d5182b570a2238bf0662f usr/share/man/man8/pppoe-start.8.gz
6c6185ef482c042a3bb5e72a4eca4a5b usr/share/man/man8/pppoe-status.8.gz
948141509e725bcdb152e73a16effc38 usr/share/man/man8/pppoe-stop.8.gz
2b99016e346bc73fa7c4be686c0b527b usr/share/man/man8/pppoe.8.gz
So we must be losing the file information somewhere else. I'll keep digging.
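Each md5sums entry above is an MD5 hex digest followed by a path relative to /. That is what makes the file useful for the security use case: files in an image can be checked against what the package shipped, in the spirit of debsums. A minimal sketch, assuming the image filesystem is extracted under a local directory; `verify_md5sums` is a hypothetical helper, not part of rules_docker or dpkg.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def verify_md5sums(root: str | Path, md5sums: str) -> list[str]:
    """Return absolute paths that are missing or whose MD5 does not match."""
    problems: list[str] = []
    for line in md5sums.splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        abs_path = "/" + rel_path.lstrip("/")
        target = Path(root) / rel_path.lstrip("/")
        if not target.is_file():
            problems.append(abs_path)  # listed by the package but absent
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected:
            problems.append(abs_path)  # present but modified
    return problems
```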
@fedemengo I do not think anything is "lost"; rather, in https://github.com/bazelbuild/rules_docker/blob/6ea707babdcd54514e0884278ac624fb8bda19c1/container/build_tar.py#L224 only one file is extracted, and that is the control file: https://github.com/bazelbuild/rules_docker/blob/6ea707babdcd54514e0884278ac624fb8bda19c1/container/build_tar.py#L41
We want the md5sums file to be extracted as well.
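For context, a .deb is an ar archive whose control.tar.* member holds ./control and, for most packages, ./md5sums. The change discussed here amounts to pulling both files out of that member instead of only control. The sketch below is standalone Python and does not reuse build_tar.py's helpers. `iter_ar_members` and `read_control_members` are hypothetical names, and only uncompressed, gzip, bzip2 or xz control archives are handled, because tarfile cannot open zstd.

```python
from __future__ import annotations

import io
import posixpath
import tarfile

AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60  # name(16) mtime(12) uid(6) gid(6) mode(8) size(10) fmag(2)


def iter_ar_members(data: bytes):
    """Yield (name, payload) for each member of a common-format ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + AR_HEADER_SIZE <= len(data):
        header = data[offset:offset + AR_HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        name = header[0:16].decode("ascii").rstrip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        start = offset + AR_HEADER_SIZE
        yield name, data[start:start + size]
        # Member data is padded to an even offset.
        offset = start + size + (size % 2)


def read_control_members(deb_bytes: bytes,
                         wanted=("control", "md5sums")) -> dict[str, bytes]:
    """Return the wanted files found in the control.tar.* member of a .deb."""
    for name, payload in iter_ar_members(deb_bytes):
        if not name.startswith("control.tar"):
            continue
        found: dict[str, bytes] = {}
        # "r:*" auto-detects uncompressed, gzip, bzip2 and xz tarballs.
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
            for member in tar.getmembers():
                short = posixpath.normpath(member.name)  # "./md5sums" -> "md5sums"
                if short in wanted and member.isfile():
                    found[short] = tar.extractfile(member).read()
        return found
    raise ValueError("no control.tar.* member found")
```

Packages without an md5sums member simply yield a result without the "md5sums" key, so a caller can fall back to storing control alone.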
I pushed a PR here with a test: https://github.com/bazelbuild/rules_docker/pull/2065
I would really appreciate some review of https://github.com/bazelbuild/rules_docker/pull/2065 before it goes stale.
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!
This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"