rules_docker

Keep package-installed files listing for Debian packages installed in a distroless image

Open pombredanne opened this issue 4 years ago • 6 comments

🚀 feature request

Relevant Rules

When a package is installed, only its metadata (the control file) are kept; the list of files installed by the package is lost instead of being saved with the package metadata.

I have a concern with what happens here: https://github.com/bazelbuild/rules_docker/blob/d18033b7eb3429a55dc4a579b5c19af57ab25e5f/container/build_tar.py#L224

Description

In a distroless container image, installed .deb packages are not stored with their file lists (the files and md5sums listings that a regular Debian install keeps in /var/lib/dpkg/info). As a result, it is not possible to relate a package installed in a distroless image or layer to the set of files that were installed with it.

This data can be important for software composition analysis, and in particular for its security and license-compliance tracking applications.

Describe the solution you'd like

Each installed package should come with a listing of its installed files, possibly stored as a per-package file in the status.d/ directory. This mirrors the Debian standard of per-package files in /var/lib/dpkg/info/ (e.g. <package name>.list and <package name>.md5sums).

This would make distroless images more readily introspectable; otherwise there is no intrinsic way to relate a package (in status.d) to the set of its installed files.

@tejal29 you committed this originally with @dlorenc ... any insight to share there?

Describe alternatives you've considered

I cannot think of an in-container alternative for keeping track of each package-installed file. Tracking them outside the container would mean maintaining an external database, which does not seem practical.

pombredanne, May 26 '21

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions[bot], Nov 25 '21

@pombredanne I would love to help, do you already have something I can start with?

fedemengo, Dec 23 '21

@fedemengo sorry for the late reply. I have nothing done yet, but I would likely either:

  • continue using the non-Debian-like /var/lib/dpkg/status.d/<package name> layout and add a /var/lib/dpkg/status.d/<package name>.md5sums file (originally /var/lib/dpkg/info/<package name>.md5sums) and/or a /var/lib/dpkg/status.d/<package name>.list file listing the installed files (originally /var/lib/dpkg/info/<package name>.list)
  • OR simply keep and copy over the /var/lib/dpkg/info/ directory as is, which contains all the original per-package files.
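A minimal Python sketch of the first option, assuming the control stanza and the md5sums content are already available as strings. The helper name write_status_d_entry is hypothetical and not part of rules_docker; the .list file is derived here from the paths in md5sums.

```python
from pathlib import Path


def write_status_d_entry(root, package, control_text, md5sums_text):
    """Hypothetical helper: write one package's metadata under
    <root>/var/lib/dpkg/status.d/.

    <package>          the control stanza (what status.d already holds)
    <package>.md5sums  the md5sums content, verbatim
    <package>.list     one absolute path per file listed in md5sums
    """
    status_d = Path(root) / "var" / "lib" / "dpkg" / "status.d"
    status_d.mkdir(parents=True, exist_ok=True)

    (status_d / package).write_text(control_text)
    (status_d / f"{package}.md5sums").write_text(md5sums_text)

    # Each md5sums line is "<md5 hex digest>  <path relative to />".
    paths = []
    for line in md5sums_text.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2:
            paths.append("/" + parts[1])
    (status_d / f"{package}.list").write_text("".join(p + "\n" for p in paths))
    return status_d
```

A real dpkg .list file also records directories, and md5sums files usually omit conffiles, so a listing derived this way only approximates what dpkg itself would write.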

pombredanne, Mar 27 '22

After a brief investigation, the metadata file passed to add_pkg_metadata still seems to contain the md5sums list of the package files. I tested with two arbitrary .deb packages, jq_1.6-2.1_amd64.deb:

Version: 1.6-2.1
Architecture: amd64
Maintainer: ChangZhuo Chen (陳昌倬) <[email protected]>
Installed-Size: 110
Depends: libjq1 (= 1.6-2.1), libc6 (>= 2.4)
Section: utils
Priority: optional
Multi-Arch: foreign
Homepage: https://github.com/stedolan/jq
Description: lightweight and flexible command-line JSON processor
 jq is like sed for JSON data – you can use it to slice
 and filter and map and transform structured data with
 the same ease that sed, awk, grep and friends let you
 play with text.
 .
 It is written in portable C, and it has minimal runtime
 dependencies.
 .
 jq can mangle the data format that you have into the
 one that you want with very little effort, and the
 program to do so is often shorter and simpler than
 you’d expect.
./md5sums
4805bfbf88146bbb434b248c8548ba9a  usr/bin/jq
5563b04c49c62365021c85daa51b2ea9  usr/share/doc/jq/AUTHORS.gz
c364f0eca2f62a00bdba467b6dcec0c6  usr/share/doc/jq/README
7bbac574353d0a7b979154962e609e9e  usr/share/doc/jq/changelog.Debian.gz
71ebdef08d6145814339da04bbb38ee7  usr/share/doc/jq/changelog.gz
1745dfca81b4c36132c52fa5e972d6cd  usr/share/doc/jq/copyright
f7f27caeb55e22fb67c74a10a03053a1  usr/share/man/man1/jq.1.gz

and pppoe_3.12-1.2_amd64.deb:

Source: rp-pppoe
Version: 3.12-1.2
Architecture: amd64
Maintainer: Andreas Barth <[email protected]>
Installed-Size: 239
Depends: libc6 (>= 2.14), ppp (>= 2.3.10-1)
Section: net
Priority: optional
Description: PPP over Ethernet driver
 PPP over Ethernet (PPPoE) is a protocol used by
 many ADSL Internet service providers. This package allows
 you to connect to those PPPoE service providers.
./md5sums
49f16f269e495ac63284930ddb35819d  usr/sbin/pppoe
fecf103f0643fc47f6e2b6ab189ba836  usr/sbin/pppoe-connect
f0c57b276c5f71c1bea0e68d1ed05cc4  usr/sbin/pppoe-relay
fc48502b12c572f651db2c830e1e5023  usr/sbin/pppoe-server
05ca70beff7548aa62f7338526c4de7f  usr/sbin/pppoe-sniff
6fad7c0d267557577956b34d9a8e5ab2  usr/sbin/pppoe-start
b60a26c2098b31466c597b69086481b8  usr/sbin/pppoe-status
08ad72bf1a79f3d5c726216bdd3c7be0  usr/sbin/pppoe-stop
7135c95b9cd1de83278a6ed59967f15e  usr/share/doc/pppoe/README.Debian.gz
273f383c93571467392442a65efb59d3  usr/share/doc/pppoe/changelog.Debian.gz
588f951008f3c8832342e32afbbce587  usr/share/doc/pppoe/changelog.gz
e6d8e774d4c0b4a71cd7c0b407ee51a2  usr/share/doc/pppoe/copyright
3989cc121f314dd71ba995bce0d7cc7a  usr/share/lintian/overrides/pppoe
0f56b077433fa2ae3061b81734c0c3ab  usr/share/man/man5/pppoe.conf.5.gz
00f42cc119e815b6f67a06b66f6bc98e  usr/share/man/man8/pppoe-connect.8.gz
3043945ca5df6fd72ca21175e7363f4c  usr/share/man/man8/pppoe-relay.8.gz
0bc3f0deffb7e56e88629ac4306a7c50  usr/share/man/man8/pppoe-server.8.gz
6ca2fde4e0c47ba3f31b29ebd557e3ad  usr/share/man/man8/pppoe-setup.8.gz
af7d2547aead5aedd657e6dffc9238a1  usr/share/man/man8/pppoe-sniff.8.gz
6737faadbf8d5182b570a2238bf0662f  usr/share/man/man8/pppoe-start.8.gz
6c6185ef482c042a3bb5e72a4eca4a5b  usr/share/man/man8/pppoe-status.8.gz
948141509e725bcdb152e73a16effc38  usr/share/man/man8/pppoe-stop.8.gz
2b99016e346bc73fa7c4be686c0b527b  usr/share/man/man8/pppoe.8.gz

So we must be losing the file information somewhere else. I'll keep digging.
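The ustar fragments in the output above suggest it is a raw control tarball holding both ./control and ./md5sums. A minimal sketch, using only Python's standard tarfile module, of pulling the md5sums member out of such a tarball and mapping each installed path to its MD5 digest; read_md5sums is a hypothetical name, not a rules_docker function.

```python
import io
import tarfile


def read_md5sums(control_tar_bytes):
    """Return {installed path: md5 digest} from a control tarball's md5sums member.

    Returns an empty dict if the tarball has no md5sums member.
    """
    # "r:*" lets tarfile detect gzip/bz2/xz compression (or none).
    with tarfile.open(fileobj=io.BytesIO(control_tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            # Members are usually named "./control", "./md5sums", ...
            name = member.name[2:] if member.name.startswith("./") else member.name
            if name != "md5sums" or not member.isfile():
                continue
            text = tar.extractfile(member).read().decode("utf-8")
            entries = {}
            for line in text.splitlines():
                parts = line.strip().split(maxsplit=1)
                if len(parts) == 2:
                    digest, path = parts
                    entries[path] = digest
            return entries
    return {}
```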

fedemengo, Apr 22 '22

@fedemengo I do not think anything is "lost"; rather, in https://github.com/bazelbuild/rules_docker/blob/6ea707babdcd54514e0884278ac624fb8bda19c1/container/build_tar.py#L224 only one file is extracted, and that is the control file: https://github.com/bazelbuild/rules_docker/blob/6ea707babdcd54514e0884278ac624fb8bda19c1/container/build_tar.py#L41

We want the md5sums file to be extracted as well.
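A standalone Python sketch of extracting both members from a .deb; this is not the actual build_tar.py code. A .deb is an ar archive whose control.tar.* member holds ./control and ./md5sums. The function names are hypothetical, the ar parser assumes the plain format (8-byte magic, 60-byte member headers, data padded to an even offset) with short member names, and the control tarball is assumed to be gzip-, bzip2- or xz-compressed (or uncompressed).

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"
AR_HEADER_SIZE = 60


def iter_ar_members(data):
    """Yield (name, payload) for each member of an ar archive (sketch)."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    offset = len(AR_MAGIC)
    while offset + AR_HEADER_SIZE <= len(data):
        header = data[offset:offset + AR_HEADER_SIZE]
        # Header fields: name[0:16] ... size[48:58], all ASCII, space-padded.
        name = header[0:16].decode("ascii").strip()
        size = int(header[48:58].decode("ascii").strip())
        start = offset + AR_HEADER_SIZE
        # GNU ar may terminate names with "/", so strip it.
        yield name.rstrip("/"), data[start:start + size]
        # Member data is padded to an even offset.
        offset = start + size + (size % 2)


def extract_control_members(deb_bytes, wanted=("control", "md5sums")):
    """Hypothetical helper: return {name: bytes} for the wanted control-archive members."""
    for name, payload in iter_ar_members(deb_bytes):
        if not name.startswith("control.tar"):
            continue
        found = {}
        with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
            for member in tar.getmembers():
                short = member.name[2:] if member.name.startswith("./") else member.name
                if short in wanted and member.isfile():
                    found[short] = tar.extractfile(member).read()
        return found
    raise ValueError("no control.tar.* member found")
```

Returning both members keeps the control stanza unchanged for status.d while making the md5sums content available to write alongside it.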

I pushed a PR here with a test: https://github.com/bazelbuild/rules_docker/pull/2065

pombredanne, Apr 23 '22

I would really appreciate some review of https://github.com/bazelbuild/rules_docker/pull/2065 before it goes stale.

pombredanne, May 20 '22

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_docker!

github-actions[bot], Nov 17 '22

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

github-actions[bot], Dec 18 '22