
transparent archives: present original archive path and inner path in output

Open tstromberg opened this issue 1 year ago • 2 comments

With #174 we have transparent archive handling, but the output shows the temp file name:

/var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/bincapz-apko_0.13.2_linux_arm64.tar.gz1015874883/apko_0.13.2_linux_arm64/apko

What I think would be cool is if we can display the archive file as well as the file within it:

/Users/egibs/Downloads/apko_0.13.2_darwin_amd64.tar.gz ∴ apko

To accomplish this, we'll need a change to the FileReport struct:

https://github.com/chainguard-dev/bincapz/blob/f029652a7756dda5e26c9906cb38a01e6f8f3ac9/pkg/bincapz/bincapz.go#L26

... and add something like a SubPath or InnerPath.

It's worth noting that the Report struct has a map that also contains a filename: Files map[string]FileReport - it has to be unique, so I think it's fine if we keep the temporary file path there.
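A minimal sketch of what that struct change could look like. This is not the actual bincapz code: the `ArchiveRoot` and `InnerPath` field names, the `DisplayPath` method, and the `newArchiveMember` helper are all hypothetical, and the real `FileReport` has other fields that are omitted here. The temporary path stays in `Path`, so it can still serve as the unique map key.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// FileReport is a trimmed-down stand-in for the real struct; only the
// fields needed for this illustration are shown.
type FileReport struct {
	Path        string // path that was actually scanned (the temp path for archive members)
	ArchiveRoot string // hypothetical: original archive path, empty for plain files
	InnerPath   string // hypothetical: path of the file inside the archive
}

// DisplayPath returns "archive ∴ inner" for archive members and the
// plain scanned path for everything else.
func (fr FileReport) DisplayPath() string {
	if fr.ArchiveRoot == "" {
		return fr.Path
	}
	return fmt.Sprintf("%s ∴ %s", fr.ArchiveRoot, fr.InnerPath)
}

// newArchiveMember builds a report for a file that was extracted from
// archive into the temporary directory tmpRoot (hypothetical helper).
func newArchiveMember(archive, tmpRoot, extracted string) (FileReport, error) {
	inner, err := filepath.Rel(tmpRoot, extracted)
	if err != nil {
		return FileReport{}, err
	}
	return FileReport{
		Path:        extracted,
		ArchiveRoot: archive,
		InnerPath:   filepath.ToSlash(inner),
	}, nil
}

func main() {
	// All three paths below are made up for the example.
	fr, err := newArchiveMember(
		"/home/user/Downloads/example.tar.gz",
		"/tmp/extract-1",
		"/tmp/extract-1/example/bin/tool",
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(fr.DisplayPath())
}
```

Deriving the inner path with `filepath.Rel` keeps any directory prefix from inside the archive, so the displayed inner path may be longer than the bare file name.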

The ultimate test of this feature is whether or not --diff mode works with two directories of archives.

tstromberg avatar May 02 '24 13:05 tstromberg

The ultimate test of this feature is whether or not --diff mode works with two directories of archives.

This works (but I'm not the greatest judge as to its veracity):

$ go run . --diff ~/Downloads/apko_tar_gzs/apko_0.13.2_darwin_amd64.tar.gz ~/Downloads/apko_tar_gzs_2/apko_0.13.2_darwin_arm64.tar.gz
Moved: ../../../../../var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/apko_0.13.2_darwin_amd64.tar.gz824651334/apko_0.13.2_darwin_amd64/apko -> ../../../../../var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/apko_0.13.2_darwin_arm64.tar.gz2308444237/apko_0.13.2_darwin_arm64/apko (score: 0.941791)
+++ ADDED: 3 behavior(s) +++
------------------------------------------------------------------------------
RISK  KEY                          DESCRIPTION                      EVIDENCE
------------------------------------------------------------------------------
+LOW  process/chdir                changes working directory        cd H2l
+MED  net/bpf                      BPF (Berkeley Packet Filter)     bpf
+MED  security_controls/linux/ufw  interacts with the ufw firewall  ufw
------------------------------------------------------------------------------

egibs avatar May 03 '24 01:05 egibs

I've been thinking about the implementation somewhat, and have some further thoughts. This is kind of stream of consciousness:

  • There should be some concept of layers: for example, if findings are found in a .gem file within a .tar.gz, we should be able to present that relationship in our findings.

  • I don't think we need to support diff'ing within two archive files for now.

  • In general, we should treat files by the original path they sit on in the filesystem. For example, if we scan a directory of 30 .tar.gz files, the statistics should be based on how many of those .tar.gz files are matched, not each individual file within them.

  • This has me wondering if perhaps what we need is a hierarchy like:

File -> Layer -> Behaviors

So, for the case of a .zip file full of .gem files, we'd see something like:

./Downloads/gems.zip -> rodney.gem:data.tar.gz:rodney.rb -> Behaviors

In the terminal output, you'd see something like:

Path: ./Downloads/gems.zip ∴ rodney.gem ∴ data.tar.gz ∴ rodney.rb

or maybe:

Path: ./Downloads/gems.zip ≡ rodney.gem ≡ data.tar.gz ≡ rodney.rb

Each layer would show its own table of behaviors.

The less complicated we can make the JSON output, the better.
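To make the File -> Layer -> Behaviors idea concrete, here is a rough sketch under stated assumptions. None of these types exist in bincapz: `File`, `Layer`, `Behavior`, the JSON field names, and the `matchedFiles` helper are all hypothetical. It shows one possible flat JSON shape, the ∴-joined display path, and statistics counted per top-level file rather than per inner file.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Behavior mirrors one row of the terminal table (hypothetical shape).
type Behavior struct {
	Risk        string `json:"risk"`
	Key         string `json:"key"`
	Description string `json:"description"`
}

// Layer is one file nested inside the top-level file. Chain lists the
// containers from outermost to innermost, ending with the file itself.
type Layer struct {
	Chain     []string   `json:"chain"`
	Behaviors []Behavior `json:"behaviors"`
}

// File is identified by the path it sits at on the filesystem.
type File struct {
	Path   string  `json:"path"`
	Layers []Layer `json:"layers"`
}

// DisplayPath joins the top-level path and a layer's chain with " ∴ ".
func (f File) DisplayPath(l Layer) string {
	parts := append([]string{f.Path}, l.Chain...)
	return strings.Join(parts, " ∴ ")
}

// matchedFiles counts top-level files that have at least one behavior in
// any layer, so a directory of archives is counted per archive.
func matchedFiles(files []File) int {
	n := 0
	for _, f := range files {
		for _, l := range f.Layers {
			if len(l.Behaviors) > 0 {
				n++
				break
			}
		}
	}
	return n
}

func main() {
	f := File{
		Path: "./Downloads/gems.zip",
		Layers: []Layer{{
			Chain: []string{"rodney.gem", "data.tar.gz", "rodney.rb"},
			Behaviors: []Behavior{
				{Risk: "MED", Key: "net/bpf", Description: "BPF (Berkeley Packet Filter)"},
			},
		}},
	}

	fmt.Println("Path:", f.DisplayPath(f.Layers[0]))

	out, err := json.MarshalIndent(f, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println("matched files:", matchedFiles([]File{f}))
}
```

Storing the chain as a flat list per layer, instead of nesting layers inside layers, is one way to keep the JSON shallow; a nested tree would also work but makes the output harder to consume.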

tstromberg avatar May 10 '24 20:05 tstromberg

This was addressed in https://github.com/chainguard-dev/bincapz/pull/217 (sans the concept of reporting per-layer behaviors which can be a future enhancement).

egibs avatar Jun 02 '24 14:06 egibs