transparent archives: present original archive path and inner path in output
With #174 we have transparent archive handling, but the output shows the temp file name:
/var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/bincapz-apko_0.13.2_linux_arm64.tar.gz1015874883/apko_0.13.2_linux_arm64/apko
What I think would be cool is if we can display the archive file as well as the file within it:
/Users/egibs/Downloads/apko_0.13.2_darwin_amd64.tar.gz ∴ apko
To accomplish this, we'll need a change to the FileReport struct:
https://github.com/chainguard-dev/bincapz/blob/f029652a7756dda5e26c9906cb38a01e6f8f3ac9/pkg/bincapz/bincapz.go#L26
... and add something like a SubPath or InnerPath.
It's worth noting that the Report struct has a map that also contains a filename: Files map[string]FileReport - it has to be unique, so I think it's fine if we keep the temporary file path there.
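A rough sketch of what that could look like, under some loud assumptions: the `FileReport` here is a trimmed-down stand-in rather than the real struct, `ArchivePath` and the `newArchiveFileReport` helper are hypothetical names, and `InnerPath` is just one of the two names floated above. The map stays keyed by the temporary path, and only the display changes.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// FileReport is a simplified stand-in for the real struct.
// ArchivePath and InnerPath are hypothetical additions.
type FileReport struct {
	Path        string // path that was actually scanned (may be a temp file)
	ArchivePath string // hypothetical: archive the user pointed us at; empty for plain files
	InnerPath   string // hypothetical: path of the file inside the archive
}

// DisplayPath returns the user-facing path: "archive ∴ inner" for files
// extracted from an archive, or the plain path otherwise.
func (fr FileReport) DisplayPath() string {
	if fr.ArchivePath == "" {
		return fr.Path
	}
	return fr.ArchivePath + " ∴ " + fr.InnerPath
}

// newArchiveFileReport (hypothetical helper) derives InnerPath by stripping
// the temporary extraction root from the scanned path.
func newArchiveFileReport(archivePath, extractRoot, scannedPath string) FileReport {
	inner, err := filepath.Rel(extractRoot, scannedPath)
	escapes := inner == ".." || strings.HasPrefix(inner, ".."+string(filepath.Separator))
	if err != nil || escapes {
		// Fall back to the scanned path if it is not under the extraction root.
		inner = scannedPath
	}
	return FileReport{
		Path:        scannedPath,
		ArchivePath: archivePath,
		InnerPath:   filepath.ToSlash(inner),
	}
}

func main() {
	// The temp directory below is made up for illustration.
	fr := newArchiveFileReport(
		"/Users/egibs/Downloads/apko_0.13.2_darwin_amd64.tar.gz",
		"/tmp/bincapz-123/apko_0.13.2_darwin_amd64",
		"/tmp/bincapz-123/apko_0.13.2_darwin_amd64/apko",
	)

	// The map key stays the unique temporary path; only the display changes.
	files := map[string]FileReport{fr.Path: fr}
	for key, report := range files {
		fmt.Println("key:    ", key)
		fmt.Println("display:", report.DisplayPath())
	}
}
```

Keeping the temp path as the map key means nothing that depends on key uniqueness has to change; whether `--diff` should match on the key or on the display path is a separate question.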
The ultimate test of this feature is whether or not --diff mode works with two directories of archives.
This works (but I'm not the greatest judge as to its veracity):
$ go run . --diff ~/Downloads/apko_tar_gzs/apko_0.13.2_darwin_amd64.tar.gz ~/Downloads/apko_tar_gzs_2/apko_0.13.2_darwin_arm64.tar.gz
Moved: ../../../../../var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/apko_0.13.2_darwin_amd64.tar.gz824651334/apko_0.13.2_darwin_amd64/apko -> ../../../../../var/folders/3g/88131l9j11x995ppjbxsvhbh0000gn/T/apko_0.13.2_darwin_arm64.tar.gz2308444237/apko_0.13.2_darwin_arm64/apko (score: 0.941791)
+++ ADDED: 3 behavior(s) +++
------------------------------------------------------------------------------
RISK  KEY                          DESCRIPTION                      EVIDENCE
------------------------------------------------------------------------------
+LOW  process/chdir                changes working directory        cd H2l
+MED  net/bpf                      BPF (Berkeley Packet Filter)     bpf
+MED  security_controls/linux/ufw  interacts with the ufw firewall  ufw
------------------------------------------------------------------------------
I've been thinking about the implementation and have some further thoughts. This is somewhat stream of consciousness:
- There should be some concept of layers: for example, if findings are found in a .gem file within a .tar.gz, we should be able to present that relationship in our findings.
- I don't think we need to support diff'ing within two archive files for now.
- In general, we should treat files by the original path they sit on in the filesystem. For example, if we scan a directory of 30 .tar.gz files, the statistics should be based on how many of those .tar.gz files are matched, not each individual file within them.
- This has me wondering if perhaps what we need is a hierarchy like:
File -> Layer -> Behaviors
So, for the case of a .zip file full of .gem files, we'd see something like:
./Downloads/gems.zip -> rodney.gem:data.tar.gz:rodney.rb -> Behaviors
In the terminal output, you'd see something like:
Path: ./Downloads/gems.zip ∴ rodney.gem ∴ data.tar.gz ∴ rodney.rb
or maybe:
Path: ./Downloads/gems.zip ≡ rodney.gem ≡ data.tar.gz ≡ rodney.rb
Each layer would show its own table of behaviors.
The less complicated we can make the JSON output, the better.
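To make the File -> Layer -> Behaviors idea a bit more concrete, here is a minimal sketch. Every type, field name and JSON key below is hypothetical and not taken from the existing code, and the behavior attached to rodney.rb is made up for illustration. It shows the nested chain rendered with ∴, statistics counted per top-level file rather than per inner file, and one fairly flat JSON shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Behavior is a simplified, hypothetical finding.
type Behavior struct {
	Risk        string `json:"risk"`
	Key         string `json:"key"`
	Description string `json:"description"`
}

// Layer (hypothetical) is one file nested somewhere inside a top-level file.
// Chain lists each nesting step, e.g. rodney.gem, data.tar.gz, rodney.rb.
type Layer struct {
	Chain     []string   `json:"chain"`
	Behaviors []Behavior `json:"behaviors"`
}

// FileResult (hypothetical) is keyed by the path as it sits on the filesystem.
type FileResult struct {
	Path   string  `json:"path"`
	Layers []Layer `json:"layers,omitempty"`
}

// Display renders "root ∴ a ∴ b ∴ c" for terminal output.
func (l Layer) Display(root string) string {
	parts := append([]string{root}, l.Chain...)
	return strings.Join(parts, " ∴ ")
}

// matchedFiles counts top-level files that have at least one behavior in any
// layer, so 30 scanned archives can yield at most 30 matches.
func matchedFiles(results []FileResult) int {
	n := 0
	for _, r := range results {
		for _, l := range r.Layers {
			if len(l.Behaviors) > 0 {
				n++
				break
			}
		}
	}
	return n
}

func main() {
	results := []FileResult{{
		Path: "./Downloads/gems.zip",
		Layers: []Layer{{
			Chain: []string{"rodney.gem", "data.tar.gz", "rodney.rb"},
			// Illustrative finding only.
			Behaviors: []Behavior{{Risk: "LOW", Key: "process/chdir", Description: "changes working directory"}},
		}},
	}}

	for _, r := range results {
		for _, l := range r.Layers {
			fmt.Println("Path:", l.Display(r.Path))
		}
	}
	fmt.Println("matched files:", matchedFiles(results))

	out, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Storing the chain as a list rather than a pre-joined string leaves the separator (∴ vs ≡) as a purely presentational choice and keeps the JSON free of it.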
This was addressed in https://github.com/chainguard-dev/bincapz/pull/217 (sans the concept of reporting per-layer behaviors which can be a future enhancement).