
Diff doesn't aggregate package updates into single events

Open Ais8Ooz8 opened this issue 2 months ago • 1 comments

The mal diff report is hard to follow, especially when you run mal diff --image instead of extracting the images yourself. It seems that a Deleted -> Added pair for the same package should be collapsed into a single event like Changed: ... libzstd.so.1.5.6 -> libzstd.so.1.5.7 [...]. Do you have any ideas or plans for this kind of aggregation?

mal diff --image --file-risk-change ghcr.io/aquasecurity/trivy:0.65.0 ghcr.io/aquasecurity/trivy:0.66.0 | grep -e Added -e Deleted | sort

├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /+AfcLNAJNxFxG0hH40=.post-install [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /+AfcLNAJNxFxG0hH40=.post-upgrade [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /+AfcLNAJNxFxG0hH40=.pre-install [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /+AfcLNAJNxFxG0hH40=.pre-upgrade [MEDIUM]
├─ 🛑 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /busybox-1.37.0-r18.Q1IVWNSWjzHcw3fA8n2um7DzK7JdI=.post-install [HIGH]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /busybox-1.37.0-r18.Q1IVWNSWjzHcw3fA8n2um7DzK7JdI=.post-upgrade [MEDIUM]
├─ 🛑 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /busybox-1.37.0-r18.Q1IVWNSWjzHcw3fA8n2um7DzK7JdI=.trigger [HIGH]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /ca-certificates-20250619-r0.Q1O3wy7NQ0LRAM8EyppKJ3AolkYeM=.post-deinstall [MEDIUM]
├─ 🛑 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /ca-certificates-20250619-r0.Q1O3wy7NQ0LRAM8EyppKJ3AolkYeM=.trigger [HIGH]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /libapk.so.2.14.9 [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /libexpat.so.1.10.2 [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /libnghttp2.so.14.28.4 [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /libunistring.so.5.2.0 [MEDIUM]
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /libzstd.so.1.5.7 [MEDIUM]

├─ 🛑 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /0NTXAhIjY7Nqo=.post-install [HIGH]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /0NTXAhIjY7Nqo=.post-upgrade [MEDIUM]
├─ 🛑 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /0NTXAhIjY7Nqo=.trigger [HIGH]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /ca-certificates-20250619-r0.Q1xUNRT2WUrGiLIMFZ+1e2JbKz6MQ=.post-deinstall [MEDIUM]
├─ 🛑 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /ca-certificates-20250619-r0.Q1xUNRT2WUrGiLIMFZ+1e2JbKz6MQ=.trigger [HIGH]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /iSXcJI1Vf8x0TVc9Y=.post-install [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /iSXcJI1Vf8x0TVc9Y=.post-upgrade [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /iSXcJI1Vf8x0TVc9Y=.pre-install [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /iSXcJI1Vf8x0TVc9Y=.pre-upgrade [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /libapk.so.2.14.0 [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /libexpat.so.1.10.1 [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /libnghttp2.so.14.28.3 [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /libunistring.so.5.1.0 [MEDIUM]
├─ 🟡 Deleted: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /libzstd.so.1.5.6 [MEDIUM]
mal diff --file-risk-change 0.65.0-rootfs/ 0.66.0-rootfs/ | grep -e Added -e Deleted | sort

├─ 🟡 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.7.0-r0.Q1KfmXSO6h/+AfcLNAJNxFxG0hH40=.post-install [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.7.0-r0.Q1KfmXSO6h/+AfcLNAJNxFxG0hH40=.post-upgrade [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.7.0-r0.Q1KfmXSO6h/+AfcLNAJNxFxG0hH40=.pre-install [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.7.0-r0.Q1KfmXSO6h/+AfcLNAJNxFxG0hH40=.pre-upgrade [MEDIUM]
├─ 🛑 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /busybox-1.37.0-r18.Q1IVWNSWjzHcw3fA8n2um7DzK7JdI=.post-install [HIGH]
├─ 🟡 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /busybox-1.37.0-r18.Q1IVWNSWjzHcw3fA8n2um7DzK7JdI=.post-upgrade [MEDIUM]
├─ 🛑 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /busybox-1.37.0-r18.Q1IVWNSWjzHcw3fA8n2um7DzK7JdI=.trigger [HIGH]
├─ 🟡 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /ca-certificates-20250619-r0.Q1O3wy7NQ0LRAM8EyppKJ3AolkYeM=.post-deinstall [MEDIUM]
├─ 🛑 Added: 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /ca-certificates-20250619-r0.Q1O3wy7NQ0LRAM8EyppKJ3AolkYeM=.trigger [HIGH]
├─ 🟡 Added: 0.66.0-rootfs/usr/lib/libapk.so.2.14.9 [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/usr/lib/libexpat.so.1.10.2 [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/usr/lib/libnghttp2.so.14.28.4 [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/usr/lib/libunistring.so.5.2.0 [MEDIUM]
├─ 🟡 Added: 0.66.0-rootfs/usr/lib/libzstd.so.1.5.7 [MEDIUM]

├─ 🟡 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.6.8-r1.Q17OteNVXn9/iSXcJI1Vf8x0TVc9Y=.post-install [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.6.8-r1.Q17OteNVXn9/iSXcJI1Vf8x0TVc9Y=.post-upgrade [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.6.8-r1.Q17OteNVXn9/iSXcJI1Vf8x0TVc9Y=.pre-install [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /alpine-baselayout-3.6.8-r1.Q17OteNVXn9/iSXcJI1Vf8x0TVc9Y=.pre-upgrade [MEDIUM]
├─ 🛑 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /busybox-1.37.0-r12.Q1sSNCl4MTQ0d1V/0NTXAhIjY7Nqo=.post-install [HIGH]
├─ 🟡 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /busybox-1.37.0-r12.Q1sSNCl4MTQ0d1V/0NTXAhIjY7Nqo=.post-upgrade [MEDIUM]
├─ 🛑 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /busybox-1.37.0-r12.Q1sSNCl4MTQ0d1V/0NTXAhIjY7Nqo=.trigger [HIGH]
├─ 🟡 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /ca-certificates-20250619-r0.Q1xUNRT2WUrGiLIMFZ+1e2JbKz6MQ=.post-deinstall [MEDIUM]
├─ 🛑 Deleted: 0.65.0-rootfs/lib/apk/db/scripts.tar ∴ /ca-certificates-20250619-r0.Q1xUNRT2WUrGiLIMFZ+1e2JbKz6MQ=.trigger [HIGH]
├─ 🟡 Deleted: 0.65.0-rootfs/usr/lib/libapk.so.2.14.0 [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/usr/lib/libexpat.so.1.10.1 [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/usr/lib/libnghttp2.so.14.28.3 [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/usr/lib/libunistring.so.5.1.0 [MEDIUM]
├─ 🟡 Deleted: 0.65.0-rootfs/usr/lib/libzstd.so.1.5.6 [MEDIUM]
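
To make the requested aggregation concrete, here is a minimal sketch that post-processes the Added/Deleted lines shown above. It is not how malcontent works internally: the line shape matched by EVENT_RE is inferred from this output only, and treating a trailing dotted run of digits as the version (VERSION_RE) is a hypothetical heuristic that will miss entries such as the apk script names.

```python
import re

# Assumed event line shape, taken from the output above:
#   "├─ 🟡 Added: <path> [MEDIUM]"
EVENT_RE = re.compile(r"(Added|Deleted): (.+?) \[\w+\]\s*$")
# Hypothetical heuristic: the version is a trailing dotted run of digits (".1.5.6").
VERSION_RE = re.compile(r"((?:\.\d+)+)$")


def aggregate(lines):
    """Collapse Deleted/Added pairs that differ only by trailing version."""
    deleted, added, other = {}, {}, []
    for line in lines:
        m = EVENT_RE.search(line)
        if m is None:
            continue  # not an Added/Deleted event line
        kind, path = m.groups()
        name = path.rsplit("/", 1)[-1]  # basename of the reported path
        v = VERSION_RE.search(name)
        if v is None:
            other.append(f"{kind}: {name}")  # no recognizable version, keep as-is
            continue
        stem = name[: v.start()]  # e.g. "libzstd.so"
        (added if kind == "Added" else deleted)[stem] = name

    events = []
    for stem in sorted(added.keys() | deleted.keys()):
        old, new = deleted.get(stem), added.get(stem)
        if old and new:
            events.append(f"Changed: {old} -> {new}")
        elif new:
            events.append(f"Added: {new}")
        else:
            events.append(f"Deleted: {old}")
    return events + other
```

Feeding it the libzstd lines from either run above would yield a single Changed: libzstd.so.1.5.6 -> libzstd.so.1.5.7 event instead of two separate ones.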

Ais8Ooz8 avatar Nov 25 '25 18:11 Ais8Ooz8

Oh, good call. I don't think this is something we've considered previously but I'm happy to work on improving the legibility of diffs given how much more information is displayed.

egibs avatar Nov 26 '25 01:11 egibs

Can you try the latest release (1.18.0) and see if the new output with and without --score-all improves your experience when running diffs?

I tested with the Trivy images mentioned above and the new output is much more concise, especially when using --score-all.

egibs avatar Dec 05 '25 22:12 egibs

Yes, it looks great. Regarding the flag, it says this mode is slow, but I didn't notice much of a difference. Is there a formula I can use to calculate/predict how long the analysis will take?

--score-all           Compute the Levenshtein distance for all source and destination paths (warning: experimental and slow!) (default: false)
mal diff --score-all  0.65.0-rootfs/ 0.66.0-rootfs/ 
├─ 🟡 Added: /Users/maxim/images/0.66.0-rootfs ∴ 0.66.0-rootfs/usr/bin/iconv [MEDIUM]
│     ≡ anti-static [MEDIUM]
│       🟡 binary/opaque — binary contains little text content: destination charset, source charset, write error
│
├─ 🟡 Moved: 0.65.0-rootfs/usr/bin/getent -> 0.66.0-rootfs/usr/bin/getent (score: 1.000000)
│     ≡ networking [MEDIUM]
│-      🟡 ip/host_port — connects to an arbitrary hostname:port
│
├─ 🟡 Moved: 0.65.0-rootfs/usr/lib/libcrypto.so.3 -> 0.66.0-rootfs/usr/lib/libcrypto.so.3 (score: 1.000000)
│     ≡ discovery [LOW]
│+      🔵 user/USER — Looks up the USER name of the current user: getenv, ENV
│
├─ 🛑 Moved: 0.65.0-rootfs/usr/lib/libcurl.so.4.8.0 -> 0.66.0-rootfs/usr/lib/libcurl.so.4.8.0 (score: 1.000000)
│     ≡ networking [MEDIUM]
│+      🟡 ip/icmp — Uses the ping tool to generate ICMP packets: ping response., ping request.
│
├─ 🟡 Moved: 0.65.0-rootfs/usr/lib/libssl.so.3 -> 0.66.0-rootfs/usr/lib/libssl.so.3 (score: 1.000000)
│     ≡ networking [MEDIUM]
│+      🟡 ip/addr — mentions an 'IP address'
│+      🟡 socket/pair — create a pair of connected sockets: socketpair
│-      🔵 http — Uses the HTTP protocol
│-      🟡 http/post — submits content to websites
│
├─ 🛑 Moved: 0.65.0-rootfs/usr/libexec/git-core/git-http-push -> 0.66.0-rootfs/usr/libexec/git-core/git-http-push (score: 1.000000)
│     ≡ filesystem [LOW]
│+      🔵 mount — mounts file systems
│
├─ 🟡 Moved: 0.65.0-rootfs/usr/lib/libnghttp2.so.14.28.3 -> 0.66.0-rootfs/usr/lib/libnghttp2.so.14.28.4 (score: 0.971429)
│     ≡ filesystem [LOW]
│-      🔵 file/delete — deletes files
│
├─ 🛑 Changed (1 added, 0 removed): 0.66.0-rootfs/lib/apk/db/scripts.tar ∴ /ca-certificates-20250619-r0.Q1O3wy7NQ0LRAM8EyppKJ3AolkYeM=.trigger
│     ≡ execution [MEDIUM]
│+      🟡 shell/ignore_output — Runs shell commands but throws output away: /usr/sbin/update-ca-certificates > /dev/null 2>&1
│
mal diff --score-all --image ghcr.io/aquasecurity/trivy:0.65.0 ghcr.io/aquasecurity/trivy:0.66.0
├─ 🟡 Added: ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/bin/iconv [MEDIUM]
│     ≡ anti-static [MEDIUM]
│       🟡 binary/opaque — binary contains little text content: destination charset, source charset, write error
│
├─ 🟡 Moved: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /usr/bin/getent -> ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/bin/getent (score: 1.000000)
│     ≡ networking [MEDIUM]
│-      🟡 ip/host_port — connects to an arbitrary hostname:port
│
├─ 🟡 Moved: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /usr/lib/libcrypto.so.3 -> ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/lib/libcrypto.so.3 (score: 1.000000)
│     ≡ discovery [LOW]
│+      🔵 user/USER — Looks up the USER name of the current user: getenv, ENV
│
├─ 🛑 Moved: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /usr/lib/libcurl.so.4.8.0 -> ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/lib/libcurl.so.4.8.0 (score: 1.000000)
│     ≡ networking [MEDIUM]
│+      🟡 ip/icmp — Uses the ping tool to generate ICMP packets: ping response., ping request.
│
├─ 🟡 Moved: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /usr/lib/libssl.so.3 -> ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/lib/libssl.so.3 (score: 1.000000)
│     ≡ networking [MEDIUM]
│+      🟡 ip/addr — mentions an 'IP address'
│+      🟡 socket/pair — create a pair of connected sockets: socketpair
│-      🔵 http — Uses the HTTP protocol
│-      🟡 http/post — submits content to websites
│
├─ 🛑 Moved: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /usr/libexec/git-core/git-http-push -> ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/libexec/git-core/git-http-push (score: 1.000000)
│     ≡ filesystem [LOW]
│+      🔵 mount — mounts file systems
│
├─ 🟡 Moved: ghcr.io/aquasecurity/trivy:0.65.0 ∴ /usr/lib/libnghttp2.so.14.28.3 -> ghcr.io/aquasecurity/trivy:0.66.0 ∴ /usr/lib/libnghttp2.so.14.28.4 (score: 0.971429)
│     ≡ filesystem [LOW]
│-      🔵 file/delete — deletes files
│
├─ 🛑 Changed (1 added, 0 removed): ghcr.io/aquasecurity/trivy:0.66.0 ∴ /lib/apk/db/scripts.tar ∴ /ca-certificates-20250619-r0.Q1O3wy7NQ0LRAM8EyppKJ3AolkYeM=.trigger
│     ≡ execution [MEDIUM]
│+      🟡 shell/ignore_output — Runs shell commands but throws output away: /usr/sbin/update-ca-certificates > /dev/null 2>&1
│

Ais8Ooz8 avatar Dec 12 '25 12:12 Ais8Ooz8

Nice! The approach is roughly quadratic (something like $$O(n^2 \log n)$$ complexity), so the runtime will be much more noticeable as the source and destination file counts increase.

I think the Trivy diffs were maybe a couple of dozen files each, so you're looking at $$24 \times 24 = 576$$ pairs, $$576 \cdot \log_2(576) \approx 5282$$ comparisons, and then $$\min(24, 24) = 24$$ matches.
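
For readers who want to see where those three counts come from, the following is a rough sketch of all-pairs path matching: score every source/destination pair, sort the scored pairs, then greedily keep the best pair for each path. This is an assumption about the general shape of the approach, not malcontent's actual code; the function names are made up for illustration, and the normalization in similarity may differ from the score that mal diff prints.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def match_paths(src, dst):
    """Score all len(src) * len(dst) pairs, sort them, and pick greedily."""
    scored = sorted(
        ((similarity(s, d), s, d) for s, d in product(src, dst)),
        key=lambda t: t[0],
        reverse=True,
    )
    used_src, used_dst, matches = set(), set(), []
    for score, s, d in scored:
        if s in used_src or d in used_dst:
            continue  # each path takes part in at most one match
        used_src.add(s)
        used_dst.add(d)
        matches.append((s, d, score))
    return matches  # at most min(len(src), len(dst)) entries
```

The sort over all scored pairs is the step that contributes the logarithmic factor, and each individual Levenshtein call also costs time proportional to the product of the two path lengths, so long paths add to the runtime as well.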

egibs avatar Dec 12 '25 13:12 egibs

Here's something else I was thinking about. What if I scan different versions of images on different days? Let's say it's a vulnerability scanner like Trivy. It pulls the image, scans it, creates a report, and then deletes the image. If I want to create a mal diff of two images, I'll pull those images again. Is it possible to create a report, aka SBOM, from mal analyze and then perform a mal diff of the two SBOM files? What do you think about that?

Ais8Ooz8 avatar Dec 13 '25 14:12 Ais8Ooz8

Ingesting existing reports into a FileReport and then diffing them isn't something we currently support but I think it would be useful since the expensive portions of scanning will only need to happen once.
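
As a rough illustration of that idea (not an existing malcontent feature), the sketch below diffs two previously saved reports without rescanning anything. The report schema here, a JSON object mapping each file path to a list of behavior IDs, is hypothetical and chosen only to keep the example small; a real report would need to be mapped into this shape first.

```python
import json


def load_report(text: str) -> dict:
    """Parse a saved report into {path: set of behavior IDs} (hypothetical schema)."""
    data = json.loads(text)
    return {path: set(behaviors) for path, behaviors in data.items()}


def diff_reports(old: dict, new: dict) -> dict:
    """Return per-path behaviors that were added or removed between two reports."""
    result = {}
    for path in sorted(old.keys() | new.keys()):
        before = old.get(path, set())
        after = new.get(path, set())
        added = sorted(after - before)
        removed = sorted(before - after)
        if added or removed:  # skip files whose behaviors are unchanged
            result[path] = {"added": added, "removed": removed}
    return result
```

With a design like this, the expensive scan happens once per image, and any two stored reports can be compared later even after the images themselves have been deleted.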

I'll create a backlog item for that and experiment with possible options. Closing the original Issue for now.

egibs avatar Dec 15 '25 13:12 egibs