Scanning Keycloak takes a long time
As seen here: https://github.com/wolfi-dev/os/actions/runs/10009779412/job/27673386509?pr=24126
We should see if we can profile this and determine the root cause (and make the appropriate improvements).
I opened this PR to [hopefully] help with the non-bincapz diff portions: https://github.com/wolfi-dev/wolfictl/pull/1074
I scanned all of the extracted Keycloak APKs and it took about five minutes (of course, this is on an M3 Pro versus what the four core runners that GHA is going to be using by default):
📊 Statistics
---
Files Scanned 153182 (0 skipped)
Total Risks 153182
---
⚠️ Risk Level Percentage
---
Risk Level Percentage Count/Total
2/MED 99.30% 152106/153182
1/LOW 0.68% 1044/153182
3/HIGH 0.02% 32/153182
---
Total Packages 185224
---
📦 Package Risk Percentage
---
Package Percentage Count/Total
evasion/binary/unusually_small 81.24% 150484/185224
evasion/binary/opaque 4.91% 9098/185224
ref/words/plugin 3.15% 5826/185224
databases/mysql 1.21% 2240/185224
ref/words/password 1.15% 2130/185224
ref/site/url 1.10% 2034/185224
ref/words/intercept 0.86% 1592/185224
ref/words/agent 0.58% 1082/185224
databases/postgresql 0.58% 1068/185224
fd/multiplex 0.40% 738/185224
ref/path/dev 0.34% 632/185224
secrets/private_key 0.29% 534/185224
crypto/aes 0.27% 496/185224
net/websocket 0.25% 466/185224
encoding/base64 0.23% 430/185224
net/upload 0.19% 350/185224
net/socket/connect 0.17% 306/185224
3P/threat_hunting/owasp 0.14% 262/185224
ref/path/relative 0.13% 244/185224
net/http/post 0.12% 226/185224
net/http/request 0.11% 204/185224
exec/cmd 0.09% 164/185224
ref/path/var 0.08% 156/185224
net/socket/listen 0.08% 150/185224
exec/shell_command 0.08% 146/185224
net/download 0.08% 140/185224
net/http/cookies 0.07% 138/185224
net/ip/multicast/send 0.07% 130/185224
fs/file/delete 0.06% 118/185224
net/url/encode 0.06% 114/185224
encoding/json/encode 0.06% 112/185224
net/socket/send 0.06% 110/185224
net/oauth2 0.06% 102/185224
net/url 0.05% 96/185224
encoding/json/decode 0.05% 90/185224
shell/exec 0.05% 88/185224
net/url/request 0.04% 76/185224
ref/words/exclamation 0.04% 74/185224
techniques/code_eval 0.04% 74/185224
net/http/auth 0.04% 74/185224
kernel/platform 0.04% 70/185224
fs/file/write 0.04% 66/185224
compression/gzip 0.03% 64/185224
net/http_proxy 0.03% 62/185224
crypto/ed25519 0.03% 60/185224
env/TERM 0.03% 60/185224
env/get 0.03% 56/185224
net/icmp 0.03% 56/185224
net/http/form/upload 0.03% 54/185224
fs/tempdir 0.03% 54/185224
data/embedded/html 0.03% 52/185224
ref/words/heartbeat 0.03% 50/185224
ref/path/hidden 0.03% 50/185224
ref/path/etc 0.03% 50/185224
hash/sha256 0.03% 50/185224
env/HOME 0.03% 50/185224
net/ip/send/unicast 0.03% 48/185224
process/name/get 0.03% 48/185224
fs/directory/create 0.02% 46/185224
ref/daemon 0.02% 46/185224
net/http2 0.02% 42/185224
env/TMPDIR 0.02% 42/185224
dylib/symbol/address 0.02% 40/185224
fs/file/truncate 0.02% 40/185224
net/socket/receive 0.02% 38/185224
hash/whirlpool 0.02% 36/185224
net/socks5 0.02% 36/185224
ref/words/server_address 0.02% 34/185224
data/embedded/pem/certificate 0.02% 34/185224
net/http/accept/encoding 0.02% 32/185224
net/reuseport 0.02% 32/185224
tty/pathname 0.02% 32/185224
process/chdir 0.02% 30/185224
tty/open 0.02% 28/185224
hash/blake2b 0.02% 28/185224
fs/file/stat 0.02% 28/185224
crypto/tls 0.01% 26/185224
hash/md5 0.01% 26/185224
ref/words/obfuscate 0.01% 26/185224
techniques/brute_force 0.01% 24/185224
fs/link/read 0.01% 24/185224
hash/sha1 0.01% 24/185224
procfs/mounts 0.01% 22/185224
data/embedded/pem/private_key 0.01% 22/185224
combo/recon/system_network 0.01% 22/185224
compression/zstd 0.01% 22/185224
ref/ip_port 0.01% 20/185224
fs/mounts/read 0.01% 20/185224
3P/threat_hunting/kerbrute 0.01% 20/185224
ref/program/ancient_gcc 0.01% 20/185224
ref/site/php 0.01% 18/185224
ref/path/root 0.01% 18/185224
kernel/iokit/registry 0.01% 16/185224
fs/link/create 0.01% 16/185224
random/insecure 0.01% 16/185224
fs/symlink/resolve 0.01% 16/185224
net/dns/txt 0.01% 14/185224
net/ssh 0.01% 14/185224
3P/threat_hunting/hijacker 0.01% 14/185224
fs/file/delete/forcibly 0.01% 14/185224
ref/path/usr/bin 0.01% 12/185224
shell/nohup 0.01% 12/185224
env/USER 0.01% 12/185224
compression/bzip2 0.01% 12/185224
process/username/get 0.01% 12/185224
fs/permission/modify 0.01% 12/185224
ref/path/usr/local 0.01% 12/185224
net/hostname/resolve 0.01% 12/185224
process/create 0.01% 12/185224
kernel/hostname/get 0.01% 12/185224
kernel/sysinfo 0.01% 10/185224
ref/path/home/config 0.01% 10/185224
fd/read 0.01% 10/185224
ref/path/tmp 0.01% 10/185224
fd/write 0.01% 10/185224
fs/directory/remove 0.01% 10/185224
net/bpf 0.01% 10/185224
fs/attributes/remove 0.00% 8/185224
evasion/bitwise_math 0.00% 8/185224
ref/ip 0.00% 8/185224
ref/path/etc/resolv.conf 0.00% 8/185224
fs/attributes/set 0.00% 8/185224
databases/sqlite 0.00% 8/185224
ref/extensions/office 0.00% 8/185224
3P/threat_hunting/dumpert 0.00% 8/185224
ref/path/usr/sbin 0.00% 6/185224
fs/file/times/set 0.00% 6/185224
time/tzinfo 0.00% 6/185224
env/SHELL 0.00% 6/185224
net/sendfile 0.00% 6/185224
shell/pipe_sh 0.00% 6/185224
fd/epoll 0.00% 6/185224
persist/bash 0.00% 6/185224
fs/file/read 0.00% 6/185224
fs/node/create 0.00% 6/185224
net/mac/address 0.00% 6/185224
kernel/acct 0.00% 6/185224
fs/permission/get 0.00% 6/185224
service/systemd 0.00% 6/185224
mem/anonymous/file 0.00% 4/185224
ref/path/var/run 0.00% 4/185224
net/socket/local/address 0.00% 4/185224
crypto/gost89 0.00% 4/185224
kernel/cpu/info 0.00% 4/185224
ref/program/powershell 0.00% 4/185224
3P/threat_hunting/redpeanut 0.00% 4/185224
data/embedded/base64/url 0.00% 4/185224
process/unshare 0.00% 4/185224
ref/words/dropper 0.00% 4/185224
process/list 0.00% 4/185224
crypto/file/encrypter 0.00% 4/185224
process/chroot 0.00% 4/185224
ref/path/etc/hosts 0.00% 4/185224
3P/threat_hunting/userenum 0.00% 4/185224
procfs/self/status 0.00% 4/185224
combo/stealer/office 0.00% 4/185224
exec/pipe 0.00% 4/185224
evasion/fake/library 0.00% 4/185224
crypto/mining/monero/pool 0.00% 4/185224
ref/site/http/dynamic 0.00% 4/185224
net/socket/peer/address 0.00% 4/185224
dylib/address/check 0.00% 4/185224
3P/threat_hunting/hardhatc2 0.00% 4/185224
net/stat 0.00% 2/185224
net/fetch 0.00% 2/185224
procfs/meminfo 0.00% 2/185224
ref/words/password_finder 0.00% 2/185224
net/vnc 0.00% 2/185224
kernel/dev/block/device 0.00% 2/185224
ref/words/exploit 0.00% 2/185224
ref/path/file/url 0.00% 2/185224
archives/zip 0.00% 2/185224
ref/path/users 0.00% 2/185224
process/userid/set 0.00% 2/185224
fs/mount 0.00% 2/185224
secrets/ssh 0.00% 2/185224
ref/words/decryptor 0.00% 2/185224
3P/threat_hunting/runascs 0.00% 2/185224
net/ip 0.00% 2/185224
procfs/self/exe 0.00% 2/185224
ref/ip/dns_resolver 0.00% 2/185224
process/groupid/set 0.00% 2/185224
fs/tempdir/create 0.00% 2/185224
service/stop 0.00% 2/185224
net/ip/parse 0.00% 2/185224
shell/background_launcher 0.00% 2/185224
net/dns/reverse 0.00% 2/185224
3P/threat_hunting/bruteratel 0.00% 2/185224
kernel/hostname/set 0.00% 2/185224
env/TEMP 0.00% 2/185224
persist/pid_file 0.00% 2/185224
fs/tempfile/create 0.00% 2/185224
device/disk/info 0.00% 2/185224
env/LANG 0.00% 2/185224
admin/package/install 0.00% 2/185224
service/start 0.00% 2/185224
shell/ignore_output 0.00% 2/185224
ref/words/infected 0.00% 2/185224
3P/threat_hunting/netstat 0.00% 2/185224
ref/path/home 0.00% 2/185224
net/tcp/state_tracker 0.00% 2/185224
persist/crontab 0.00% 2/185224
________________________________________________________
Executed in 294.10 secs fish external
usr time 202.23 secs 72.00 micros 202.23 secs
sys time 118.10 secs 880.00 micros 118.10 secs
I'm wondering if my original scan was an outlier -- even with the new concurrency changes the scans are taking ~8-9 minutes. I'll keep investigating.
Part of the slowdown seems to be from avoiding Goroutines when determining the programkind. Here are the current differences between concurrency and no concurrency when using magic.Lookup (concurrent), respectively:
________________________________________________________
Executed in 371.00 secs fish external
usr time 521.76 secs 0.07 millis 521.76 secs
sys time 235.64 secs 1.56 millis 235.63 secs
________________________________________________________
Executed in 535.85 secs fish external
usr time 461.71 secs 0.10 millis 461.71 secs
sys time 127.41 secs 3.54 millis 127.41 secs
Without concurrency (magic.LookupSync), the scans take about 3 minutes longer.
Okay, found the performance regression:
var pathKeys []string
results.Range(func(key, value interface{}) bool {
if k, ok := key.(string); ok {
func(key string) {
pathKeys = append(pathKeys, key)
slices.Sort(pathKeys)
}(k)
value, ok := value.(*bincapz.FileReport)
if !ok {
return false
}
scanPathFindings[k] = value
}
return true
})
We're sorting the pathKeys slice each time we iterate over the map which is super expensive. I'll open a PR to fix this.
With that fixed, the concurrent run is about 3x faster than what I originally posted:
________________________________________________________
Executed in 89.45 secs fish external
usr time 236.23 secs 0.11 millis 236.23 secs
sys time 241.30 secs 3.46 millis 241.30 secs
Using magic.LookupSync also saves some time rather than taking longer as I saw above:
________________________________________________________
Executed in 84.25 secs fish external
usr time 224.74 secs 0.10 millis 224.74 secs
sys time 223.76 secs 2.89 millis 223.76 secs
I made some tweaks/fixes related to this in #424 as well.
To clarify, the diff for Keycloak is what takes a long time, not the default scan. We're seeing this with the Spark package which, when completely extracted, comes in at ~760,000 files.
I'll close this out in favor of a "diff woes" Issue.