bincapz OOM'd at 8GB of RAM with a large source tree
bincapz consumed >8GB of RAM when run against a large directory:
[24669.706944] Out of memory: Killed process 61135 (bincapz) total-vm:8580228kB, anon-rss:2136184kB, file-rss:0kB, shmem-rss:92kB, UID:1000 pgtables:13940kB oom_score_adj:0
[24676.379361] oom_reaper: reaped process 61135 (bincapz), now anon-rss:32kB, file-rss:0kB, shmem-rss:92kB
bincapz worked on this directory a few weeks ago, though a lot has changed since then.
I added some profiling and ran bincapz against my local Downloads directory to get a decently-long run:
(pprof) sample_index = alloc_space
(pprof) top10
Showing nodes accounting for 2.96GB, 85.87% of 3.44GB total
Dropped 169 nodes (cum <= 0.02GB)
Showing top 10 nodes out of 108
flat flat% sum% cum cum%
1.48GB 43.14% 43.14% 1.74GB 50.44% io.copyBuffer
0.60GB 17.45% 60.59% 0.60GB 17.45% github.com/liamg/magic.worker
0.27GB 7.70% 68.30% 0.40GB 11.72% github.com/hillu/go-yara/v4.(*Rule).getMatchStrings
0.15GB 4.31% 72.61% 0.15GB 4.31% compress/flate.(*huffmanDecoder).init
0.14GB 3.93% 76.54% 0.14GB 3.93% github.com/rivo/uniseg.NewGraphemes
0.11GB 3.19% 79.73% 0.11GB 3.19% github.com/hillu/go-yara/v4.(*String).Matches
0.10GB 2.98% 82.71% 0.10GB 2.98% github.com/ulikunitz/xz/lzma.(*decoder).readOp
0.05GB 1.54% 84.25% 0.07GB 1.93% github.com/chainguard-dev/bincapz/pkg/report.matchStrings
0.03GB 0.81% 85.06% 0.06GB 1.60% github.com/chainguard-dev/bincapz/pkg/compile.Recursive.func1
0.03GB 0.81% 85.87% 0.08GB 2.44% github.com/chainguard-dev/bincapz/pkg/action.programKind
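For context, the profiles above come from the standard Go tooling (runtime/pprof shows up in the samples itself), so the wiring is presumably something like the sketch below; the flag name and file paths are illustrative, not bincapz's actual ones:

package main

import (
	"flag"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	profile := flag.String("profile", "", "write CPU and heap profiles with this prefix")
	flag.Parse()

	if *profile != "" {
		cf, err := os.Create(*profile + ".cpu.pprof")
		if err != nil {
			log.Fatal(err)
		}
		defer cf.Close()
		if err := pprof.StartCPUProfile(cf); err != nil {
			log.Fatal(err)
		}
		defer pprof.StopCPUProfile()

		// Write the heap profile on the way out, after the scan finishes.
		defer func() {
			hf, err := os.Create(*profile + ".heap.pprof")
			if err != nil {
				log.Fatal(err)
			}
			defer hf.Close()
			runtime.GC() // collect first so the in-use numbers reflect live memory
			if err := pprof.WriteHeapProfile(hf); err != nil {
				log.Fatal(err)
			}
		}()
	}

	// ... run the scan here ...
}

Note that a single heap profile carries both sample indexes: alloc_space (above) counts every allocation made over the lifetime of the run, while the default inuse_space view only counts memory still live when the profile was written, so its totals come out far smaller.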
My first heap profile didn't yield anything interesting:
(pprof) top
Showing nodes accounting for 8919.45kB, 100% of 8919.45kB total
Showing top 10 nodes out of 38
flat flat% sum% cum cum%
3077.74kB 34.51% 34.51% 5637.82kB 63.21% github.com/chainguard-dev/bincapz/pkg/report.Generate
1536.02kB 17.22% 51.73% 1536.02kB 17.22% strings.(*Builder).grow
1184.27kB 13.28% 65.00% 1184.27kB 13.28% runtime/pprof.StartCPUProfile
1024.05kB 11.48% 76.49% 1024.05kB 11.48% fmt.Sprintf
557.26kB 6.25% 82.73% 557.26kB 6.25% github.com/hillu/go-yara/v4.(*Rule).getMatchStrings
516.01kB 5.79% 88.52% 516.01kB 5.79% runtime/pprof.(*profMap).lookup
512.05kB 5.74% 94.26% 512.05kB 5.74% path/filepath.(*lazybuf).string (inline)
512.03kB 5.74% 100% 512.03kB 5.74% github.com/hillu/go-yara/v4.(*Rule).Metas
0 0% 100% 1069.29kB 11.99% _cgoexp_e4084b5c9b87_scanCallbackFunc
0 0% 100% 7219.16kB 80.94% github.com/chainguard-dev/bincapz/pkg/action.Scan
There are certainly some places where I could save memory by passing pointers to structs and reusing variables, but no obvious memory leak yet. I'm going to try it on a more complicated directory next.
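As an illustration of the kind of change I mean (the type below is hypothetical, not one of bincapz's): passing and storing pointers to large structs avoids copying them on every call and map access.

package main

import "fmt"

// fileReport stands in for a large per-file result; the real bincapz
// types differ, this is only to illustrate the pattern.
type fileReport struct {
	Path     string
	Findings []string
	Raw      [4096]byte // large inline payload that makes copies expensive
}

// byValue copies the entire struct, including the 4 KB array, on every call.
func byValue(r fileReport) int { return len(r.Findings) }

// byPointer copies only an 8-byte pointer.
func byPointer(r *fileReport) int { return len(r.Findings) }

func main() {
	// Storing pointers in the map avoids copying the value out on every lookup.
	reports := map[string]*fileReport{}

	r := &fileReport{Path: "/tmp/sample", Findings: []string{"net/http"}}
	reports[r.Path] = r

	fmt.Println(byValue(*r), byPointer(r))
}

Whether this actually saves memory depends on escape analysis and how long the values stay reachable, so each change needs to be confirmed against the profile rather than assumed.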
This 100MB sample looks a little more interesting:
Showing nodes accounting for 96.79MB, 100% of 96.79MB total
Showing top 10 nodes out of 43
flat flat% sum% cum cum%
32MB 33.07% 33.07% 49.51MB 51.15% github.com/chainguard-dev/bincapz/pkg/report.Generate
30.46MB 31.47% 64.54% 80.47MB 83.14% github.com/chainguard-dev/bincapz/pkg/action.processFile
15.50MB 16.02% 80.56% 15.50MB 16.02% strings.(*Builder).grow
12MB 12.40% 92.96% 12MB 12.40% fmt.Sprintf
2.50MB 2.58% 95.54% 2.50MB 2.58% github.com/chainguard-dev/bincapz/pkg/report.matchToString
1.16MB 1.19% 96.73% 1.16MB 1.19% runtime/pprof.StartCPUProfile
1.16MB 1.19% 97.93% 1.16MB 1.19% runtime/trace.Start
1MB 1.03% 98.96% 1MB 1.03% github.com/chainguard-dev/bincapz/pkg/report.longestUnique
0.50MB 0.52% 99.48% 0.50MB 0.52% runtime/pprof.(*profMap).lookup
0.50MB 0.52% 100% 0.50MB 0.52% github.com/hillu/go-yara/v4.(*Rule).Metas
I'm making some progress shaving down memory usage by using pointers for larger structures, but I'm unconvinced it will be enough to avoid OOMs.
I'm pretty sure the memory leak is in the cgo YARA bindings: I can get bincapz to consume 3GB of RAM while the Go heap profile only shows ~80MB, and memory allocated on the C side of cgo is invisible to Go's heap profiler. I've tried passing the YARA library file descriptors instead of paths, but that showed no memory improvement and likely carries a performance hit.
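One way to confirm that the missing memory lives outside the Go heap is to compare what the runtime thinks it owns against the process's resident set size: allocations made by libyara through cgo show up in RSS but never in Go's heap profile or MemStats. A rough, Linux-only sketch (the helper below is hypothetical, not part of bincapz):

package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// rssKB returns the process's resident set size in kB by reading
// /proc/self/status (Linux only).
func rssKB() (int64, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			var kb int64
			if _, err := fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &kb); err != nil {
				return 0, err
			}
			return kb, nil
		}
	}
	return 0, sc.Err()
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	rss, err := rssKB()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// If RSS dwarfs HeapSys, the difference is mostly memory the Go
	// runtime never allocated: cgo/C allocations that pprof cannot see.
	fmt.Printf("Go heap in use:  %d MiB\n", ms.HeapInuse/1024/1024)
	fmt.Printf("Go heap from OS: %d MiB\n", ms.HeapSys/1024/1024)
	fmt.Printf("process RSS:     %d MiB\n", rss/1024)
}

Logging something like this between scans should show whether the growth is attributable to the C side.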
Closing, as this may be obsolete and the code base has moved around so much that the existing research is no longer useful.