bincapz OOM'd at 8GB of RAM with a large source tree
bincapz consumed >8GB of RAM when run against a large directory:
[24669.706944] Out of memory: Killed process 61135 (bincapz) total-vm:8580228kB, anon-rss:2136184kB, file-rss:0kB, shmem-rss:92kB, UID:1000 pgtables:13940kB oom_score_adj:0
[24676.379361] oom_reaper: reaped process 61135 (bincapz), now anon-rss:32kB, file-rss:0kB, shmem-rss:92kB
bincapz worked on this directory a few weeks ago, though a lot has changed since then.
I added some profiling and ran bincapz against my local Downloads directory to get a decently-long run:
(pprof) sample_index = alloc_space
(pprof) top10
Showing nodes accounting for 2.96GB, 85.87% of 3.44GB total
Dropped 169 nodes (cum <= 0.02GB)
Showing top 10 nodes out of 108
flat flat% sum% cum cum%
1.48GB 43.14% 43.14% 1.74GB 50.44% io.copyBuffer
0.60GB 17.45% 60.59% 0.60GB 17.45% github.com/liamg/magic.worker
0.27GB 7.70% 68.30% 0.40GB 11.72% github.com/hillu/go-yara/v4.(*Rule).getMatchStrings
0.15GB 4.31% 72.61% 0.15GB 4.31% compress/flate.(*huffmanDecoder).init
0.14GB 3.93% 76.54% 0.14GB 3.93% github.com/rivo/uniseg.NewGraphemes
0.11GB 3.19% 79.73% 0.11GB 3.19% github.com/hillu/go-yara/v4.(*String).Matches
0.10GB 2.98% 82.71% 0.10GB 2.98% github.com/ulikunitz/xz/lzma.(*decoder).readOp
0.05GB 1.54% 84.25% 0.07GB 1.93% github.com/chainguard-dev/bincapz/pkg/report.matchStrings
0.03GB 0.81% 85.06% 0.06GB 1.60% github.com/chainguard-dev/bincapz/pkg/compile.Recursive.func1
0.03GB 0.81% 85.87% 0.08GB 2.44% github.com/chainguard-dev/bincapz/pkg/action.programKind
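For context, the profiles above come from the standard Go tooling (runtime/pprof shows up in the samples itself), so the wiring is presumably something like the sketch below; the flag name and file paths are illustrative, not bincapz's actual ones:

package main

import (
	"flag"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	profile := flag.String("profile", "", "write CPU and heap profiles with this prefix")
	flag.Parse()

	if *profile != "" {
		cf, err := os.Create(*profile + ".cpu.pprof")
		if err != nil {
			log.Fatal(err)
		}
		defer cf.Close()
		if err := pprof.StartCPUProfile(cf); err != nil {
			log.Fatal(err)
		}
		defer pprof.StopCPUProfile()

		// Write the heap profile on the way out, after the scan finishes.
		defer func() {
			hf, err := os.Create(*profile + ".heap.pprof")
			if err != nil {
				log.Fatal(err)
			}
			defer hf.Close()
			runtime.GC() // collect first so the in-use numbers reflect live memory
			if err := pprof.WriteHeapProfile(hf); err != nil {
				log.Fatal(err)
			}
		}()
	}

	// ... run the scan here ...
}

Note that a single heap profile carries both sample indexes: alloc_space (above) counts every allocation made over the lifetime of the run, while the default inuse_space view only counts memory still live when the profile was written, so its totals come out far smaller.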
My first heap profile didn't yield anything interesting:
(pprof) top
Showing nodes accounting for 8919.45kB, 100% of 8919.45kB total
Showing top 10 nodes out of 38
flat flat% sum% cum cum%
3077.74kB 34.51% 34.51% 5637.82kB 63.21% github.com/chainguard-dev/bincapz/pkg/report.Generate
1536.02kB 17.22% 51.73% 1536.02kB 17.22% strings.(*Builder).grow
1184.27kB 13.28% 65.00% 1184.27kB 13.28% runtime/pprof.StartCPUProfile
1024.05kB 11.48% 76.49% 1024.05kB 11.48% fmt.Sprintf
557.26kB 6.25% 82.73% 557.26kB 6.25% github.com/hillu/go-yara/v4.(*Rule).getMatchStrings
516.01kB 5.79% 88.52% 516.01kB 5.79% runtime/pprof.(*profMap).lookup
512.05kB 5.74% 94.26% 512.05kB 5.74% path/filepath.(*lazybuf).string (inline)
512.03kB 5.74% 100% 512.03kB 5.74% github.com/hillu/go-yara/v4.(*Rule).Metas
0 0% 100% 1069.29kB 11.99% _cgoexp_e4084b5c9b87_scanCallbackFunc
0 0% 100% 7219.16kB 80.94% github.com/chainguard-dev/bincapz/pkg/action.Scan
There are certainly some places where I could save memory by passing pointers to structs and reusing variables, but no obvious memory leak yet. I'm going to try it on a more complicated directory next.
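As an illustration of the kind of change I mean (the type below is hypothetical, not one of bincapz's): passing and storing pointers to large structs avoids copying them on every call and map access.

package main

import "fmt"

// fileReport stands in for a large per-file result; the real bincapz
// types differ, this is only to illustrate the pattern.
type fileReport struct {
	Path     string
	Findings []string
	Raw      [4096]byte // large inline payload that makes copies expensive
}

// byValue copies the entire struct, including the 4 KB array, on every call.
func byValue(r fileReport) int { return len(r.Findings) }

// byPointer copies only an 8-byte pointer.
func byPointer(r *fileReport) int { return len(r.Findings) }

func main() {
	// Storing pointers in the map avoids copying the value out on every lookup.
	reports := map[string]*fileReport{}

	r := &fileReport{Path: "/tmp/sample", Findings: []string{"net/http"}}
	reports[r.Path] = r

	fmt.Println(byValue(*r), byPointer(r))
}

Whether this actually saves memory depends on escape analysis and how long the values stay reachable, so each change needs to be confirmed against the profile rather than assumed.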
This 100MB sample looks a little more interesting:
Showing nodes accounting for 96.79MB, 100% of 96.79MB total
Showing top 10 nodes out of 43
flat flat% sum% cum cum%
32MB 33.07% 33.07% 49.51MB 51.15% github.com/chainguard-dev/bincapz/pkg/report.Generate
30.46MB 31.47% 64.54% 80.47MB 83.14% github.com/chainguard-dev/bincapz/pkg/action.processFile
15.50MB 16.02% 80.56% 15.50MB 16.02% strings.(*Builder).grow
12MB 12.40% 92.96% 12MB 12.40% fmt.Sprintf
2.50MB 2.58% 95.54% 2.50MB 2.58% github.com/chainguard-dev/bincapz/pkg/report.matchToString
1.16MB 1.19% 96.73% 1.16MB 1.19% runtime/pprof.StartCPUProfile
1.16MB 1.19% 97.93% 1.16MB 1.19% runtime/trace.Start
1MB 1.03% 98.96% 1MB 1.03% github.com/chainguard-dev/bincapz/pkg/report.longestUnique
0.50MB 0.52% 99.48% 0.50MB 0.52% runtime/pprof.(*profMap).lookup
0.50MB 0.52% 100% 0.50MB 0.52% github.com/hillu/go-yara/v4.(*Rule).Metas
I'm making some progress shaving down memory usage by using pointers for larger structures, but I'm unconvinced it will be enough to avoid OOMs.
I'm pretty sure the memory leak is in the cgo YARA bindings: I can get bincapz to consume 3GB of RAM while the Go heap profile only shows ~80MB, and memory allocated on the C side of cgo is invisible to Go's heap profiler. I've tried passing the YARA library file descriptors instead of paths, but that showed no memory improvement and likely carries a performance hit.
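One way to confirm that the missing memory lives outside the Go heap is to compare what the runtime thinks it owns against the process's resident set size: allocations made by libyara through cgo show up in RSS but never in Go's heap profile or MemStats. A rough, Linux-only sketch (the helper below is hypothetical, not part of bincapz):

package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// rssKB returns the process's resident set size in kB by reading
// /proc/self/status (Linux only).
func rssKB() (int64, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			var kb int64
			if _, err := fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &kb); err != nil {
				return 0, err
			}
			return kb, nil
		}
	}
	return 0, sc.Err()
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	rss, err := rssKB()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// If RSS dwarfs HeapSys, the difference is mostly memory the Go
	// runtime never allocated: cgo/C allocations that pprof cannot see.
	fmt.Printf("Go heap in use:  %d MiB\n", ms.HeapInuse/1024/1024)
	fmt.Printf("Go heap from OS: %d MiB\n", ms.HeapSys/1024/1024)
	fmt.Printf("process RSS:     %d MiB\n", rss/1024)
}

Logging something like this between scans should show whether the growth is attributable to the C side.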
Closing, as this may be obsolete and the code base has moved around so much that the existing research is no longer useful.