tools-python icon indicating copy to clipboard operation
tools-python copied to clipboard

Slow for SBOMs with a large number of files + relationships

Open jonjohnsonjr opened this issue 2 years ago • 1 comments

This check iterates over the list existing_relationships up to 2 times: https://github.com/spdx/tools-python/blob/8050fd9c41a92c75ec2ba9eb10ed9a919c375fa9/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py#L162-L172

And it's called for every file: https://github.com/spdx/tools-python/blob/8050fd9c41a92c75ec2ba9eb10ed9a919c375fa9/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py#L144-L157

So if F is the number of files and R is the number of relationships, we are doing O(F * R) comparisons of relationships (which seem not cheap individually).

And given that this seems to expect to find a relationship for every file, we know that R is >= F, so this is at least O(F^2).

Looks quadratic to me.

With the caveat that I'm not a python expert, this being a list seems to be the issue: https://github.com/spdx/tools-python/blob/8050fd9c41a92c75ec2ba9eb10ed9a919c375fa9/src/spdx_tools/spdx/parser/jsonlikedict/relationship_parser.py#L55-L57

Is it possible to turn that into a set so these membership tests are O(1) instead of O(N)?

jonjohnsonjr avatar Jan 25 '24 20:01 jonjohnsonjr

As an example:

$ curl -L https://packages.wolfi.dev/os/aarch64/renovate-37.151.0-r0.apk | tar -Oxz var/lib/db/sbom/renovate-37.151.0-r0.spdx.json > sbom.spdx.json

$ wc -c sbom.spdx.json
 36350622 sbom.spdx.json

$ cat sbom.spdx.json | jq '.files[]' -c | wc -l
   30193

$ cat sbom.spdx.json | jq '.relationships[]' -c | wc -l
   30193

# 30193 * 30193 = 911617249, which is a pretty big number for python

$ time ntia-checker -v --file sbom.spdx.json

Is this SBOM NTIA minimum element conformant? True

Individual elements                            | Status
-------------------------------------------------------
All component names provided?                  | True
All component versions provided?               | True
All component identifiers provided?            | True
All component suppliers provided?              | True
SBOM author name provided?                     | True
SBOM creation timestamp provided?              | True
Dependency relationships provided?             | True

No components with missing information.

real	8m57.222s
user	8m57.118s
sys	0m0.085s

jonjohnsonjr avatar Jan 25 '24 20:01 jonjohnsonjr