[BUG] SBOM generation for CycloneDX generates duplicate dependencies
Is there an existing issue for this?
- [X] I have searched the existing issues
This issue exists in the latest npm version
- [X] I am using the latest npm
Current Behavior
Tools may be unable to parse the generated CycloneDX SBOM because it contains duplicate dependencies.
Expected Behavior
A CycloneDX v1.5 SBOM generated from a repository can be parsed correctly.
Steps To Reproduce
- Clone https://gitlab.com/tanna.dev/renovate-graph
- Run
npm sbom --sbom-format cyclonedx > cyclonedx.json
- Run the output through a CycloneDX validator, e.g.
go run github.com/CycloneDX/sbom-utility@latest validate --input-file cyclonedx.json
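For a quicker look than a full schema validation, the duplicates can be listed straight from the generated file. This is only a minimal Python sketch, not part of npm or sbom-utility: the helper names `duplicate_bom_refs` and `load_and_check` are made up for illustration, and it assumes the CycloneDX JSON layout with a top-level `components` array whose entries carry a `bom-ref`.

```python
import json
from collections import Counter


def duplicate_bom_refs(bom: dict) -> dict:
    """Return {bom-ref: count} for every bom-ref used by more than one component."""
    refs = [c["bom-ref"] for c in bom.get("components", []) if "bom-ref" in c]
    return {ref: n for ref, n in Counter(refs).items() if n > 1}


def load_and_check(path: str) -> dict:
    """Read a CycloneDX JSON file and report its duplicated bom-refs."""
    with open(path, encoding="utf-8") as fh:
        return duplicate_bom_refs(json.load(fh))
```

Calling `load_and_check("cyclonedx.json")` on the file produced above should return a non-empty mapping if the SBOM is affected.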
Environment
- npm: 10.2.3
- Node.js: v18.17.1
- OS Name: Linux
- System Model Name:
- npm config:
; "user" config from /home/jamie/.npmrc
//registry.npmjs.org/:_authToken = (protected)
; node bin location = /usr/bin/node
; node version = v18.17.1
; npm local prefix = /home/jamie/workspaces/renovate-graph
; npm version = 10.2.3
; cwd = /home/jamie/workspaces/renovate-graph
; HOME = /home/jamie
; Run `npm config ls -l` to show all defaults.
Did you experience the same issue when generating the SBOM via the official tooling, https://github.com/CycloneDX/cyclonedx-node-npm ?
@bdehamer see my earlier remarks on why full deduplication is intrinsically impossible in node_modules: https://github.com/npm/rfcs/pull/714#issuecomment-1672927160
@jamietanna I'm digging into this issue and considering a couple different solutions. I'd be curious to hear which of these best meets the need of your SBOM use cases . . .
The Issue
In certain circumstances, it is not possible for npm to completely deduplicate packages in the node_modules tree. A basic example would be something like this:
demo-package
├─┬ foo
│ └── tslib@1.14.1
├─┬ bar
│ └── tslib@1.14.1
└── tslib@2.6.2
My demo-package project has dependencies on foo, bar and tslib (version 2.6.2). Since foo and bar each depend on an older version of tslib (version 1.14.1) that conflicts with the version needed by the root project, tslib@1.14.1 cannot be hoisted to the top of node_modules and ends up being duplicated under both foo and bar.
Since version 1.14.1 of tslib literally appears on disk at two different locations in the tree, the somewhat naive SBOM generator ends up adding two identical entries to the CycloneDX components list.
This is why the resulting SBOM fails validation -- we end up with multiple entries which have identical bom-ref values.
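To make the failure mode concrete, here is a small Python sketch of a generator that mirrors the tree one-to-one. It is an illustration only, not npm's actual code: the tuple-based tree model and the helper `naive_components` are assumptions, and the versions given to demo-package, foo and bar are placeholders, since only the tslib versions matter here.

```python
# Hypothetical model of the tree above: (name, version, children).
# The "0.0.0" versions are placeholders, not taken from the real packages.
TREE = ("demo-package", "0.0.0", [
    ("foo", "0.0.0", [("tslib", "1.14.1", [])]),
    ("bar", "0.0.0", [("tslib", "1.14.1", [])]),
    ("tslib", "2.6.2", []),
])


def naive_components(node):
    """Emit one component per on-disk package, using name@version as bom-ref."""
    _, _, children = node
    out = []
    for child in children:
        name, version, _ = child
        out.append({
            "bom-ref": f"{name}@{version}",
            "type": "library",
            "name": name,
            "version": version,
        })
        # Recurse into nested node_modules, mirroring the on-disk layout.
        out.extend(naive_components(child))
    return out


refs = [c["bom-ref"] for c in naive_components(TREE)]
print(refs)
```

The two nested copies of tslib produce the same bom-ref, which is exactly what a validator objects to.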
Solution 1
One way to address this would be to treat each package that appears in the tree as a distinct dependency -- even if it is technically identical to some other dependency already present in the tree.
Given the example above, this solution would result in tslib@1.14.1 being listed twice in the SBOM, each time with a distinct bom-ref value. We might choose to do something like prefix the bom-ref with the parent package name, resulting in entries that look something like:
[
{
"bom-ref": "[email protected]@1.14.1",
"type": "library",
"name": "tslib",
"version": "1.14.1"
},
{
"bom-ref": "[email protected]@1.14.1",
"type": "library",
"name": "tslib",
"version": "1.14.1"
}
]
I believe that this is similar to how cyclonedx-node-npm solves this problem.
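A rough Python sketch of Solution 1, to show that deriving the bom-ref from the position in the tree is enough to make it unique. The `|` separator, the tuple-based tree and the helper `path_components` are all assumptions made for illustration; this is not the scheme npm or cyclonedx-node-npm actually uses, and the "0.0.0" versions are placeholders.

```python
# Hypothetical tree model: (name, version, children); "0.0.0" is a placeholder.
TREE = ("demo-package", "0.0.0", [
    ("foo", "0.0.0", [("tslib", "1.14.1", [])]),
    ("bar", "0.0.0", [("tslib", "1.14.1", [])]),
    ("tslib", "2.6.2", []),
])


def path_components(node, ancestors=()):
    """Emit one component per on-disk package, prefixing bom-ref with its parents."""
    _, _, children = node
    out = []
    for child in children:
        name, version, _ = child
        # Assumed scheme: parent names joined with "|" in front of name@version.
        ref = "|".join((*ancestors, f"{name}@{version}"))
        out.append({
            "bom-ref": ref,
            "type": "library",
            "name": name,
            "version": version,
        })
        out.extend(path_components(child, (*ancestors, name)))
    return out


refs = [c["bom-ref"] for c in path_components(TREE)]
print(refs)
```

Because the chain of parent names corresponds to a unique location under node_modules, no two entries can end up with the same bom-ref, while name and version stay identical for both copies.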
Solution 2
The other approach would be to deduplicate the packages before adding them to the SBOM. Instead of literally mirroring the layout of packages in the node_modules directory, this solution would detect the multiple instances of tslib@1.14.1 and fold them into a single entry in the SBOM components list:
[
{
"bom-ref": "tslib@1.14.1",
"type": "library",
"name": "tslib",
"version": "1.14.1"
}
]
In this case, we're not trying to represent the layout of the node_modules directory, but instead just enumerating the distinct dependencies that comprise the project. This is how both cdxgen and the snyk SBOM command handle the issue of duplicate packages.
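As a sketch of what Solution 2 amounts to, the following Python snippet folds components that share a name and version into one entry. The helper `dedupe_components` is hypothetical and is not how cdxgen, Snyk or npm implement it; it also assumes that name plus version is a sufficient identity, which may not hold if two copies were resolved from different sources.

```python
def dedupe_components(components):
    """Keep the first component seen for each (name, version) pair."""
    seen = {}
    for comp in components:
        key = (comp["name"], comp["version"])
        # setdefault leaves the first occurrence in place and ignores repeats.
        seen.setdefault(key, comp)
    return list(seen.values())


components = [
    {"bom-ref": "tslib@1.14.1", "type": "library", "name": "tslib", "version": "1.14.1"},
    {"bom-ref": "tslib@1.14.1", "type": "library", "name": "tslib", "version": "1.14.1"},
    {"bom-ref": "tslib@2.6.2", "type": "library", "name": "tslib", "version": "2.6.2"},
]
unique = dedupe_components(components)
print([c["bom-ref"] for c in unique])
```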
I think there are cases to be made for either of these solutions, but I'd like to know which of these best matches the output you'd expect to see in a valid SBOM?
It's been a while, but I'd strongly vote for option 2. If a technically identical dependency is included multiple times, it should appear only ONCE as a component in the SBOM. It can still be referenced multiple times in the dependencies section of the SBOM, though. Compare with what Maven does.
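To illustrate that point, here is a minimal Python sketch that builds a CycloneDX-style `dependencies` array (entries with `ref` and `dependsOn`) from parent/child edges, so a shared component gets exactly one entry but can be referenced by several parents. The helper `build_dependencies` and the simplified refs for demo-package, foo and bar are assumptions for illustration, not output from any real tool.

```python
def build_dependencies(edges):
    """Turn (parent_ref, child_ref) pairs into unique ref/dependsOn entries."""
    graph = {}
    for parent, child in edges:
        # Sets absorb repeated edges, so each ref and each link appears once.
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set())
    return [
        {"ref": ref, "dependsOn": sorted(deps)}
        for ref, deps in sorted(graph.items())
    ]


# Simplified refs; both foo and bar point at the same tslib@1.14.1 component.
EDGES = [
    ("demo-package", "foo"),
    ("demo-package", "bar"),
    ("demo-package", "tslib@2.6.2"),
    ("foo", "tslib@1.14.1"),
    ("bar", "tslib@1.14.1"),
]
dependencies = build_dependencies(EDGES)
for entry in dependencies:
    print(entry)
```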
We use OWASP Dependency-Track and just updated it to v4.11.1. It now validates uploaded SBOMs and rejects those generated by npm:
[2024-09-16T04:37:17.479Z] [DependencyTrack] {"status":400,"title":"The uploaded BOM is invalid","detail":"Schema validation failed","errors":["$.components[320].externalReferences[2].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference","$.components[320].externalReferences[2].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference","$.components[320].externalReferences[2].url: does not match the regex pattern ^urn:cdx:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[1-9][0-9]*$","$.components[320].externalReferences[2].url: does not match the iri-reference pattern must be a valid RFC 3987 IRI-reference","$.components[320].externalReferences[2].url: does not match the regex pattern ^urn:cdx:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/[1-9][0-9]*#.+$","$.dependencies: the items in the array must be unique"]}
It sounds like the problem is duplicate entries.
The official tooling does not exhibit this problem.