Utility is passing an SBOM with an invalid iri-reference
Unfortunately I cannot share the SBOM, but the problem would be pretty easy to reproduce.
I had an SBOM that would not load into Dependency Track because of a schema validation error. I ran the latest version of this tool's validation against it, and this is the output:
Welcome to the sbom-utility! Version `v0.17.0` (sbom-utility) (darwin/amd64)
============================================================================
[INFO] Loading (embedded) default schema config file: `config.json`...
[INFO] Loading (embedded) default license policy file: `license.json`...
[INFO] Attempting to load and unmarshal data from: `nps_saas_11.2.3.4_20241223_191016-collected-EDITOR/nps_saas_11.2.3.4_20241223_191016-collected-original-sbom.cdx.json`...
[INFO] Successfully unmarshalled data from: `nps_saas_11.2.3.4_20241223_191016-collected-EDITOR/nps_saas_11.2.3.4_20241223_191016-collected-original-sbom.cdx.json`
[INFO] Determining file's BOM format and version...
[INFO] Determined BOM format, version (variant): `CycloneDX`, `1.4` (latest)
[INFO] Matching BOM schema (for validation): schema/cyclonedx/1.4/bom-1.4.schema.json
[INFO] Loading schema `schema/cyclonedx/1.4/bom-1.4.schema.json`...
[INFO] Schema `schema/cyclonedx/1.4/bom-1.4.schema.json` loaded.
[INFO] Validating `nps_saas_11.2.3.4_20241223_191016-collected-EDITOR/nps_saas_11.2.3.4_20241223_191016-collected-original-sbom.cdx.json`...
[INFO] BOM valid against JSON schema: `true`
The short version of several hours of work is that I tracked the problem down to a single entry. The entry in error is as follows:
{
  "type": "library",
  "bom-ref": "pkg:pypi/example@21.12",
  "supplier": {
    "url": [
      "Not Found"
    ]
  },
  "author": "UNKNOWN",
  "name": "example",
  "version": "21.12",
  "description": "UNKNOWN",
  "licenses": [
    {
      "license": {
        "id": "Apache-2.0"
      }
    }
  ],
  "copyright": "No copyright found",
  "purl": "pkg:pypi/example@21.12",
  "properties": [
    {
      "name": "Relationship Completeness",
      "value": "Unknown"
    }
  ]
},
This is the fixed one:
{
  "type": "library",
  "bom-ref": "pkg:pypi/example@21.12",
  "supplier": {
    "url": [
      ""
    ]
  },
  "author": "UNKNOWN",
  "name": "example",
  "version": "21.12",
  "description": "UNKNOWN",
  "licenses": [
    {
      "license": {
        "id": "Apache-2.0"
      }
    }
  ],
  "copyright": "No copyright found",
  "purl": "pkg:pypi/example@21.12",
  "properties": [
    {
      "name": "Relationship Completeness",
      "value": "Unknown"
    }
  ]
},
The difference is in this section:
"supplier": {
  "url": [
    "Not Found"
  ]
},
The spec (https://cyclonedx.org/docs/1.5/json/#components_items_supplier_url) clearly states that this field must contain a URL (or several of them).
Could the validation tool please be updated to validate this field properly? Thanks.
@nigellh The problem is not with the utility code proper (i.e., code that I control), but rather with the fact that even the most popular JSON schema validation libraries (including the two that I have tried over time in the utility) do not support the iri-reference string format.
Even when a library does "support" the iri-reference string format (as is used in the CDX schema for the url field), the built-in behavior is to "pass" (not error).
... TLDR
potential fixes
- custom validation - To fix this and "other" validation that needs to be performed apart from raw schema validation, I had prototyped a "custom" validation path (where I had intended to enable "pluggable" programmatic checks); however, this approach still felt "kludgy" and I left it in an experimental state.
- custom schema - It may be possible to alter the JSON schema for CDX to instead provide a regex for an `iri-reference` format (or close to it) and apply it to the `url` element; this might actually be supported by the ref. JSON schema validation libs (need to try it).
  - simpler pattern matching - Hardcode a string pattern for a specific URL, e.g., "http://my.site.com"; that is, use a stricter "pattern" for string validation (need to try).
- Try `uri-reference` or `uri` (as a replacement) as a "longshot", which would disallow non-ASCII chars (need to try). In fact, `uri-reference` might actually be the better choice for the spec's `url` field today, although, thinking forward, `iri-reference` is better in the long run.
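As a rough illustration of the "custom schema" and "pattern" ideas above, here is a minimal Python sketch of a check that is only close to `iri-reference`, not a full RFC 3987 validator. It rejects characters that an IRI reference never allows unescaped (such as the space in "Not Found") while still accepting the empty string, which is a valid relative reference. The function names are hypothetical, the check is not part of sbom-utility, and only top-level `components` are visited.

```python
import re

# Approximation only: characters that may never appear unescaped in an
# IRI reference (whitespace, control characters, a few ASCII delimiters).
_FORBIDDEN = re.compile(r'[\s\x00-\x1f\x7f<>"{}|\\^`]')


def looks_like_iri_reference(value):
    """Return True if value is a string containing no forbidden characters."""
    return isinstance(value, str) and _FORBIDDEN.search(value) is None


def find_bad_supplier_urls(bom):
    """Return (bom-ref, url) pairs whose supplier.url entry fails the check.

    Only top-level components are inspected; nested components are ignored.
    """
    bad = []
    for component in bom.get("components", []):
        for url in component.get("supplier", {}).get("url", []):
            if not looks_like_iri_reference(url):
                bad.append((component.get("bom-ref"), url))
    return bad


# Hypothetical two-component BOM mirroring the failing and corrected entries.
sample_bom = {
    "components": [
        {"bom-ref": "failing", "supplier": {"url": ["Not Found"]}},
        {"bom-ref": "corrected", "supplier": {"url": [""]}},
    ]
}
print(find_bad_supplier_urls(sample_bom))
```

The same character class could, in principle, be negated and carried into the schema as a `pattern` keyword on the `url` items, assuming the validation library enforces `pattern` even where it ignores `format`; that still needs to be tried.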
additional notes
- The `query` command can be used to search the BOM for specific values such as "Not Found" at specific places such as the "supplier.url" field. This could be used in tandem with the base validation...
  - In fact, this is EXACTLY what the experimental validation was doing internally by enforcing element or value "checks" specified using regex in a custom config file...
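To make the idea of regex "checks" on specific elements concrete, here is a minimal Python sketch. The check format (a dotted path mapped to a regex that every value at that path must fully match) is purely hypothetical and is not the config format used by sbom-utility's experimental validation. The example pattern accepts an empty string or an http(s) URL, on the assumption that an empty `url` entry is acceptable, as in the corrected entry above.

```python
import re

# Hypothetical check format (NOT sbom-utility's config): dotted path -> regex
# that every string value found at that path must fully match.
CHECKS = {"supplier.url": r"(https?://\S+)?"}


def values_at(node, path):
    """Yield each value reached by following path, fanning out over lists."""
    if isinstance(node, list):
        for item in node:
            yield from values_at(item, path)
    elif not path:
        yield node
    elif isinstance(node, dict) and path[0] in node:
        yield from values_at(node[path[0]], path[1:])


def run_checks(components, checks=CHECKS):
    """Return (bom-ref, path, value) for every value that fails its regex."""
    failures = []
    for component in components:
        for dotted, pattern in checks.items():
            for value in values_at(component, dotted.split(".")):
                if not (isinstance(value, str) and re.fullmatch(pattern, value)):
                    failures.append((component.get("bom-ref"), dotted, value))
    return failures


# Hypothetical components mirroring the entries discussed in this issue.
sample_components = [
    {"bom-ref": "failing", "supplier": {"url": ["Not Found"]}},
    {"bom-ref": "corrected", "supplier": {"url": [""]}},
    {"bom-ref": "with-url", "supplier": {"url": ["https://example.com"]}},
]
print(run_checks(sample_components))
```

Keeping the checks in data rather than code is what would make such a path "pluggable": new rules become new entries in the mapping rather than new validation logic.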
Some interesting reading on URL validation, which always seems to be problematic with regard to schema validation; it might provide ideas for representing or solving the problem using a general use case:
- https://www.stephenlewis.me/blog/json-schema-url-validation/
Updated the title to better reflect the situation.