sbom-utility

Utility is passing an SBOM with an invalid iri-reference

Open nigellh opened this issue 1 year ago • 3 comments

Unfortunately I cannot give the SBOM, but it would be pretty easy to create.

I had an SBOM that would not load into Dependency Track with a Schema Validation error. Using the latest version of this tool I ran the validation against it and this is the output:

Welcome to the sbom-utility! Version `v0.17.0` (sbom-utility) (darwin/amd64)
============================================================================
[INFO] Loading (embedded) default schema config file: `config.json`...
[INFO] Loading (embedded) default license policy file: `license.json`...
[INFO] Attempting to load and unmarshal data from: `nps_saas_11.2.3.4_20241223_191016-collected-EDITOR/nps_saas_11.2.3.4_20241223_191016-collected-original-sbom.cdx.json`...
[INFO] Successfully unmarshalled data from: `nps_saas_11.2.3.4_20241223_191016-collected-EDITOR/nps_saas_11.2.3.4_20241223_191016-collected-original-sbom.cdx.json`
[INFO] Determining file's BOM format and version...
[INFO] Determined BOM format, version (variant): `CycloneDX`, `1.4` (latest)
[INFO] Matching BOM schema (for validation): schema/cyclonedx/1.4/bom-1.4.schema.json
[INFO] Loading schema `schema/cyclonedx/1.4/bom-1.4.schema.json`...
[INFO] Schema `schema/cyclonedx/1.4/bom-1.4.schema.json` loaded.
[INFO] Validating `nps_saas_11.2.3.4_20241223_191016-collected-EDITOR/nps_saas_11.2.3.4_20241223_191016-collected-original-sbom.cdx.json`...
[INFO] BOM valid against JSON schema: `true`

The short version of several hours of work is that I tracked it down to a single entry. The entry in error is as follows:

    {
      "type": "library",
      "bom-ref": "pkg:pypi/example@21.12",
      "supplier": {
        "url": [
          "Not Found"
        ]
      },
      "author": "UNKNOWN",
      "name": "example",
      "version": "21.12",
      "description": "UNKNOWN",
      "licenses": [
        {
          "license": {
            "id": "Apache-2.0"
          }
        }
      ],
      "copyright": "No copyright found",
      "purl": "pkg:pypi/example@21.12",
      "properties": [
        {
          "name": "Relationship Completeness",
          "value": "Unknown"
        }
      ]
    },

This is the fixed one:


    {
      "type": "library",
      "bom-ref": "pkg:pypi/example@21.12",
      "supplier": {
        "url": [
          ""
        ]
      },
      "author": "UNKNOWN",
      "name": "example",
      "version": "21.12",
      "description": "UNKNOWN",
      "licenses": [
        {
          "license": {
            "id": "Apache-2.0"
          }
        }
      ],
      "copyright": "No copyright found",
      "purl": "pkg:pypi/example@21.12",
      "properties": [
        {
          "name": "Relationship Completeness",
          "value": "Unknown"
        }
      ]
    },

The difference is in this section:

      "supplier": {
        "url": [
          "Not Found"
        ]
      },

Looking at the spec (https://cyclonedx.org/docs/1.5/json/#components_items_supplier_url), it clearly states that this field needs to be a URL (or several of them).

Could the validation tool please be updated to validate this field properly? Thanks.

nigellh avatar Dec 24 '24 15:12 nigellh

@nigellh The problem is not with the utility code proper (i.e., code that I control), but rather the fact that even the most popular JSON Schema validation libraries (including the two that I have tried over time in the utility) do not support the iri-reference string format.

Even when a library does "support" the iri-reference string format (which the CDX schema uses for the url field), the built-in behavior is to "pass" (not error).

... TLDR

potential fixes

  • custom validation - To fix this and "other" validation that needs to be performed apart from raw schema validation, I had prototyped a "custom" validation path (where I had intended to enable "pluggable" programmatic checks); however, this approach still felt "kludgy" and I left it in an experimental state.
  • custom schema - it may be possible to alter the JSON schema for CDX to instead provide a regex for an iri-reference format (or close to it) and apply it to the url element; this might actually be supported by the ref. JSON schema validation libs (need to try it).
    • simpler pattern matching - hardcode a string pattern for a specific URL, e.g., "http://my.site.com"; that is, use a stricter "pattern" for string validation (need to try)
  • try uri-reference or uri (as a replacement) as a "longshot", which would disallow non-ASCII chars (need to try). In fact, uri-reference might actually be the better choice for this spec field (url), but, thinking forward, iri-reference is better in the long run.
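
As a rough illustration of the "custom schema" idea above, here is a minimal Python sketch of a regex that approximates iri-reference and could be tried in a `pattern` keyword on the url items. The regex, the `looks_like_iri_reference` helper, and the schema fragment are assumptions made for illustration; they are not taken from the CycloneDX schema or from sbom-utility, and the regex is far looser than the real RFC 3987 grammar. It only rejects whitespace and a few delimiter characters that may not appear unescaped.

```python
import re

# Hypothetical approximation of "iri-reference": any string (including the
# empty one) that contains no whitespace and none of the delimiter
# characters that must be percent-encoded in a URI/IRI.
IRI_REFERENCE_APPROX = re.compile(r'^[^\s<>"{}|\\^`]*$')

# Hypothetical stricter variant: only absolute http(s) URLs.
HTTP_URL_ONLY = re.compile(r'^https?://\S+$')


def looks_like_iri_reference(value: str) -> bool:
    """Return True if the value passes the approximate iri-reference check."""
    return IRI_REFERENCE_APPROX.fullmatch(value) is not None


# Hypothetical schema fragment that carries the same regex as a "pattern",
# which validators enforce even when they ignore "format".
url_item_schema = {"type": "string", "pattern": IRI_REFERENCE_APPROX.pattern}

for candidate in ["Not Found", "", "https://example.com/a?b=c", "../docs"]:
    print(repr(candidate), looks_like_iri_reference(candidate))
```

Note that the empty string passes the approximate check (an empty relative reference is syntactically allowed), which matches the observation that replacing "Not Found" with "" was enough to get the BOM accepted. The stricter `HTTP_URL_ONLY` variant would reject both.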

additional notes

  • The query command can be used to search the BOM for specific values such as "Not Found" at specific places such as the "supplier.url" field. This could be used in tandem with the base validation...
    • in fact, this is EXACTLY what the experimental validation was doing internally by enforcing element or value "checks" specified using regexes in a custom config file...
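
The kind of targeted "check" described above can be sketched outside the utility as well. The following Python snippet is a standalone illustration, not the sbom-utility query command or its experimental custom validation: the `find_bad_supplier_urls` function, the whitespace-only rule, and the sample BOM (including its bom-ref values) are all made up for the example. It walks the top-level components and reports every supplier.url entry that contains whitespace.

```python
import json
import re

# Assumed rule for this sketch: a URL value must not contain any whitespace.
NO_WHITESPACE = re.compile(r"\S*")


def find_bad_supplier_urls(bom: dict) -> list:
    """Return (bom-ref, url) pairs whose supplier.url entry fails the rule.

    Only top-level components are inspected; nested components are ignored.
    """
    bad = []
    for component in bom.get("components", []):
        for url in component.get("supplier", {}).get("url", []):
            if NO_WHITESPACE.fullmatch(url) is None:
                bad.append((component.get("bom-ref", "<no bom-ref>"), url))
    return bad


# Made-up minimal BOM with one bad and one acceptable supplier URL.
bom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.4",
  "components": [
    {"type": "library", "bom-ref": "component-a", "name": "a",
     "supplier": {"url": ["Not Found"]}},
    {"type": "library", "bom-ref": "component-b", "name": "b",
     "supplier": {"url": ["https://example.com"]}}
  ]
}
""")

for bom_ref, url in find_bad_supplier_urls(bom):
    print(f"{bom_ref}: invalid supplier.url value {url!r}")
```

Running a check like this alongside schema validation would have flagged the "Not Found" entry directly instead of requiring a manual search.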

mrutkows avatar Jan 03 '25 16:01 mrutkows

Some interesting reading on URL validation, as it always seems to be problematic with regard to schema validation... it might provide ideas towards representing/solving the problem using a general use case:

  • https://www.stephenlewis.me/blog/json-schema-url-validation/

mrutkows avatar Jan 03 '25 17:01 mrutkows

Updated the title to better reflect the situation.

mrutkows avatar Jul 30 '25 15:07 mrutkows