
feat: add support for encoding hashes in CycloneDX format

Open JoshuaChen opened this issue 11 months ago • 1 comments

Description

Add support for encoding package hashes in the component-level hashes field of the CycloneDX format:

https://cyclonedx.org/docs/1.6/json/#components_items_hashes

change

    {
      "bom-ref": "pkg:maven/net.sf.saxon.Transform/Saxon-HE@12.5?package-id=defe654c765bea5e",
      "type": "library",
      "name": "Saxon-HE",
      "version": "12.5",
      "cpe": "cpe:2.3:a:net.sf.saxon.Transform:Transform:12.5:*:*:*:*:*:*:*",
      "purl": "pkg:maven/net.sf.saxon.Transform/Saxon-HE@12.5",
      "externalReferences": [
        {
          "url": "",
          "hashes": [
            {
              "alg": "SHA-1",
              "content": "57c007520e2879387b8d13d0a512e9566eeffa73"
            }
          ],
          "type": "build-meta"
        }
      ],
      "properties": [...]
    },

to

    {
      "bom-ref": "pkg:maven/net.sf.saxon.Transform/Saxon-HE@12.5?package-id=defe654c765bea5e",
      "type": "library",
      "name": "Saxon-HE",
      "version": "12.5",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "57c007520e2879387b8d13d0a512e9566eeffa73"
        }
      ],
      "cpe": "cpe:2.3:a:net.sf.saxon.Transform:Transform:12.5:*:*:*:*:*:*:*",
      "purl": "pkg:maven/net.sf.saxon.Transform/Saxon-HE@12.5",
      "properties": [...]
    },

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (please discuss with the team first; Syft is 1.0 software and we won't accept breaking changes without going to 2.0)
  • [ ] Documentation (updates the documentation)
  • [ ] Chore (improve the developer experience, fix a test flake, etc, without changing the visible behavior of Syft)
  • [ ] Performance (make Syft run faster or use less memory, without changing visible behavior much)

Checklist:

  • [x] I have added unit tests that cover changed behavior
  • [x] I have tested my code in common scenarios and confirmed there are no regressions
  • [x] I have added comments to my code, particularly in hard-to-understand sections

JoshuaChen, May 13 '25 08:05

It looks like the tests are failing because not all checksums will pass schema verification:

        {
          "name": "syft:metadata:pullChecksum",
          "value": "Q1p78yvTLG094tHE1+dToJGbmYzQE="
        },

on the metadata is being encoded as

    {
      "bom-ref": "97a82cfa116f3277",
      "type": "library",
      "publisher": "Natanael Copa <[email protected]>",
      "name": "libc-utils",
      "version": "0.7.2-r0",
      "description": "Meta package to pull in correct libc",
      "hashes": [
        {
          "alg": "SHA-256",
          "content": "Q1p78yvTLG094tHE1+dToJGbmYzQE="
        }
      ],

which is not right according to the JSON schema:

Must match regular expression: ^([a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64}|[a-fA-F0-9]{96}|[a-fA-F0-9]{128})$
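To make the failure concrete, here is a minimal sketch that checks hash content against the pattern quoted above. The helper name isValidHashContent is made up for illustration and is not part of syft or the CycloneDX libraries.

```go
package main

import (
	"fmt"
	"regexp"
)

// hashContentPattern mirrors the regular expression quoted from the
// CycloneDX JSON schema: hex strings of 32, 40, 64, 96, or 128 characters.
var hashContentPattern = regexp.MustCompile(
	`^([a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64}|[a-fA-F0-9]{96}|[a-fA-F0-9]{128})$`,
)

// isValidHashContent is a hypothetical helper (not a syft API) that reports
// whether a value could be emitted as CycloneDX hash content.
func isValidHashContent(content string) bool {
	return hashContentPattern.MatchString(content)
}

func main() {
	candidates := []string{
		"57c007520e2879387b8d13d0a512e9566eeffa73", // hex SHA-1 from the PR description
		"Q1p78yvTLG094tHE1+dToJGbmYzQE=",           // pullChecksum value from the failing test
	}
	for _, c := range candidates {
		fmt.Printf("%q valid=%t\n", c, isValidHashContent(c))
	}
}
```

The pullChecksum value is rejected because it contains characters outside the hex alphabet, so it would have to be skipped or re-encoded before being written into hashes.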

I think there is room to make syft better here. We always type assert on metadata types to pull specific hashes out. Instead, we could define a new interface that returns []file.Digest and type assert to that interface. That gives us clarity on both the value and the algorithm, so format verification becomes easier, and it adds a natural extension point for this kind of behavior in the future.
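A rough sketch of what that could look like follows. Everything here is an assumption for illustration: Digest is a local stand-in for file.Digest, and DigestProvider, exampleMetadata, and encodableDigests are hypothetical names, not existing syft code.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Digest is a local stand-in for syft's file.Digest (field names assumed).
type Digest struct {
	Algorithm string
	Value     string
}

// DigestProvider is the hypothetical interface: metadata types that carry
// hashes implement it instead of being type asserted one by one.
type DigestProvider interface {
	Digests() []Digest
}

// exampleMetadata is a made-up metadata type used only for this sketch.
type exampleMetadata struct {
	SHA1 string
}

// Digests reports the algorithm together with the value.
func (m exampleMetadata) Digests() []Digest {
	if m.SHA1 == "" {
		return nil
	}
	return []Digest{{Algorithm: "sha1", Value: m.SHA1}}
}

// expectedHexLen lists the hex length per algorithm (illustrative subset).
var expectedHexLen = map[string]int{"md5": 32, "sha1": 40, "sha256": 64, "sha512": 128}

// encodableDigests type asserts to the single interface and keeps only
// digests whose value is hex of the length expected for the algorithm.
func encodableDigests(metadata any) []Digest {
	provider, ok := metadata.(DigestProvider)
	if !ok {
		return nil
	}
	var out []Digest
	for _, d := range provider.Digests() {
		want, known := expectedHexLen[d.Algorithm]
		if !known || len(d.Value) != want {
			continue
		}
		if _, err := hex.DecodeString(d.Value); err != nil {
			continue // not hex, so it would fail schema verification
		}
		out = append(out, d)
	}
	return out
}

func main() {
	good := exampleMetadata{SHA1: "57c007520e2879387b8d13d0a512e9566eeffa73"}
	bad := exampleMetadata{SHA1: "Q1p78yvTLG094tHE1+dToJGbmYzQE="}
	fmt.Println(len(encodableDigests(good)), len(encodableDigests(bad)))
}
```

Because the algorithm travels with the value, the encoder can verify the format in one place rather than in each metadata-specific branch.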

We might want to distinguish between hashes that are claimed by the package manager and hashes that were actually captured. That distinction is implicit today and could be helpful downstream (but is probably not needed now).

wagoodman, May 14 '25 13:05