feat: add support for encoding hashes in CycloneDX format
Description
Add support for encoding package hashes in the component-level `hashes` field of the CycloneDX format:
https://cyclonedx.org/docs/1.6/json/#components_items_hashes
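This moves the hash out of a `build-meta` external reference (which had an empty `url`) and into the component's own `hashes` field. The component encoding changes from: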
```json
{
  "bom-ref": "pkg:maven/net.sf.saxon.Transform/[email protected]?package-id=defe654c765bea5e",
  "type": "library",
  "name": "Saxon-HE",
  "version": "12.5",
  "cpe": "cpe:2.3:a:net.sf.saxon.Transform:Transform:12.5:*:*:*:*:*:*:*",
  "purl": "pkg:maven/net.sf.saxon.Transform/[email protected]",
  "externalReferences": [
    {
      "url": "",
      "hashes": [
        {
          "alg": "SHA-1",
          "content": "57c007520e2879387b8d13d0a512e9566eeffa73"
        }
      ],
      "type": "build-meta"
    }
  ],
  "properties": [...]
},
```
to:
```json
{
  "bom-ref": "pkg:maven/net.sf.saxon.Transform/[email protected]?package-id=defe654c765bea5e",
  "type": "library",
  "name": "Saxon-HE",
  "version": "12.5",
  "hashes": [
    {
      "alg": "SHA-1",
      "content": "57c007520e2879387b8d13d0a512e9566eeffa73"
    }
  ],
  "cpe": "cpe:2.3:a:net.sf.saxon.Transform:Transform:12.5:*:*:*:*:*:*:*",
  "purl": "pkg:maven/net.sf.saxon.Transform/[email protected]",
  "properties": [...]
},
```
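A minimal sketch of the mapping this change implies, written in Go. It assumes a local `Digest` type standing in for Syft's `file.Digest` (with lowercase algorithm names such as `sha1`) and hand-written `Hash`/`Component` structs that mirror the CycloneDX 1.6 JSON shape. It does not use the `cyclonedx-go` library or Syft's real encoder, and `toHashes` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Digest is a hypothetical stand-in for Syft's file.Digest:
// an algorithm name plus a hex-encoded value.
type Digest struct {
	Algorithm string
	Value     string
}

// Hash mirrors the CycloneDX 1.6 JSON shape of components[].hashes[].
type Hash struct {
	Alg     string `json:"alg"`
	Content string `json:"content"`
}

// Component keeps only the fields needed for this example.
type Component struct {
	Name    string `json:"name"`
	Version string `json:"version"`
	Hashes  []Hash `json:"hashes,omitempty"`
}

// cdxAlgorithms maps assumed lowercase digest names to CycloneDX "alg" values.
var cdxAlgorithms = map[string]string{
	"md5":    "MD5",
	"sha1":   "SHA-1",
	"sha256": "SHA-256",
	"sha384": "SHA-384",
	"sha512": "SHA-512",
}

// toHashes converts digests into component-level hashes,
// skipping algorithms that have no CycloneDX equivalent here.
func toHashes(digests []Digest) []Hash {
	var out []Hash
	for _, d := range digests {
		alg, ok := cdxAlgorithms[d.Algorithm]
		if !ok {
			continue
		}
		out = append(out, Hash{Alg: alg, Content: d.Value})
	}
	return out
}

func main() {
	c := Component{
		Name:    "Saxon-HE",
		Version: "12.5",
		Hashes: toHashes([]Digest{
			{Algorithm: "sha1", Value: "57c007520e2879387b8d13d0a512e9566eeffa73"},
		}),
	}
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Skipping unknown algorithms, rather than passing them through, keeps the output inside the schema's `alg` enumeration.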
Type of change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (please discuss with the team first; Syft is 1.0 software and we won't accept breaking changes without going to 2.0)
- [ ] Documentation (updates the documentation)
- [ ] Chore (improve the developer experience, fix a test flake, etc., without changing the visible behavior of Syft)
- [ ] Performance (make Syft run faster or use less memory, without changing visible behavior much)
Checklist:
- [x] I have added unit tests that cover changed behavior
- [x] I have tested my code in common scenarios and confirmed there are no regressions
- [x] I have added comments to my code, particularly in hard-to-understand sections
It looks like the tests are failing because not all checksums pass schema validation. This checksum:
```json
{
  "name": "syft:metadata:pullChecksum",
  "value": "Q1p78yvTLG094tHE1+dToJGbmYzQE="
},
```
in the package metadata is being encoded as:
```json
{
  "bom-ref": "97a82cfa116f3277",
  "type": "library",
  "publisher": "Natanael Copa <[email protected]>",
  "name": "libc-utils",
  "version": "0.7.2-r0",
  "description": "Meta package to pull in correct libc",
  "hashes": [
    {
      "alg": "SHA-256",
      "content": "Q1p78yvTLG094tHE1+dToJGbmYzQE="
    }
  ],
```
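which is invalid according to the CycloneDX JSON schema, because `content` must be a hex string matching the regular expression below (this value is base64, so it can never match):
`^([a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64}|[a-fA-F0-9]{96}|[a-fA-F0-9]{128})$`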
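The schema check can be reproduced with Go's `regexp` package. The sketch also shows one possible normalization, assuming the apk convention that a `Q1` prefix marks a standard-base64 SHA-1: decoding the remainder of the checksum above yields 20 bytes, which is the size of a SHA-1 digest, not a SHA-256 one. `validHashContent` and `apkChecksumToSHA1Hex` are hypothetical helpers, not existing Syft functions.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"regexp"
	"strings"
)

// hashContent is the pattern the CycloneDX 1.6 JSON schema requires for
// hashes[].content: hex strings of 32, 40, 64, 96, or 128 characters.
var hashContent = regexp.MustCompile(`^([a-fA-F0-9]{32}|[a-fA-F0-9]{40}|[a-fA-F0-9]{64}|[a-fA-F0-9]{96}|[a-fA-F0-9]{128})$`)

// validHashContent reports whether s would pass schema validation.
func validHashContent(s string) bool {
	return hashContent.MatchString(s)
}

// apkChecksumToSHA1Hex converts an apk-style checksum to hex, assuming a
// "Q1" prefix followed by a standard-base64-encoded SHA-1 digest.
func apkChecksumToSHA1Hex(s string) (string, error) {
	if !strings.HasPrefix(s, "Q1") {
		return "", fmt.Errorf("not a Q1 checksum: %q", s)
	}
	raw, err := base64.StdEncoding.DecodeString(strings.TrimPrefix(s, "Q1"))
	if err != nil {
		return "", err
	}
	if len(raw) != 20 { // SHA-1 digests are 20 bytes long
		return "", fmt.Errorf("expected 20 bytes, got %d", len(raw))
	}
	return hex.EncodeToString(raw), nil
}

func main() {
	pull := "Q1p78yvTLG094tHE1+dToJGbmYzQE="
	fmt.Println("raw pull checksum valid:", validHashContent(pull))
	fmt.Println("sha1 hex valid:", validHashContent("57c007520e2879387b8d13d0a512e9566eeffa73"))

	h, err := apkChecksumToSHA1Hex(pull)
	if err != nil {
		panic(err)
	}
	fmt.Println("decoded:", h, "valid:", validHashContent(h))
}
```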
I think there is room to make Syft better here. Today we type-assert each metadata type to pull specific hashes out of it. Instead, we could add a new interface that returns `[]file.Digest` and type-assert to that interface. That makes both the value and the algorithm explicit, so format verification becomes easier, and it adds a natural extension point for this kind of behavior in the future.
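A minimal sketch of the proposed interface. `DigestProvider`, both metadata types, and `validDigests` are hypothetical names, and `Digest` is a local stand-in for Syft's `file.Digest`. Because each digest carries its algorithm, the encoder can check the expected hex length per algorithm after a single type assertion, instead of switching over every concrete metadata type.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Digest is a local stand-in for Syft's file.Digest (assumed: an
// algorithm name plus a hex-encoded value).
type Digest struct {
	Algorithm string
	Value     string
}

// DigestProvider is the hypothetical interface: any package metadata
// that knows its own digests implements it.
type DigestProvider interface {
	Digests() []Digest
}

// JavaArchiveMetadata is a hypothetical metadata type holding a hex SHA-1.
type JavaArchiveMetadata struct{ SHA1 string }

func (m JavaArchiveMetadata) Digests() []Digest {
	if m.SHA1 == "" {
		return nil
	}
	return []Digest{{Algorithm: "sha1", Value: m.SHA1}}
}

// MislabeledMetadata reproduces the failure above: a base64 value tagged sha256.
type MislabeledMetadata struct{ PullChecksum string }

func (m MislabeledMetadata) Digests() []Digest {
	return []Digest{{Algorithm: "sha256", Value: m.PullChecksum}}
}

// hexLen is the expected hex length for each supported algorithm.
var hexLen = map[string]int{"md5": 32, "sha1": 40, "sha256": 64, "sha384": 96, "sha512": 128}

// validDigests type-asserts once to DigestProvider and keeps only digests
// whose value is hex of the length its algorithm implies.
func validDigests(metadata any) []Digest {
	p, ok := metadata.(DigestProvider)
	if !ok {
		return nil // metadata that carries no digests
	}
	var out []Digest
	for _, d := range p.Digests() {
		want, known := hexLen[d.Algorithm]
		if !known || len(d.Value) != want {
			continue
		}
		if _, err := hex.DecodeString(d.Value); err != nil {
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	fmt.Println(validDigests(JavaArchiveMetadata{SHA1: "57c007520e2879387b8d13d0a512e9566eeffa73"}))
	fmt.Println(validDigests(MislabeledMetadata{PullChecksum: "Q1p78yvTLG094tHE1+dToJGbmYzQE="}))
	fmt.Println(validDigests(struct{}{}))
}
```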
We might also want to distinguish between hashes claimed by the package manager and hashes Syft actually captured. That quality is implicit today and could be helpful downstream, though it is probably not needed now.
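One way the claimed-versus-captured distinction could be recorded, sketched with a hypothetical `Origin` field on the same local `Digest` stand-in. `captureSHA1` computes a digest from bytes actually read, while a value copied from package-manager metadata would be tagged `Claimed`. None of these names exist in Syft today.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// Origin records where a digest came from (hypothetical).
type Origin int

const (
	Claimed  Origin = iota // reported by the package manager's metadata
	Captured               // computed by Syft from bytes it read
)

func (o Origin) String() string {
	if o == Captured {
		return "captured"
	}
	return "claimed"
}

// Digest is a local stand-in for Syft's file.Digest plus the proposed Origin.
type Digest struct {
	Algorithm string
	Value     string
	Origin    Origin
}

// captureSHA1 hashes data with SHA-1 and marks the result as Captured.
func captureSHA1(data []byte) Digest {
	sum := sha1.Sum(data)
	return Digest{Algorithm: "sha1", Value: hex.EncodeToString(sum[:]), Origin: Captured}
}

// capturedOnly keeps only the digests Syft computed itself.
func capturedOnly(ds []Digest) []Digest {
	var out []Digest
	for _, d := range ds {
		if d.Origin == Captured {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	digests := []Digest{
		{Algorithm: "sha1", Value: "a9993e364706816aba3e25717850c26c9cd0d89d", Origin: Claimed},
		captureSHA1([]byte("abc")),
	}
	for _, d := range digests {
		fmt.Printf("%s %s (%s)\n", d.Algorithm, d.Value, d.Origin)
	}
	fmt.Println("captured:", len(capturedOnly(digests)))
}
```

Keeping the origin on each digest would let a downstream consumer compare a claimed value against a captured one without Syft having to decide which to trust.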