HERMES produces incorrect license output when harvested metadata contains multiple licenses
When the harvested metadata includes more than one license, the merging step (the hermes process command) does not handle them correctly and produces malformed output.
Action: Harvest metadata from CITATION.cff and codemeta.json.
Two Scenarios:
Scenario 1 (works as expected): Both cff and codemeta contain only one license each, e.g.:
"license": [ [ "https://spdx.org/licenses/Apache-2.0", { "plugin": "cff", "local_path": "CITATION.cff", "timestamp": "2025-09-29T11:15:38.975971", "harvester": "cff" } ] ] ,
and
"license": [ [ "https://spdx.org/licenses/Apache-2.0", { "plugin": "codemeta", "local_path": "codemeta.json", "timestamp": "2025-09-29T11:15:41.690517", "harvester": "codemeta" } ] ] .
The merging works correctly in this scenario.
Scenario 2 (problematic):
- cff contains one license.
- codemeta contains multiple licenses (array).
Example:
"license": [ [ "https://spdx.org/licenses/Apache-2.0", { "plugin": "cff", "local_path": "CITATION.cff", "timestamp": "2025-09-29T11:15:38.975971", "harvester": "cff" } ] ], and"license[0]": [ [ "https://spdx.org/licenses/Apache-2.0", { "plugin": "codemeta", "local_path": "codemeta.json", "timestamp": "2025-09-29T11:15:41.690517", "harvester": "codemeta" } ] ], "license[1]": [ [ "https://spdx.org/licenses/CC-BY-4.0", { "plugin": "codemeta", "local_path": "codemeta.json", "timestamp": "2025-09-29T11:15:41.690517", "harvester": "codemeta" } ] ], "license[2]": [ [ "https://spdx.org/licenses/CC0-1.0", { "plugin": "codemeta", "local_path": "codemeta.json", "timestamp": "2025-09-29T11:15:41.690517", "harvester": "codemeta" } ] ].
The resulting hermes.json output is broken:
"license": [ "h","t","t","p","s",":","/","/","s","p","d","x",".","o","r","g","/","l","i", "c","e","n","s","e","s","/","A","p","a","c","h","e","-","2",".","0" ] .
Hi @Aidajafarbigloo, thanks for the report! I can reproduce this problem 👍🏻
Since DLR is currently refactoring the data model, we should definitely add this as a test case. I don't know what the current state of the refactoring is, so I'm not sure whether it makes sense to fix this issue now or just wait for the new data model. 🤔
Hi @zyzzyxdonta, thanks for the response.