Should codemeta extend rather than subset schema.org?
Our current (v2.0) construction of the codemeta context file adopts an explicit subset of terms from schema.org and adds 10 extra terms.
If a user wants to include additional metadata about, say, an associated ScholarlyArticle, grant, etc., the user needs to extend the context themselves to include the additional schema.org terms:
"Solution A"
{ "@context": ["https://doi.org/doi:10.5063/schema/codemeta-2.0", "http://schema.org"],
"@type": "SoftwareSourceCode",
"citation": {
"@type": "ScholarlyArticle",
"name": "A paper about this software"
}
}
Otherwise the "@type": "ScholarlyArticle" would not be recognized. This case was brought up (without resolution) in #155, but the same issue will affect including data such as funding information (#160) and, in general, any extension to codemeta. Maybe the above solution (let's call it Solution A) is just fine, but it feels to me that it discourages rather than encourages this kind of documentation.
"Solution B"
Additional terms could be added explicitly to the CodeMeta namespace on a case-by-case basis. That is not unreasonable, but it obviously involves considerably more maintenance, an approval process, and more releases. That seems suboptimal.
"Solution C"
A third solution would be simply to pivot the CodeMeta schema so that it is an extension, rather than a subset, of schema.org (i.e. schema.org + the 10 terms we added that were not in schema.org). This (Solution C) would, I think, create the least friction in CodeMeta use cases. Being the most permissive, it is potentially more challenging for consumers, though I believe a consumer can simply ignore the additional terms. To do this, we would simply list our context file as [ {our new terms}, "http://schema.org"]. Another benefit (risk?) of this approach is that upstream changes in schema.org terms are automatically reflected in the codemeta context.
It feels to me that Solution C is conceptually the right approach - particularly if we see CodeMeta as sitting in a larger ecosystem of description for research objects and objects related to software.
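To make the shape of Solution C concrete, here is a minimal Python sketch that assembles a codemeta.json document whose context is "our new terms" plus schema.org. The namespace IRI, the two term definitions, and the helper name `solution_c_context` are assumptions chosen for illustration; they are not copied from a released context file.

```python
import json

# Assumed namespace and a hypothetical subset of the CodeMeta-specific terms.
# These definitions are illustrative only, not taken from a released context.
CODEMETA_NS = "https://codemeta.github.io/terms/"
EXTRA_TERMS = {
    "codemeta": CODEMETA_NS,
    "readme": {"@id": "codemeta:readme", "@type": "@id"},
    "issueTracker": {"@id": "codemeta:issueTracker", "@type": "@id"},
}


def solution_c_context(extra_terms):
    """Return a context array of the form [ {our new terms}, "http://schema.org" ]."""
    return [extra_terms, "http://schema.org"]


doc = {
    "@context": solution_c_context(EXTRA_TERMS),
    "@type": "SoftwareSourceCode",
    # A CodeMeta-specific term, resolved by the inline term definitions.
    "readme": "https://example.org/project/README.md",
    # A schema.org type outside the v2.0 subset, resolved by schema.org itself.
    "citation": {
        "@type": "ScholarlyArticle",
        "name": "A paper about this software",
    },
}

print(json.dumps(doc, indent=2))
```

Under this layout the user no longer has to add schema.org to the context by hand, at the cost of every schema.org term becoming valid in a codemeta.json file.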
If this is the case, presumably we would need to get the CodeMeta extensions accepted through schema.org's community process?
@npch Right, I agree that adoption of our additional terms by schema.org would be the long-term goal, though it is not necessary for implementing Solution C. JSON-LD makes it very easy to define your new schema as "schema.org + a few terms", and once we show consistent and significant use of our extension we will be better placed to petition schema.org.
I actually prefer solution A.
As a consumer it is easier to identify the contexts used than the CodeMeta terms that were added.
So listing the set of contexts used first, with the risk that it will expand and compact to schema:property, seems ok to me.
In any case, we can document both options as valid codemeta.json files on https://codemeta.github.io/user-guide/.
Since there is no consensus at this stage, let's postpone this issue to v4.0 release
Solutions A and C should be amended to change the order of contexts: ["http://schema.org", "https://doi.org/doi:10.5063/schema/codemeta-2.0"] or ["http://schema.org", {our new terms}] instead of ["https://doi.org/doi:10.5063/schema/codemeta-2.0", "http://schema.org"] or [ {our new terms}, "http://schema.org"]. Order matters here because the Codemeta context doesn't use the exact same definition as schema.org, and in this case, the last context wins.
For example, author uses an @list container in Codemeta's context in order to preserve order. So this document:
{
"@context": ["https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld", "http://schema.org/"],
"author": "http://example.org/~jdoe"
}
expands to:
[
{
"http://schema.org/author": [
{
"@value": "http://example.org/~jdoe"
}
]
}
]
while this document:
{
"@context": ["http://schema.org/", "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld"],
"author": "http://example.org/~jdoe"
}
expands to:
[
{
"http://schema.org/author": [
{
"@list": [
{
"@value": "http://example.org/~jdoe"
}
]
}
]
}
]
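The ordering effect in the two documents above can be mimicked with a toy model in Python. This is only a sketch of the "last context wins" rule, not a JSON-LD processor: the two one-term contexts and the helper names `merge_contexts` and `expand_string` are hypothetical, and only plain string values and the `@list` container are handled.

```python
def merge_contexts(contexts):
    """Merge term definitions in order; a later definition replaces an earlier one."""
    active = {}
    for ctx in contexts:
        active.update(ctx)
    return active


def expand_string(term, value, contexts):
    """Expand a single string-valued term using the merged term definition."""
    definition = merge_contexts(contexts)[term]
    node = {"@value": value}
    if definition.get("@container") == "@list":
        # A list container wraps the value so that ordering is preserved.
        node = {"@list": [node]}
    return [{definition["@id"]: [node]}]


# Hypothetical one-term stand-ins for the two real context files.
SCHEMA_ORG = {"author": {"@id": "http://schema.org/author"}}
CODEMETA = {"author": {"@id": "http://schema.org/author", "@container": "@list"}}

JDOE = "http://example.org/~jdoe"

# CodeMeta first, schema.org last: the plain schema.org definition wins.
codemeta_first = expand_string("author", JDOE, [CODEMETA, SCHEMA_ORG])

# schema.org first, CodeMeta last: the @list definition wins.
schema_first = expand_string("author", JDOE, [SCHEMA_ORG, CODEMETA])

print(codemeta_first)
print(schema_first)
```

In this model, only the second ordering keeps the `@list` wrapper, which matches the expanded output shown above and is the reason for putting the CodeMeta context last.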
With that amendment, I'm in favor of either A or C for the arguments mentioned above.
In terms of existing work, A already seems to be commonplace in the ActivityPub ecosystem. For example: https://docs.joinmastodon.org/spec/activitypub/#IdentityProof or https://forgefed.org/spec/#example-f9b2cb41
This is the only ecosystem I can think of that has a custom of mixing contexts like we are proposing here; do you have any other examples that go either way?