Permissive Parsing and Phenopacket Versions
https://github.com/exomiser/Exomiser/blob/54a50c8d09b1d59839c24f2749aa8ff4627a7959/exomiser-core/src/main/java/org/monarchinitiative/exomiser/core/proto/ProtoParser.java#L92
Hi Jules,
Daniel and I have found in our analysis pipeline with exomiser that we are allowing v2 phenopackets through the command-line runner with permissive parsing. With the schema change from v1 "negated" is renamed to "excluded". Exomiser still runs but will allow all phenotypes through since the filtering step is looking to filter negated terms not excluded terms. This leads to incorrect analysis because negated/excluded terms are interpreted as present.
One possible solution is to upgrade the phenopacket parser to attempt to parse v2 phenopackets not permissively then on failure try to parse v1 phenopackets. Otherwise failing hard. Of course this is your call on how to move forward with parsing.
We might want to also consider reving to v2 phenopackets for 14.0.
Below is a test phenopacket when looking at the parsing.
{
"id": "phenopacket-id",
"subject": {
"id": "subject-id"
},
"phenotypicFeatures": [{
"type": {
"id": "HP:0008066",
"label": "Abnormal blistering of the skin"
},
"excluded": true
}, {
"type": {
"id": "HP:0040189",
"label": "Scaling skin"
}
}],
"metaData": {
"created": "2022-01-23T11:11:11.000000Z",
"createdBy": "phenopacket-wizard",
"resources": [{
"id": "hp",
"name": "human phenotype ontology",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "http://purl.obolibrary.org/obo/hp/releases/2022-06-11/hp.json",
"namespacePrefix": "HP",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
}],
"phenopacketSchemaVersion": "2.0"
}
}
An example to demonstrate the mishap (can be used in ProtoParserTest):
This code parsers the above phenopacket:
@Test
void readV1PhenopacketJson() {
// Path to the phenopacket from Mike's post
Phenopacket phenopacket = parsePhenopacket(Paths.get("example.json"));
assertThat(phenopacket.getPhenotypicFeaturesCount(), equalTo(2)); // passes
long positivePhenotypicFeaturesCount = phenopacket.getPhenotypicFeaturesList().stream()
.filter(pf -> !pf.getNegated())
.count();
// Fails because the `excluded` field from the v2 phenopacket is ignored,
// changing the `Abnormal blistering of the skin` to a present term despite
// that it should have been omitted from the analysis.
assertThat(positivePhenotypicFeaturesCount, equalTo(1L));
}
Thanks for the feedback Mike and Daniel. I'll add this to 14.0.0.