Exomiser icon indicating copy to clipboard operation
Exomiser copied to clipboard

Permissive Parsing and Phenopacket Versions

Open iimpulse opened this issue 3 years ago • 2 comments

https://github.com/exomiser/Exomiser/blob/54a50c8d09b1d59839c24f2749aa8ff4627a7959/exomiser-core/src/main/java/org/monarchinitiative/exomiser/core/proto/ProtoParser.java#L92

Hi Jules,

Daniel and I have found in our analysis pipeline with exomiser that we are allowing v2 phenopackets through the command-line runner with permissive parsing. With the schema change from v1 "negated" is renamed to "excluded". Exomiser still runs but will allow all phenotypes through since the filtering step is looking to filter negated terms not excluded terms. This leads to incorrect analysis because negated/excluded terms are interpreted as present.

One possible solution is to upgrade the phenopacket parser to attempt to parse v2 phenopackets not permissively then on failure try to parse v1 phenopackets. Otherwise failing hard. Of course this is your call on how to move forward with parsing.

We might want to also consider reving to v2 phenopackets for 14.0.

Below is a test phenopacket when looking at the parsing.

{
  "id": "phenopacket-id",
  "subject": {
    "id": "subject-id"
  },
  "phenotypicFeatures": [{
    "type": {
      "id": "HP:0008066",
      "label": "Abnormal blistering of the skin"
    },
    "excluded": true
  }, {
    "type": {
      "id": "HP:0040189",
      "label": "Scaling skin"
    }
  }],
  "metaData": {
    "created": "2022-01-23T11:11:11.000000Z",
    "createdBy": "phenopacket-wizard",
    "resources": [{
      "id": "hp",
      "name": "human phenotype ontology",
      "url": "http://purl.obolibrary.org/obo/hp.owl",
      "version": "http://purl.obolibrary.org/obo/hp/releases/2022-06-11/hp.json",
      "namespacePrefix": "HP",
      "iriPrefix": "http://purl.obolibrary.org/obo/HP_"
    }],
    "phenopacketSchemaVersion": "2.0"
  }
}

iimpulse avatar Sep 21 '22 18:09 iimpulse

An example to demonstrate the mishap (can be used in ProtoParserTest):

This code parsers the above phenopacket:

@Test
void readV1PhenopacketJson() {
    // Path to the phenopacket from Mike's post
    Phenopacket phenopacket = parsePhenopacket(Paths.get("example.json"));

    assertThat(phenopacket.getPhenotypicFeaturesCount(), equalTo(2)); // passes

    long positivePhenotypicFeaturesCount = phenopacket.getPhenotypicFeaturesList().stream()
            .filter(pf -> !pf.getNegated())
            .count();
    // Fails because the `excluded` field from the v2 phenopacket is ignored, 
    // changing the `Abnormal blistering of the skin` to a present term despite 
    // that it should have been omitted from the analysis.
    assertThat(positivePhenotypicFeaturesCount, equalTo(1L));
}

ielis avatar Sep 21 '22 19:09 ielis

Thanks for the feedback Mike and Daniel. I'll add this to 14.0.0.

julesjacobsen avatar Sep 25 '22 08:09 julesjacobsen