Include PyOBO products
See https://github.com/biopragmatics/obo-db-ingest
It would be quite easy to add these as builds and distribute the SQLite files on S3.
Advantages:
- easy to query in OAK (though some methods and commands, e.g. `tree`, wouldn't make sense, as these wouldn't follow the structural shapes expected of ontologies)
- fast to query via SQL
Note there is ongoing discussion about URIs for these, but semantic-sql doesn't care: we store things natively as CURIEs, and the prefix table can be swapped for anything.
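To illustrate the point about CURIEs and the prefix table, here is a minimal sketch using Python's built-in `sqlite3`. The toy `statements` and `prefix` tables only mimic the general shape of a semantic-sql database (the real tables have more columns), and the `hgnc:5` row and both base URIs are illustrative assumptions, not the values actually distributed.

```python
import sqlite3


def expand_curie(conn, curie):
    """Expand a CURIE to a URI using whatever base the prefix table holds."""
    prefix, local_id = curie.split(":", 1)
    row = conn.execute(
        "SELECT base FROM prefix WHERE prefix = ?", (prefix,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no base registered for prefix {prefix!r}")
    return row[0] + local_id


# Toy in-memory database; table shapes and contents are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statements (subject TEXT, predicate TEXT, value TEXT);
    CREATE TABLE prefix (prefix TEXT PRIMARY KEY, base TEXT);
    INSERT INTO statements VALUES ('hgnc:5', 'rdfs:label', 'A1BG');
    INSERT INTO prefix VALUES ('hgnc', 'https://example.org/hgnc/');
""")

# Queries work on CURIEs directly, so no URI decision is needed here.
label = conn.execute(
    "SELECT value FROM statements WHERE subject = ? AND predicate = ?",
    ("hgnc:5", "rdfs:label"),
).fetchone()[0]

uri_before = expand_curie(conn, "hgnc:5")

# Swapping the base changes every expanded URI without touching statements.
conn.execute(
    "UPDATE prefix SET base = ? WHERE prefix = ?",
    ("https://example.com/alt/hgnc/", "hgnc"),
)
uri_after = expand_curie(conn, "hgnc:5")
```

The stored statements never change; only the one row in the prefix table does, which is why the URI discussion can proceed independently of the builds.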
Ideally the products would be built and distributed (obo/owl/json) upstream so that we can avoid running the build step, which introduces an additional source of potential pipeline failure. We would also have to determine memory/disk requirements.
cc @cthoyt
Yes, they're all built and distributed on GitHub at the moment, but some need to be gzipped. Is that alright?
gzip is fine. It would be great if all had stable URLs, to avoid modifying the registry entry on new releases (it is worth continuing to explore housing some of these on OBO but that can be pursued separately). Standardizing on ISO-8601 for release dates would be great too.
I'm trying a few of these. I am manually adding to the registry for now but perhaps we could come up with some kind of standard registry yaml for this sort of thing.
FYI there are now PURLs for these files, with dates standardized to ISO 8601 where possible. Examples:
| Resource | Version Type | Example PURL |
|---|---|---|
| Reactome | Sequential | https://w3id.org/biopragmatics/resources/reactome/83/reactome.obo |
| InterPro | Major/Minor | https://w3id.org/biopragmatics/resources/interpro/92.0/interpro.obo |
| DrugBank Salt | Semantic | https://w3id.org/biopragmatics/resources/drugbank.salt/5.1.9/drugbank.salt.obo |
| MeSH | Year | https://w3id.org/biopragmatics/resources/mesh/2003/mesh.obo.gz |
| UniProt | Year/Month | https://w3id.org/biopragmatics/resources/uniprot/2022_05/uniprot.obo.gz |
| HGNC | Date | https://w3id.org/biopragmatics/resources/hgnc/2023-02-01/hgnc.obo |
| CGNC | unversioned* | https://w3id.org/biopragmatics/resources/cgnc/cgnc.obo |
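The example PURLs in the table appear to follow one pattern: base, prefix, an optional version segment, then a file named after the prefix. A small sketch of that pattern is below. The function name and its parameters are hypothetical, the pattern is only inferred from the rows above, and nothing here checks that a given URL actually resolves.

```python
PURL_BASE = "https://w3id.org/biopragmatics/resources"


def build_purl(prefix, version=None, extension="obo", gzipped=False):
    """Assemble a PURL following the pattern seen in the example table.

    Hypothetical helper: the pattern is inferred from the examples,
    not taken from any published specification.
    """
    parts = [PURL_BASE, prefix]
    if version is not None:
        # Unversioned resources simply omit the version segment.
        parts.append(version)
    filename = f"{prefix}.{extension}" + (".gz" if gzipped else "")
    parts.append(filename)
    return "/".join(parts)


reactome_purl = build_purl("reactome", "83")
mesh_purl = build_purl("mesh", "2003", gzipped=True)
cgnc_purl = build_purl("cgnc")
```

The three calls reproduce the Reactome, MeSH, and CGNC rows of the table.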
to do:
- Make a "release" even for versioned ones so there's a stable URL pointing to the most recent version
- Standardize date formats further, e.g. for UniProt, WikiPathways, etc.
- Create some kind of manifest file of the latest build
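As a rough idea of what the date standardization item could look like for the UniProt-style `2022_05` versions, here is a small sketch. The function name is hypothetical, and mapping to the ISO 8601 year-month form `2022-05` is an assumption about the desired target, not a decision recorded in this thread.

```python
import re


def normalize_version(version):
    """Rewrite YYYY_MM versions as ISO 8601 YYYY-MM; leave others unchanged."""
    match = re.fullmatch(r"(\d{4})_(\d{2})", version)
    if match:
        year, month = match.groups()
        # Only rewrite when the second part is a plausible month.
        if 1 <= int(month) <= 12:
            return f"{year}-{month}"
    return version


uniprot_version = normalize_version("2022_05")
hgnc_version = normalize_version("2023-02-01")
interpro_version = normalize_version("92.0")
```

Sequential, major/minor, and already-ISO versions pass through untouched, so the same function could be applied to every resource.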
Awesome!!!
(In reply to Charles Tapley Hoyt's comment of Thu, Mar 16, 2023, 4:33 PM.)
And remember, what you have is Swiss-Prot, NOT UniProt! :-)
@cmungall here's the manifest file, with PURLs for each of the most recent artifacts listed in it: https://github.com/biopragmatics/obo-db-ingest/blob/main/manifest.yml