semantic-sql

Include PyOBO products

Open cmungall opened this issue 3 years ago • 6 comments

See https://github.com/biopragmatics/obo-db-ingest

It would be quite easy to add these as builds, and distribute the sqlite on s3.

Advantages:

  • easy to query in OAK (though some methods and commands, e.g. tree, wouldn't make sense, as these products wouldn't follow the structural shapes expected of ontologies)
  • fast to query via SQL

Note there is ongoing discussion about URIs for these, but semantic-sql doesn't care, we store things natively as CURIEs, and the prefix table can be swapped to anything.
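The point about CURIEs and a swappable prefix table can be sketched as follows. This is a minimal illustration, not the actual build: it assumes a cut-down two-table layout (a `prefix` table with `prefix` and `base` columns, plus a `statements` table) loosely modeled on the semantic-sql schema. The `expand_curie` helper, the `https://example.org/...` base URIs, and the sample row are all hypothetical.

```python
import sqlite3


def expand_curie(conn: sqlite3.Connection, curie: str) -> str:
    """Expand a CURIE to a URI using whatever the prefix table currently holds."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie  # no colon, so not a CURIE; return unchanged
    row = conn.execute(
        "SELECT base FROM prefix WHERE prefix = ?", (prefix,)
    ).fetchone()
    # Unknown prefixes are left as CURIEs rather than guessed at.
    return curie if row is None else row[0] + local_id


# In-memory stand-in for a built SQLite file (simplified, hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefix (prefix TEXT PRIMARY KEY, base TEXT)")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, value TEXT)")
conn.execute("INSERT INTO prefix VALUES ('hgnc', 'https://example.org/old/hgnc/')")
conn.execute(
    "INSERT INTO statements VALUES ('hgnc:5', 'rdfs:label', 'example label')"
)

before = expand_curie(conn, "hgnc:5")

# Changing the URI scheme touches one row in prefix; statements is untouched.
conn.execute(
    "UPDATE prefix SET base = 'https://example.org/new/hgnc/' WHERE prefix = 'hgnc'"
)
after = expand_curie(conn, "hgnc:5")

stored_subjects = [row[0] for row in conn.execute("SELECT subject FROM statements")]
```

Because the stored subjects stay as CURIEs, whatever comes out of the URI discussion would only require updating the prefix rows, under the assumptions above.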

Ideally the products would be built and distributed (obo/owl/json) upstream, so that we can avoid running the build step ourselves; it introduces an additional source of potential pipeline failure. Running it ourselves also means we have to determine memory/disk requirements.

cc @cthoyt

cmungall avatar Jul 14 '22 15:07 cmungall

Yes they’re all built and distributed on GitHub at the moment but some need to be gzipped. Is that alright?

cthoyt avatar Jul 18 '22 13:07 cthoyt

gzip is fine. It would be great if all had stable URLs, to avoid modifying the registry entry on new releases (it is worth continuing to explore housing some of these on OBO but that can be pursued separately). Standardizing on ISO-8601 for release dates would be great too.
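As a rough sketch of why gzip is not a problem on the consuming side: Python's standard `gzip` module can open a compressed OBO file as text, so the same code path can handle both `.obo` and `.obo.gz`. The helper names `open_obo` and `read_data_version` are hypothetical, and the assumption that a release date appears in the OBO `data-version` header tag is mine, not something stated in this thread.

```python
import gzip
from pathlib import Path
from typing import Optional, TextIO


def open_obo(path: Path) -> TextIO:
    """Open an OBO file as text, decompressing transparently if it ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, mode="rt", encoding="utf-8")
    return open(path, mode="r", encoding="utf-8")


def read_data_version(path: Path) -> Optional[str]:
    """Return the value of the data-version header tag, or None if absent."""
    with open_obo(path) as handle:
        for line in handle:
            if line.startswith("data-version:"):
                return line.split(":", 1)[1].strip()
            if line.startswith("["):  # first stanza reached, header is over
                break
    return None
```

For example, `read_data_version(Path("hgnc.obo"))` and `read_data_version(Path("mesh.obo.gz"))` would go through the same function, assuming those files have been downloaded locally.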

I'm trying a few of these. I am manually adding to the registry for now but perhaps we could come up with some kind of standard registry yaml for this sort of thing.

cmungall avatar Nov 23 '22 01:11 cmungall

FYI there are now PURLs for these files, with dates standardized to ISO 8601 where possible. Examples:

| Resource | Version Type | Example PURL |
| --- | --- | --- |
| Reactome | Sequential | https://w3id.org/biopragmatics/resources/reactome/83/reactome.obo |
| Interpro | Major/Minor | https://w3id.org/biopragmatics/resources/interpro/92.0/interpro.obo |
| DrugBank Salts | Semantic | https://w3id.org/biopragmatics/resources/drugbank.salt/5.1.9/drugbank.salt.obo |
| MeSH | Year | https://w3id.org/biopragmatics/resources/mesh/2003/mesh.obo.gz |
| UniProt | Year/Month | https://w3id.org/biopragmatics/resources/uniprot/2022_05/uniprot.obo.gz |
| HGNC | Date | https://w3id.org/biopragmatics/resources/hgnc/2023-02-01/hgnc.obo |
| CGNC | unversioned* | https://w3id.org/biopragmatics/resources/cgnc/cgnc.obo |
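The example PURLs all appear to follow one pattern: `/biopragmatics/resources/<resource>/<version>/<file>`, with the version segment omitted for unversioned resources. A minimal sketch of splitting a PURL into those parts is below. The pattern is inferred only from the examples listed here, and `parse_purl` and `ResourcePurl` are hypothetical names, so treat this as illustrative rather than a specification of the PURL scheme.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# Path prefix shared by every example PURL in this thread.
PURL_ROOT = "/biopragmatics/resources/"


class ResourcePurl(NamedTuple):
    resource: str
    version: Optional[str]  # None for unversioned resources
    filename: str


def parse_purl(url: str) -> ResourcePurl:
    """Split a PURL into resource, optional version, and file name."""
    path = urlparse(url).path
    if not path.startswith(PURL_ROOT):
        raise ValueError(f"not a recognized resource PURL: {url}")
    parts = path[len(PURL_ROOT):].split("/")
    if len(parts) == 3:
        resource, version, filename = parts
        return ResourcePurl(resource, version, filename)
    if len(parts) == 2:
        resource, filename = parts
        return ResourcePurl(resource, None, filename)
    raise ValueError(f"unexpected PURL layout: {url}")
```

A registry entry could then record the resource name and version separately instead of re-deriving them from the URL on each release.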

to do:

  1. Make a "release" even for versioned ones so there's a stable URL pointing to the most recent version
  2. Standardize date formats further, e.g. for UniProt, WikiPathways, etc.
  3. Create some kind of manifest file of the latest build
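One possible reading of the date-standardization item is sketched below: date-like version strings are mapped onto ISO 8601 forms, and anything that is not date-like is left alone. The mapping of `2022_05` to `2022-05` is an assumption about what "standardize" would mean here, and `to_iso8601` is a hypothetical helper. It checks only the shape of the string, not whether the month or day is in range.

```python
import re
from typing import Optional

# (pattern, output template) pairs, tried in order from most to least specific.
_DATE_PATTERNS = [
    (re.compile(r"^(\d{4})-(\d{2})-(\d{2})$"), "{0}-{1}-{2}"),  # full date
    (re.compile(r"^(\d{4})[_-](\d{2})$"), "{0}-{1}"),           # year and month
    (re.compile(r"^(\d{4})$"), "{0}"),                          # year only
]


def to_iso8601(version: str) -> Optional[str]:
    """Return an ISO 8601 form of a date-like version, or None if not date-like."""
    for pattern, template in _DATE_PATTERNS:
        match = pattern.match(version)
        if match:
            return template.format(*match.groups())
    return None
```

Sequential and semantic versions such as `83` or `5.1.9` fall through to `None`, so a caller can keep them unchanged.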

cthoyt avatar Mar 16 '23 23:03 cthoyt

Awesome!!!


cmungall avatar Mar 17 '23 14:03 cmungall

And remember, what you have is Swiss-Prot, NOT UniProt! :-)

cmungall avatar Mar 17 '23 15:03 cmungall

@cmungall here's the manifest file, with PURLs for each of the most recent artifacts listed in it: https://github.com/biopragmatics/obo-db-ingest/blob/main/manifest.yml

cthoyt avatar Mar 18 '23 00:03 cthoyt