Demonstrate using TAXRANK for ranks and properties
Closes #96, closes #97
This PR does the following (updated Feb 19, 2025):
- Adds triples connecting NCBITaxon terms to TAXRANK terms using the TAXRANK:100000 (has rank) relation. This duplicates, in parallel, the existing ad-hoc
ncbitaxon:has_rankrelations that point to ad-hoc rank terms in theNCBITaxon:namespace - Explicitly handles the
no rankrank - Adds deprecation notice to the predicate
ncbitaxon:has_rank, removes logical axioms, updates its name, and adds a replaced-by notice pointing to TAXRANK:100000 - Adds a deprecation notice to all ad-hoc NCBITaxon ranks with appropriate replaced-by notices pointing to TAXRANK terms
It depends on https://github.com/phenoscape/taxrank/pull/5, where I added the remaining 12 ranks that weren't already represented. That PR was merged and released on November 28th, 2024. Related: that PR included adding a tax rank for "strain", which was the point of discussion also in #107
@matentzn, are there any updates on "community feedback" for this change? Is this issue the best place to provide feedback?
I have been working to create ontology descriptions of other taxonomy sources (gtdb and silva) where I include TAXRANK per @cthoyt suggestion. Using TAXRANK to define taxonomic ranks will vastly facilitate comparison and mapping between different taxonomies.
I’m I favor of this change
On Sun, Feb 16, 2025 at 7:56 PM jplfaria @.***> wrote:
@matentzn https://github.com/matentzn, are there any updates on "community feedback" for this change? Is this issue the best place to provide feedback?
I have been working to create ontology descriptions of other taxonomy sources (gtdb and silva) where I include TAXRANK per @cthoyt https://github.com/cthoyt suggestion. Using TAXRANK to define taxonomic ranks will vastly facilitate comparison and mapping between different taxonomies.
— Reply to this email directly, view it on GitHub https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2661910664, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOPXHBWA4EYBCPP55G32QFMWDAVCNFSM6AAAAABSTUPRXWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRRHEYTANRWGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***> [image: jplfaria]jplfaria left a comment (obophenotype/ncbitaxon#120) https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2661910664
@matentzn https://github.com/matentzn, are there any updates on "community feedback" for this change? Is this issue the best place to provide feedback?
I have been working to create ontology descriptions of other taxonomy sources (gtdb and silva) where I include TAXRANK per @cthoyt https://github.com/cthoyt suggestion. Using TAXRANK to define taxonomic ranks will vastly facilitate comparison and mapping between different taxonomies.
— Reply to this email directly, view it on GitHub https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2661910664, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOPXHBWA4EYBCPP55G32QFMWDAVCNFSM6AAAAABSTUPRXWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMNRRHEYTANRWGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
We discussed this in the OBO Community Slack on November 27, but unfortunately the discussion did not get back here.
The current version of this PR drops the ncbitaxon:has_rank predicate and classes that we've been using for 20+ years, and so breaks any system that queries for or displays the old ranks.
If the PR just added new TAXRANK stuff without dropping the old stuff, it wouldn't be a breaking change.
I think having an interim period with both would be fine, but can we start a survey of people who are using the old properties with a view to declaring EOL?
We use ncbitaxon heavily but this change would not break things for us AFAICT, but others use cases may differ
It's not exactly clear what are the accept conditions for this PR, but I made the following updates:
- Reverted deletions of old code that created old-style ad-hoc identifiers
- Added deprecations and replaced-by relations to old-style ad-hoc identifiers
- Add all usages of the TAXRANK relation and TAXRANK terms in parallel.
Therefore, you only have to judge if you think it's alright to duplicate what's here already (in two different flavors)
Otherwise, please give explicit instructions on what to change to get this accepted
1a48d43 addresses my concerns about breakage.
In 27ecf11 I added "obsolete" to labels and removed subClassOf assertions, following OBO obsoletion best practices, drafted as principle 19 here: https://github.com/OBOFoundry/OBOFoundry.github.io/blob/nataled-patch-50/principles/fp-019-term-stability.md#implementation
We could also add an IAO:0000231 'obsolescence reason', but the IAO:0100001 'term replaced by' is doing the real work.
We don't often deprecate annotation properties, and that draft principle doesn't mention this case. It's reasonable to drop the old annotations after some deprecation period an giving ample notice. https://obofoundry.org/principles/fp-013-notification.html has some discussion, but the requirements there are minimal. NCBI Taxonomy is one of the oldest and most-used OBO projects, so I think extra care is justified.
It had many changes after @matentzn had approved. It might need to be checked again.
the fixes were only minor formatting of turtle (spaces)
but more generally, I have had issues with making the TTL convert to OWL. It seems like it's finally too big. I tried adding ROBOT_JAVA_ARGS=-Xmx32G to the makefile, but that didn't solve the issue
ROBOT_JAVA_ARGS=-Xmx32G make
$ robot convert -i ncbitaxon.ttl -o ncbitaxon.json
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::objectFieldOffset has been called by com.github.benmanes.caffeine.base.UnsafeAccess (file:/Users/cthoyt/.local/bin/robot.jar)
WARNING: Please consider reporting this to the maintainers of class com.github.benmanes.caffeine.base.UnsafeAccess
WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release
wget http://purl.obolibrary.org/obo/ncbitaxon.obo -O current_ncbitaxon.obo
--2025-09-12 14:01:23-- http://purl.obolibrary.org/obo/ncbitaxon.obo
Resolving purl.obolibrary.org (purl.obolibrary.org)... 2606:4700:4409::6812:253b, 2a06:98c1:3107::ac40:96c5, 104.18.37.59, ...
Connecting to purl.obolibrary.org (purl.obolibrary.org)|2606:4700:4409::6812:253b|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/obophenotype/ncbitaxon/releases/latest/download/ncbitaxon.obo [following]
--2025-09-12 14:01:23-- https://github.com/obophenotype/ncbitaxon/releases/latest/download/ncbitaxon.obo
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/obophenotype/ncbitaxon/releases/download/v2025-03-13/ncbitaxon.obo [following]
--2025-09-12 14:01:23-- https://github.com/obophenotype/ncbitaxon/releases/download/v2025-03-13/ncbitaxon.obo
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://release-assets.githubusercontent.com/github-production-release-asset/43841146/d8661be8-5382-47e7-8da3-1f2382a32f8b?sp=r&sv=2018-11-09&sr=b&spr=https&se=2025-09-12T12%3A45%3A02Z&rscd=attachment%3B+filename%3Dncbitaxon.obo&rsct=application%2Foctet-stream&skoid=96c2d410-5711-43a1-aedd-ab1947aa7ab0&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skt=2025-09-12T11%3A44%3A05Z&ske=2025-09-12T12%3A45%3A02Z&sks=b&skv=2018-11-09&sig=wcdS%2F9nhM%2BMK7%2F7NwEo5ee1HQYuGS%2F01MulpNuC1lL0%3D&jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmVsZWFzZS1hc3NldHMuZ2l0aHVidXNlcmNvbnRlbnQuY29tIiwia2V5Ijoia2V5MSIsImV4cCI6MTc1NzY3ODc4NCwibmJmIjoxNzU3Njc4NDg0LCJwYXRoIjoicmVsZWFzZWFzc2V0cHJvZHVjdGlvbi5ibG9iLmNvcmUud2luZG93cy5uZXQifQ.l6xCtoAesREi3anX1kBQ3VUvr1YcW66JWWGifBufPnU&response-content-disposition=attachment%3B%20filename%3Dncbitaxon.obo&response-content-type=application%2Foctet-stream [following]
--2025-09-12 14:01:24-- https://release-assets.githubusercontent.com/github-production-release-asset/43841146/d8661be8-5382-47e7-8da3-1f2382a32f8b?sp=r&sv=2018-11-09&sr=b&spr=https&se=2025-09-12T12%3A45%3A02Z&rscd=attachment%3B+filename%3Dncbitaxon.obo&rsct=application%2Foctet-stream&skoid=96c2d410-5711-43a1-aedd-ab1947aa7ab0&sktid=398a6654-997b-47e9-b12b-9515b896b4de&skt=2025-09-12T11%3A44%3A05Z&ske=2025-09-12T12%3A45%3A02Z&sks=b&skv=2018-11-09&sig=wcdS%2F9nhM%2BMK7%2F7NwEo5ee1HQYuGS%2F01MulpNuC1lL0%3D&jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmVsZWFzZS1hc3NldHMuZ2l0aHVidXNlcmNvbnRlbnQuY29tIiwia2V5Ijoia2V5MSIsImV4cCI6MTc1NzY3ODc4NCwibmJmIjoxNzU3Njc4NDg0LCJwYXRoIjoicmVsZWFzZWFzc2V0cHJvZHVjdGlvbi5ibG9iLmNvcmUud2luZG93cy5uZXQifQ.l6xCtoAesREi3anX1kBQ3VUvr1YcW66JWWGifBufPnU&response-content-disposition=attachment%3B%20filename%3Dncbitaxon.obo&response-content-type=application%2Foctet-stream
Resolving release-assets.githubusercontent.com (release-assets.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to release-assets.githubusercontent.com (release-assets.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 524761049 (500M) [application/octet-stream]
Saving to: ‘current_ncbitaxon.obo’
current_ncbitaxon.obo 100%[==========================================================================================>] 500.45M 12.3MB/s in 42s
2025-09-12 14:02:06 (11.8 MB/s) - ‘current_ncbitaxon.obo’ saved [524761049/524761049]
robot diff --left current_ncbitaxon.obo --right ncbitaxon.obo -o ncbi_diff_latest_current_obo.txt
WARNING: A terminally deprecated method in sun.misc.Unsafe has been called
WARNING: sun.misc.Unsafe::objectFieldOffset has been called by com.github.benmanes.caffeine.base.UnsafeAccess (file:/Users/cthoyt/.local/bin/robot.jar)
WARNING: Please consider reporting this to the maintainers of class com.github.benmanes.caffeine.base.UnsafeAccess
WARNING: sun.misc.Unsafe::objectFieldOffset will be removed in a future release
Exception in thread "main" java.lang.OutOfMemoryError: Required array length 2147483639 + 43 is too large
at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:914)
at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:907)
at java.base/java.lang.AbstractStringBuilder.newCapacity(AbstractStringBuilder.java:267)
at java.base/java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:247)
at java.base/java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:591)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:179)
at java.base/java.lang.StringBuilder.append(StringBuilder.java:173)
at scala.collection.IterableOnceOps.addString(IterableOnce.scala:1187)
at scala.collection.IterableOnceOps.addString$(IterableOnce.scala:1179)
at scala.collection.AbstractIterable.addString(Iterable.scala:919)
at scala.collection.IterableOnceOps.mkString(IterableOnce.scala:1129)
at scala.collection.IterableOnceOps.mkString$(IterableOnce.scala:1127)
at scala.collection.AbstractIterable.mkString(Iterable.scala:919)
at scala.collection.IterableOnceOps.mkString(IterableOnce.scala:1142)
at scala.collection.IterableOnceOps.mkString$(IterableOnce.scala:1142)
at scala.collection.AbstractIterable.mkString(Iterable.scala:919)
at org.geneontology.owl.differ.render.BasicDiffRenderer$.format(BasicDiffRenderer.scala:47)
at org.geneontology.owl.differ.render.BasicDiffRenderer$.renderPlain(BasicDiffRenderer.scala:17)
at org.geneontology.owl.differ.render.BasicDiffRenderer.renderPlain(BasicDiffRenderer.scala)
at org.obolibrary.robot.DiffOperation.compare(DiffOperation.java:126)
at org.obolibrary.robot.DiffCommand.execute(DiffCommand.java:163)
at org.obolibrary.robot.CommandManager.executeCommand(CommandManager.java:244)
at org.obolibrary.robot.CommandManager.execute(CommandManager.java:188)
at org.obolibrary.robot.CommandManager.main(CommandManager.java:135)
at org.obolibrary.robot.CommandLineInterface.main(CommandLineInterface.java:65)
make: *** [ncbi_diff_latest_current_obo.txt] Error 1
but more generally, I have had issues with making the TTL convert to OWL. It seems like it's finally too big. I tried adding
ROBOT_JAVA_ARGS=-Xmx32Gto the makefile, but that didn't solve the issue
@cthoyt what version of ROBOT are you using? There used to be a problem with diff which required allocating a huge string, but I fixed that here https://github.com/ontodev/robot/pull/1227
I was on 1.9.7, but I updated to 1.9.8 and this was fixed. Thanks @balhoff !
I also rebased/squashed my commits to simplify the history.
here's a screenshot of protege showing the rank properties coexisting
@matentzn it looks like chris and james both agree on this, and I have tested it locally with the help of Jim. We have community consensus, is there any other blocker for merging this?
Thanks @cthoyt I will leave it to @anitacaron to merge this as the guardian of NCBITaxon OBO Edition; The main issue was
Add all usages of the TAXRANK relation and TAXRANK terms in parallel.
The rest is fine by me.
@matentzn I think both Chris (comment here https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2666312168) and James (comment here https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2674834029) agreed that having them in parallel is a good thing that will support the community to transition to the new ones.
@matentzn I think both Chris (comment here https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2666312168) and James (comment here https://github.com/obophenotype/ncbitaxon/pull/120#issuecomment-2674834029) agreed that having them in parallel is a good thing that will support the community to transition to the new ones.
Sorry I was grammatically off. What I meant to say (thats why I approved ;) THAT item was the only thing I cared about significantly here! So, I am in full agreement of this PR.
I've checked the code, and ideally, the axioms for TAXRANK should be sourced from http://purl.obolibrary.org/obo/taxrank.owl.
It seems there's still missing some ranks:
Summary of unrecognized ranks:
Counter({'realm': 7, 'domain': 3, 'acellular root': 1, 'cellular root': 1, 'subvariety': 1})
@anitacaron it appears this has popped up since NCBITaxon has updated in the very long time of review. I do not want to make the diff more complicated for this PR incorporating unrelated changes.
Addressing the unrecognized ranks requires coordinating with TAXRANK ontology, and is not related for this PR. I promise I will take care of it, but I would rather not have a moving target on this once since it has been very difficult to get additional rounds of feedback.
Extra context: the build workflow has been resilient to missing information before, and still is.
FYI I started to update TAXRANK in https://github.com/phenoscape/taxrank/pull/10, but again, this is an independent change from what's here, and this PR is not blocked by updating for new NCBI tax ranks
'acellular root': 1, 'cellular root': 1
These are not ranks
I defer to @balhoff on inclusion of weird NCBI overloading as 'ranks'
A grudging compromise could be
- [] TAXRANK:0000000 ! taxonomic_rank
- [i] TAXRANK:0000001 ! phylum
- [i] TAXRANK:0000002 ! class
- [i] TAXRANK:0000003 ! order
- [i] TAXRANK:0000004 ! family
- [] **TAXRANK:9000000 ! pseudorank DEF: a grab bag of terms used by NCBI to tag nodes in the taxonomy
- [i] **TAXRANK:9000001 ! acellular node
- ...
@matentzn @anitacaron after https://github.com/phenoscape/taxrank/pull/10 and https://github.com/phenoscape/taxrank/pull/11, we have addressed your comments after getting consensus from Chris and Jim how to model these and updated in this PR accordingly in 97c060f
again, I think I have addressed all of your reservations. is there anything else you need from me before merging this?
#136 was just merged and I've taken care of rebasing this PR on top of that (no changes were necessary since this PR also was already incorporating the new taxranks)
thanks @anitacaron