Richard Jackson
Currently, we exclude all TAs, as these refer to areas that are not necessarily disease-specific. We should make sure we don't filter out disease-specific TAs (e.g. https://platform.opentargets.org/disease/EFO_0005741).
Quite often, our NER systems pick up conserved genomic regions as 'genes' - for example, "TIM barrel". We can improve precision by adding Pfam clans to our ontology list, and...
"X linked" or "X chromosome" -> "10 linked", "10 chromosome": quite an important distinction!
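One possible guard, sketched below, assumes the rewrite comes from a Roman-numeral normalisation step (the issue does not say so explicitly). The function name, the lookup table and the context pattern are all hypothetical and not taken from the Kazu codebase; the idea is simply to leave a standalone "X" alone when it is followed by "linked" or "chromosome".

```python
import re

# Hypothetical lookup for this sketch only; not Kazu's actual normaliser.
ROMAN_TO_ARABIC = {"II": "2", "III": "3", "IV": "4", "X": "10"}

# Standalone, upper-case Roman numeral tokens covered by the lookup above.
ROMAN_TOKEN = re.compile(r"\b(II|III|IV|X)\b")

# Text right after an "X" that signals the chromosome rather than the numeral 10.
CHROMOSOME_CONTEXT = re.compile(r"^[\s-]*(linked|chromosome)\b", re.IGNORECASE)


def normalise_roman_numerals(text: str) -> str:
    """Replace Roman numerals with digits, but keep 'X' in chromosome contexts."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        following = text[match.end():]
        if token == "X" and CHROMOSOME_CONTEXT.match(following):
            return token  # "X linked", "X-linked", "X chromosome" stay untouched
        return ROMAN_TO_ARABIC[token]

    return ROMAN_TOKEN.sub(replace, text)


print(normalise_roman_numerals("X-linked ichthyosis"))
print(normalise_roman_numerals("factor X deficiency"))
```

A context allow-list like this is deliberately narrow; whether other "X" contexts need protecting would have to be checked against real data.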
We create lots of dummy parsers during unit tests. Let's check that they don't clog up the cached DB and that they are properly removed prior to release.
Currently, we return all hits from a search if they score above 0.0. This might be improved by incorporating some kind of delta (e.g. all hits within a certain score...
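A minimal sketch of what such a delta could look like, assuming it is measured against the best-scoring hit (the issue text is cut off, so this is one reading, not the agreed design). The function name, the `(label, score)` tuple shape and the default `delta` are hypothetical.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # (label, score); a made-up shape for this sketch


def filter_hits_by_delta(
    hits: List[Hit], delta: float = 0.1, min_score: float = 0.0
) -> List[Hit]:
    """Keep hits above min_score whose score is within delta of the best hit."""
    above_floor = [hit for hit in hits if hit[1] > min_score]
    if not above_floor:
        return []
    best_score = max(score for _, score in above_floor)
    kept = [hit for hit in above_floor if best_score - hit[1] <= delta]
    # Highest-scoring hits first.
    return sorted(kept, key=lambda hit: hit[1], reverse=True)


example_hits = [("hit_a", 0.95), ("hit_b", 0.9), ("hit_c", 0.5), ("hit_d", 0.0)]
print(filter_hits_by_delta(example_hits, delta=0.1))
```

With the current behaviour all three positive-scoring hits would be returned; with the delta, only those close to the top score survive.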
OxO only has ~500 mappings from HPO to MONDO, but MONDO suggests ~5000 within the ontology. Can we write an Xref mapper implementation that uses metadata to create xrefs?
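As a rough illustration of the idea, the sketch below inverts xrefs stored in per-term metadata into a source-to-target lookup. It assumes the metadata is available as a plain dict with an `"xrefs"` list per term; that shape, the function name and the placeholder IDs are hypothetical and do not reflect Kazu's metadata schema or real ontology terms.

```python
from collections import defaultdict
from typing import Dict, List


def build_xrefs_from_metadata(
    metadata: Dict[str, Dict[str, List[str]]], source_prefix: str = "HP:"
) -> Dict[str, List[str]]:
    """Invert per-term xref metadata into a {source_id: [target_ids]} mapping."""
    mapping = defaultdict(set)
    for target_id, record in metadata.items():
        for xref in record.get("xrefs", []):
            # Only keep xrefs that point at the source ontology of interest.
            if xref.startswith(source_prefix):
                mapping[xref].add(target_id)
    return {source_id: sorted(targets) for source_id, targets in mapping.items()}


# Placeholder IDs, invented purely for illustration.
toy_metadata = {
    "MONDO:TOY_1": {"xrefs": ["HP:TOY_A", "OTHER:TOY_Z"]},
    "MONDO:TOY_2": {"xrefs": ["HP:TOY_A", "HP:TOY_B"]},
    "MONDO:TOY_3": {},
}
print(build_xrefs_from_metadata(toy_metadata))
```

Building the mapping from the ontology's own metadata would sidestep the coverage gap in OxO, at the cost of trusting whatever xrefs the ontology declares.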
...although passing on Mac.
```
kazu/utils/stanza_pipeline.py:19: in simple_stanza_init
    stanza_pipeline = stanza.Pipeline(
/tmp/kazu-env/lib/python3.9/site-packages/stanza/pipeline/core.py:235: in __init__
    self.load_list = maintain_processor_list(resources, lang, package, processors, maybe_add_mwt=(not kwargs.get("tokenize_pretokenized")))
/tmp/kazu-env/lib/python3.9/site-packages/stanza/resources/common.py:208: in maintain_processor_list
    add_mwt(processors, resources, lang)
_ _...
```
Original comment from @EFord36: some combo of: ```hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled```
Original comment from @EFord36: the extendString() method in kazu/steps/ner/opsin.py reworks entity matches to account for Transformer model matches that tend to identify only part of entities with longer names - which...