Richard Jackson issues

Results: 22 issues of Richard Jackson

OpenTargetsDisease parser does not include therapeutic areas that are diseases

Currently, we exclude all therapeutic areas (TAs), as these refer to areas that are not necessarily disease-specific. We should make sure we don't filter out disease-specific TAs (e.g. https://platform.opentargets.org/disease/EFO_0005741).

added missing pyarrow dependency after change in how ChEMBL resource is built

Add PFAM clans to ontology list

Quite often, our NER systems pick up conserved genomic regions as 'genes' - for example, "TIM barrel". We can improve precision by adding PFAM clans to our ontology list, and...

roman numeral normalisation doesn't handle X chromosome properly

"X linked" or "X chromosome" is normalised to "10 linked" or "10 chromosome" - quite an important distinction!
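One possible guard is sketched below. It is not the normaliser used in the codebase: the function name `normalise_roman_numerals`, the lookup table, and the list of chromosome context words are all hypothetical, chosen only to illustrate skipping "X" when the following word indicates the chromosome.

```python
# Hypothetical lookup table; a real normaliser would likely cover more numerals.
ROMAN_TO_ARABIC = {
    "II": "2", "III": "3", "IV": "4", "V": "5",
    "VI": "6", "VII": "7", "VIII": "8", "IX": "9", "X": "10",
}

# Assumed context words: if one of these follows "X", treat "X" as the chromosome.
X_CHROMOSOME_CONTEXT = {"linked", "chromosome", "chromosomal"}


def normalise_roman_numerals(text: str) -> str:
    """Replace standalone Roman numerals with Arabic ones, leaving chromosome X alone.

    Tokens are split on whitespace, so runs of whitespace collapse to single spaces
    and hyphenated forms such as "X-linked" are never touched.
    """
    tokens = text.split()
    result = []
    for i, token in enumerate(tokens):
        next_token = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if token == "X" and next_token in X_CHROMOSOME_CONTEXT:
            # "X linked" / "X chromosome": keep the letter, do not convert to 10.
            result.append(token)
        else:
            result.append(ROMAN_TO_ARABIC.get(token, token))
    return " ".join(result)


print(normalise_roman_numerals("X linked"))          # X linked
print(normalise_roman_numerals("factor X"))          # factor 10
print(normalise_roman_numerals("type II diabetes"))  # type 2 diabetes
```

A context-word list is the simplest option; it will miss cases where the chromosome is implied without one of these words following.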

check tests don’t clog up DB

We create lots of dummy parsers during unit tests. Let's check they don't clog up the cached DB and are properly removed prior to release.

Consider using a delta when returning search hits

Currently, we return all hits from a search if they score above 0.0. This might be improved by incorporating some kind of delta (e.g. all hits within a certain score...
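As the issue text is truncated, the following is only one reading of the idea: keep the hits whose score lies within `delta` of the best score. The function name `filter_hits_by_delta` and the `(label, score)` tuple shape are assumptions made for illustration, not the project's actual search API.

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # assumed shape: (label, score)


def filter_hits_by_delta(hits: List[Hit], delta: float) -> List[Hit]:
    """Keep positive-scoring hits whose score is within `delta` of the top score."""
    # Preserve the current behaviour of discarding hits that score 0.0 or below.
    positive = [hit for hit in hits if hit[1] > 0.0]
    if not positive:
        return []
    best_score = max(score for _, score in positive)
    return [hit for hit in positive if best_score - hit[1] <= delta]


hits = [("a", 0.9), ("b", 0.85), ("c", 0.3), ("d", 0.0)]
print(filter_hits_by_delta(hits, delta=0.1))  # [('a', 0.9), ('b', 0.85)]
```

An absolute delta is the simplest choice; a relative threshold (a fraction of the best score) would behave differently when scores are small.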

improve Xref mappings by using embedded xref information from ontologies

OxO only has ~500 mappings from HPO to MONDO, but MONDO suggests ~5000 within the ontology. Can we write an Xref mapper implementation that uses ontology metadata to create xrefs?

stanza 1.6.0 failing IT on linux

...although passing on Mac.

```
kazu/utils/stanza_pipeline.py:19: in simple_stanza_init
    stanza_pipeline = stanza.Pipeline(
/tmp/kazu-env/lib/python3.9/site-packages/stanza/pipeline/core.py:235: in __init__
    self.load_list = maintain_processor_list(resources, lang, package, processors, maybe_add_mwt=(not kwargs.get("tokenize_pretokenized")))
/tmp/kazu-env/lib/python3.9/site-packages/stanza/resources/common.py:208: in maintain_processor_list
    add_mwt(processors, resources, lang)
_ _...
```

turn off hydra output file logging?

Original comment from @EFord36: some combo of: ```hydra.run.dir=. hydra.output_subdir=null hydra/job_logging=disabled hydra/hydra_logging=disabled```

Transformer matches can be too conservative

Original comment from @EFord36: the extendString() method in kazu/steps/ner/opsin.py reworks entity matches to account for Transformer model matches that tend to identify only part of entities with longer names - which...