Liz G issues

Results 21 issues of Liz G

Make models errors

When we download the models we look for all the artefact_targets, but these are out of date with the format the multitask models are saved in so we get the...

Makefile changes (tox and dist)

After debugging an issue with Keras with @jdu, Jeff suggested we need to redesign some parts of how the tests are run. - We shouldn't be using a virtualenv with...

Readme F1 tables don't match results sheets

Make sure the results are correctly copied for `2020.3.6_splitting`, `2020.3.8_parsing` and `2020.4.5_multitask` from https://docs.google.com/spreadsheets/d/1gu6jJ83Ad15VztmB2aCGTgVsQgaudr6rwbU6RO0YV-I/edit#gid=1445206406

Consider spans in output

In the output of `split_parser`, `split` and `parser` we have an output of tokens and predictions. It may be worth considering a different type of output with the spans of...

Increase maximum length in nlp

When using the deep reference parser in Reach, we got the error: [note this is from the deep_reference_parser-2019.12.1-py3-none-any.whl version, but I think this issue still stands in the current DRP...

Review splitting references logic

The decision to split up references at a `b-r` tag and end at the next `e-r` tag should be reviewed and different approaches should be considered and tested. This is...
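The rule under review can be sketched as below. This is a minimal illustration of one reading of the `b-r`/`e-r` rule, not the project's actual code: the function name `split_references`, the intermediate tag `i-r` and the outside tag `o` are assumptions made for the example.

```python
def split_references(tokens, tags):
    """Group tokens into references that start at `b-r` and end at the next `e-r`.

    Hypothetical helper for illustration only. Tokens seen before the first
    `b-r`, or after an `e-r` and before the next `b-r`, are ignored. A second
    `b-r` arriving before an `e-r` discards the unfinished reference.
    """
    references = []
    current = None  # tokens of the reference being built, or None if outside one
    for token, tag in zip(tokens, tags):
        if tag == "b-r":
            # Start a new reference (dropping any unfinished one).
            current = [token]
        elif current is not None:
            current.append(token)
            if tag == "e-r":
                # Close the reference at the first `e-r` after the `b-r`.
                references.append(" ".join(current))
                current = None
    return references


example_tokens = ["Smith", "2019", "Title", "see", "Jones", "2020"]
example_tags = ["b-r", "i-r", "e-r", "o", "b-r", "e-r"]
example_references = split_references(example_tokens, example_tags)
```

Writing the rule out like this makes its edge cases visible, for example what happens to a reference with a missing `e-r`, which is the kind of behaviour alternative approaches would need to be tested against.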

Investigate parallelisation of yielding structured references

In `split_reach/extracter/extract_refs_task.py` we set `pool_map = map` for use in `yield_structured_references`. However, if we utilise `Pool` from `multiprocessing`, i.e.

```
pool = Pool(num_workers)
pool_map = pool.map
```

we could speed...
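A minimal sketch of swapping the built-in `map` for `Pool.map` follows. The body of `yield_structured_references` and the helper `structure_reference` here are hypothetical stand-ins, not the code in `extract_refs_task.py`; only `multiprocessing.Pool` and its `map` method are real APIs.

```python
from multiprocessing import Pool


def structure_reference(raw_reference):
    # Hypothetical stand-in for the per-reference work; it must be defined at
    # module level so worker processes can locate it when it is pickled.
    cleaned = raw_reference.strip()
    return {"reference": cleaned, "length": len(cleaned)}


def yield_structured_references(raw_references, pool_map=map):
    # `pool_map` is any callable with the signature of the built-in `map`,
    # e.g. `map` itself (serial) or `Pool.map` (parallel).
    for structured in pool_map(structure_reference, raw_references):
        yield structured


if __name__ == "__main__":
    raw = [" Smith 2019 ", "Jones 2020", " Lee 2018"]

    # Serial baseline using the built-in map.
    serial = list(yield_structured_references(raw))

    # Parallel version: consume the generator while the pool is still open.
    num_workers = 2
    with Pool(num_workers) as pool:
        parallel = list(yield_structured_references(raw, pool_map=pool.map))

    # Pool.map preserves input order, so both runs should agree.
    assert parallel == serial
```

Note that `Pool.map` collects all results before returning, unlike the lazy built-in `map`, so memory use and per-item pickling overhead would need to be measured before assuming a speed-up.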

Data science investigation

Include integration tests

It would be nice to have some tests of refparse where we could run it locally with smaller data and have a known output. This comes up because when I...

Data science investigation

Are some of the sections not extracted fully?

PR https://github.com/wellcometrust/reach/pull/319 fixes an issue with duplication, but in doing this I noticed a possible problem in the logic of the line(s) of code in `grab_section` in `pdf_parse.py`: ``` result...

Completeness

Engineering

Data science development

Fix PDF extraction for OS X (in pdf_parser.py)

After some tests failed when I ran `make test` on my Mac, @ivyleavedtoadflax realised there is an error in `reach/pdf_parser/pdf_parser.py` in which the text extraction works as desired for...

Bug

Completeness

Engineering