Liz G
When we download the models, we look for all the `artefact_targets`, but these are out of date with respect to the format the multitask models are saved in, so we get the...
After debugging an issue with Keras with @jdu, Jeff suggested we need to redesign some parts of how the tests are run. - We shouldn't be using a virtualenv with...
Make sure the results are correctly copied for `2020.3.6_splitting`, `2020.3.8_parsing` and `2020.4.5_multitask` from https://docs.google.com/spreadsheets/d/1gu6jJ83Ad15VztmB2aCGTgVsQgaudr6rwbU6RO0YV-I/edit#gid=1445206406
In the output of `split_parser`, `split` and `parser` we have an output of tokens and predictions. It may be worth considering a different type of output with the spans of...
When using the deep reference parser in Reach, we got the error: [note this is from the deep_reference_parser-2019.12.1-py3-none-any.whl version, but I think this issue still stands in the current DRP...
The decision to split up references at a `b-r` tag and end at the next `e-r` tag should be reviewed and different approaches should be considered and tested. This is...
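For discussion, the rule described above (start a reference at `b-r`, close it at the next `e-r`) can be sketched as follows. This is a minimal illustration, not the code used in the parser: the function name `split_references`, the `i-r` and `o` tags in the example, and the choice to drop an unfinished reference when a new `b-r` appears are all assumptions.

```python
def split_references(tokens, tags):
    """Group tokens into references using b-r (begin) and e-r (end) tags.

    Assumption: a reference left open when a new b-r arrives is discarded,
    and tokens outside any open reference are ignored.
    """
    references = []
    current = None  # tokens of the reference currently being built, if any
    for token, tag in zip(tokens, tags):
        if tag == "b-r":
            # Start a new reference (any unfinished one is dropped).
            current = [token]
        elif current is not None:
            current.append(token)
            if tag == "e-r":
                # Close the reference at the first e-r after its b-r.
                references.append(" ".join(current))
                current = None
    return references


tokens = ["Smith", "2019", "Title", ".", "and", "Jones", "2020", "."]
tags = ["b-r", "i-r", "i-r", "e-r", "o", "b-r", "i-r", "e-r"]
print(split_references(tokens, tags))
```

Alternative rules (for example, splitting on `b-r` alone and ignoring `e-r`) would change only the two branches inside the loop, which makes them straightforward to compare on the same tagged input.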
In `split_reach/extracter/extract_refs_task.py` we set `pool_map = map` for use in `yield_structured_references`. However, if we use `Pool` from `multiprocessing`, i.e.
```
pool = Pool(num_workers)
pool_map = pool.map
```
we could speed...
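A minimal sketch of the swap suggested above, assuming the per-item work is a picklable top-level function. `parse_document` and `run` are hypothetical stand-ins, not names from `extract_refs_task.py`.

```python
from multiprocessing import Pool


def parse_document(doc_id):
    # Hypothetical stand-in for the per-document work done in the task.
    return doc_id * 2


def run(doc_ids, num_workers=None):
    """Apply parse_document to every id, in parallel when num_workers > 1."""
    if num_workers and num_workers > 1:
        # The context manager shuts the worker processes down on exit.
        with Pool(num_workers) as pool:
            return pool.map(parse_document, doc_ids)
    # Serial fallback, equivalent to `pool_map = map`.
    return list(map(parse_document, doc_ids))


if __name__ == "__main__":
    print(run([1, 2, 3], num_workers=2))
```

One difference to keep in mind: the built-in `map` is lazy, whereas `pool.map` collects all results into a list before returning. If the consumer is a generator and memory matters, `pool.imap` yields results in order as they become available.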
It would be nice to have some tests of refparse where we could run it locally with smaller data and have a known output. This comes up because when I...
PR https://github.com/wellcometrust/reach/pull/319 fixes an issue with duplication, but in doing this I noticed a possible problem with the logic of the line(s) of code in `grab_section` in `pdf_parse.py`: ``` result...
After some tests failed when I ran `make test` on my Mac, @ivyleavedtoadflax realised there is an error in `reach/pdf_parser/pdf_parser.py` in which the text extraction works as desired for...