
Clean up vocab creation

Open dakinggg opened this issue 5 years ago • 0 comments

This script is now several steps removed from the original corpus. It might be better to replace it with a script that reads a large corpus and creates the vocabularies directly. At the moment, we first create an intermediate file containing the word/doc counts, and then this script generates a vocabulary file from it that differs from the intermediate file only in how it is filtered.

Originally posted by @DeNeutoy in https://github.com/allenai/scispacy/pull/295#discussion_r558664660
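
For illustration, a minimal sketch of what a direct corpus-to-vocabulary pass could look like. This is not the existing scispacy script: the function name `build_vocab`, the thresholds `min_word_count` and `min_doc_count`, and the lowercased whitespace tokenization are all assumptions made for the example. A real version would presumably use the project's own tokenizer and stream documents from disk.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_vocab(
    docs: Iterable[str],
    min_word_count: int = 2,  # hypothetical threshold, not taken from the repo
    min_doc_count: int = 1,   # hypothetical threshold, not taken from the repo
) -> List[Tuple[str, int, int]]:
    """Return (word, word_count, doc_count) rows in a single pass over docs."""
    word_counts: Counter = Counter()
    doc_counts: Counter = Counter()
    for text in docs:
        # Assumption: naive lowercased whitespace tokenization for brevity.
        tokens = text.lower().split()
        word_counts.update(tokens)
        # Each distinct token counts once per document.
        doc_counts.update(set(tokens))
    # Filter while building the vocabulary, so no intermediate counts file is needed.
    vocab = [
        (word, count, doc_counts[word])
        for word, count in word_counts.items()
        if count >= min_word_count and doc_counts[word] >= min_doc_count
    ]
    # Most frequent first, ties broken alphabetically for stable output.
    vocab.sort(key=lambda row: (-row[1], row[0]))
    return vocab


corpus = ["the cell divides", "the protein binds the receptor", "a cell membrane"]
vocab = build_vocab(corpus, min_word_count=2)
```

Counting and filtering in one function keeps the output tied directly to the corpus, which is the point of the proposal above. The trade-off is that changing a threshold means re-reading the corpus instead of re-filtering a cached counts file.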

dakinggg • Jan 27 '21 23:01