Encoding baseline duration exceeds memory limit
I've been trying to use the encode_baseline measure for words inside a SPADE script, currently:
with CorpusContext(config) as c:
    if not c.hierarchy.has_token_property('word', 'baseline'):
        print('getting baseline word duration')
        c.encode_baseline('word', 'duration')
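For what it's worth, a baseline-style measure can in principle be computed in a streaming fashion whose memory footprint scales with the number of word types rather than tokens. Here is a minimal pure-Python sketch of that idea (not PolyglotDB's actual implementation; it assumes, purely for illustration, that the baseline for a token is the corpus-wide mean duration of its word type):

```python
from collections import defaultdict

def baseline_durations(tokens):
    """Assign each token the mean duration of its word type.

    `tokens` is a list of (word_label, duration) pairs.  Two passes
    over the data keep memory proportional to the number of word
    *types*, not the number of tokens.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    # Pass 1: accumulate per-type duration sums and token counts.
    for label, dur in tokens:
        total[label] += dur
        count[label] += 1
    means = {label: total[label] / count[label] for label in total}
    # Pass 2: map each token back to its type's mean duration.
    return [(label, means[label]) for label, _ in tokens]

tokens = [('the', 0.10), ('cat', 0.30), ('the', 0.14)]
print(baseline_durations(tokens))
```

If the real bottleneck is holding the whole token set in memory at once, the same two-pass structure could be driven by per-speaker or per-discourse queries instead of a single corpus-wide one.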
This works fine on smaller corpora (like ICE-Can or Modern RP), but it exceeds the memory limit (even on Roquefort) for corpora the size of SOTC and larger.
@mmcauliffe any thoughts on this? I know you probably won't have time to fix it before leaving, but any guidance would be appreciated. For instance, do you suspect the issue will have been resolved by your recent memory optimizations, or does it seem like an actual bug?