tika-python
Race when running multiple instances of `tika.parser.from_file()`?
Hi!
Is there a race when running multiple `tika.parser.from_file()` calls in parallel using Python multiprocessing? It seems to me that the first `from_file()` call downloads the jar file and then starts the Java subprocess. If another process calls `from_file()` after the first one has started downloading but before the server port comes up, weird things may happen, such as a double download of `tika-server.jar` or a double subprocess startup. Is this analysis right?
That said, I've been reading https://github.com/chrismattmann/tika-python/issues/337, and from that it looks like it will work.
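
In case the race is real, here is a minimal sketch of the workaround I have in mind: make the first `from_file()` call alone in the parent process, so any jar download and server startup happen exactly once, and only then fan out to the workers. `parse_one`, `parse_all`, and the `pool_factory` parameter are hypothetical names of mine, not part of tika-python. The sketch also assumes that the server started by the first call stays up and that the workers connect to it instead of starting their own, which I have not verified.

```python
from multiprocessing import Pool


def parse_one(path):
    """Parse a single file with tika-python."""
    # Imported lazily so this module can be loaded without tika installed.
    from tika import parser
    return parser.from_file(path)


def parse_all(paths, parse=parse_one, pool_factory=Pool, workers=4):
    """Parse all paths, running the first call alone before going parallel.

    `parse` and `pool_factory` are hypothetical hooks for this sketch; they
    default to tika's from_file() and multiprocessing.Pool.
    """
    paths = list(paths)
    if not paths:
        return []
    # Warm-up: nothing else runs concurrently with this call, so a jar
    # download or server startup triggered here cannot overlap with another.
    first = parse(paths[0])
    # Only now start the workers for the remaining files.
    with pool_factory(workers) as pool:
        rest = pool.map(parse, paths[1:])
    return [first] + rest
```

Usage would be something like `results = parse_all(list_of_paths)` under an `if __name__ == "__main__":` guard. The trade-off is that the first file is parsed serially, which seems cheap compared to a duplicated download or two competing server processes.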