Race when running multiple instances of `tika.parser.from_file()`?

Open ember91 opened this issue 9 months ago • 0 comments

Hi!

Is there a race when running multiple `tika.parser.from_file()` calls in parallel using Python `multiprocessing`? As far as I can tell, the first `from_file()` call downloads the jar file and then starts the Java subprocess. If another process calls `from_file()` after the first one has started downloading but before the server port comes up, weird things may happen, such as `tika-server.jar` being downloaded twice or two server subprocesses being started. Is this analysis right?

That said, I've been reading https://github.com/chrismattmann/tika-python/issues/337, and from that discussion it looks like it will work.
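
If the race does turn out to be real, one possible workaround is to serialize only the first call across processes, so that the download and server startup happen once. The following is a minimal sketch under that assumption, not something taken from tika-python: `startup_lock`, `parse_guarded`, and the lock/marker file names are hypothetical helpers, and `parse` stands in for whatever callable does the parsing (for example `tika.parser.from_file`). The lock relies on atomic exclusive file creation, and a crashed process would leave a stale lock file behind that has to be removed by hand.

```python
import os
import tempfile
import time
from contextlib import contextmanager

_TMP = tempfile.gettempdir()


@contextmanager
def startup_lock(lock_path, timeout=60.0, poll=0.1):
    """Hypothetical cross-process mutex built on atomic exclusive file creation."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # process at a time can get past this call.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def parse_guarded(
    parse,
    path,
    lock_path=os.path.join(_TMP, "tika-startup.lock"),
    ready_path=os.path.join(_TMP, "tika-startup.ready"),
):
    """Call parse(path), serializing calls until one of them has completed.

    Assumption: once a single call has returned, the server is up and later
    calls can safely run in parallel.
    """
    if os.path.exists(ready_path):
        return parse(path)
    with startup_lock(lock_path):
        result = parse(path)
        if not os.path.exists(ready_path):
            # Marker file telling other processes that startup is done.
            open(ready_path, "w").close()
    return result
```

Each worker would then call something like `parse_guarded(parser.from_file, filename)` after `from tika import parser`. A simpler alternative with the same intent is to parse one file in the parent process before creating the worker pool.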

ember91 · May 27 '25 13:05