One process does all jobs by itself
First off, I love the ease of use of this project!
I am trying to multiprocess the reading of an image (using rasterio). Each row in my dataframe will read a window from the image source. The image path is distributed to each process, which will then open it.
When I run parallel_apply(), all four processes seem to start, but only the first one continues. The other three stop at job 1, and the first one performs all jobs (152/38). The result is a dataframe concatenated from all processes, with four times as many rows as the input and NaN values for 3/4 of the rows. See screen cap below.
Do you have any input on why this is happening?

Hello,
I'm not sure where your issue comes from.
1. Could you please try with pandarallel v1.4.0?
2. Could you also try the "classic" pandas way (if not already done) to be sure this issue comes from pandarallel?
3. If not solved by 1., could you please send me the code you used to get this error?
My use case is kind of specific. Each thread should open a separate dataset reader and pass it to its respective jobs (i.e. 4 dataset readers, one per thread).
I think with pandarallel my only option is for each job to open its own dataset reader, which would be very IO-demanding and slow. (Alternatively, I could pass the same reader to all threads, but that is not allowed.) So pandarallel is sadly not a good fit for my case.
So I ended up implementing a method for this myself, based on this gist. This way I can run an intermediate (partially evaluated) function that opens a dataset reader for each thread/process.
So I have not made it work with pandarallel, but I guess it has to do with the threads or jobs blocking each other's dataset readers.
Edit: Here is a gist describing my solution. Maybe it can be generalised further to support other cases where an intermediate function needs to run once per thread/process. It is not as user-friendly and beautifully implemented as pandarallel, so let me know if something is unclear!
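The idea above can be sketched with the standard library's multiprocessing Pool, whose initializer runs once per worker process — the natural place to open one dataset reader per process. This is only a minimal sketch, not the gist itself: FakeReader, init_worker, and parallel_read are hypothetical names, and FakeReader stands in for what would really be rasterio.open(image_path), so the example is self-contained and runnable without rasterio.

```python
import multiprocessing as mp


class FakeReader:
    """Hypothetical stand-in for a rasterio dataset reader.

    In the real code this would be the object returned by
    rasterio.open(image_path), and read_window() would read a
    pixel window for the given row.
    """

    def __init__(self, path):
        self.path = path

    def read_window(self, row):
        # Pretend to read a window; just return a derived value.
        return row * 2


_reader = None  # module-level slot: one reader per worker process


def init_worker(path):
    # Runs exactly once in each worker process when the Pool starts,
    # so the (expensive) reader is opened per process, not per job.
    global _reader
    _reader = FakeReader(path)


def process_row(row):
    # Each job reuses the reader its process opened at startup.
    return _reader.read_window(row)


def parallel_read(rows, path, workers=4):
    with mp.Pool(workers, initializer=init_worker, initargs=(path,)) as pool:
        return pool.map(process_row, rows)


if __name__ == "__main__":
    print(parallel_read(range(8), "image.tif"))
```

Because each worker owns its reader, no reader is ever shared between processes, which sidesteps the "same reader in all threads" restriction mentioned above.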