reduction of results in parallel
In BE (BatchExperiments) we had a parallel reduction method for good reason. For registries with a large number of results, or with a non-trivial reduction that does at least a little bit of "computation" or transformation, you don't want to wait for hours (if you are on a parallel system).
Is this supported now? I don't think so.
There is batchMapResults. What function are you missing?
Can you write down an example here of how you would do this, please?
(Yes, it's not that I cannot; I just want both of us to see whether that gets complicated and lengthy.)
To answer directly: I am missing something similar to this: https://github.com/tudo-r/BatchExperiments/blob/master/R/reduceResultsExperimentsParallel.R
We can talk about different ways to do this, but the motivation for that function is exactly the same as for my issue here.
Reducing 35k jobs takes around 40 minutes, that's pretty long...
Some comments for registries with many jobs:
- The file system or network device is usually the bottleneck. Parallelism helps, but don't expect miracles from something like reduceResultsParallel; this does not scale well.
- It is usually faster to move the file directory to a local SSD first (i.e., do a sequential read on the slow file system), then reduce the results.
- A way to store the results of a complete chunk in a single file would be best. But this requires quite some work.
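
To make the discussion concrete, here is a minimal sketch in base R of what a chunked parallel reduction could look like. It does not use any batchtools or BatchExperiments function: the helper names reduce_chunk and reduce_parallel, the one-RDS-file-per-job layout, and the chunk size are all assumptions made for illustration. It also assumes that the reduction function is associative, that init is its identity element, and that results and accumulator have the same type, so that partial results can be combined with the same function.

```r
# Sketch only: base R plus the 'parallel' package, no batchtools calls.
# File layout and helper names are assumptions for this illustration.
library(parallel)

# Hypothetical helper: sequentially reduce the result files of one chunk.
reduce_chunk <- function(files, fun, init) {
  acc <- init
  for (f in files) {
    acc <- fun(acc, readRDS(f))
  }
  acc
}

# Hypothetical helper: split the files into chunks, reduce each chunk in a
# forked worker, then combine the partial results with the same function.
# Forking is not available on Windows, so fall back to one core there.
reduce_parallel <- function(files, fun, init, chunk_size = 1000L,
                            cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  chunks <- split(files, ceiling(seq_along(files) / chunk_size))
  partial <- mclapply(chunks, function(ch) reduce_chunk(ch, fun, init),
                      mc.cores = cores)
  Reduce(fun, partial, init)
}

# Toy data: ten "job results" stored as one RDS file each.
res_dir <- tempfile("results")
dir.create(res_dir)
for (i in 1:10) {
  saveRDS(i, file.path(res_dir, sprintf("%02d.rds", i)))
}
files <- list.files(res_dir, full.names = TRUE)

# Sum all results, three files per chunk.
total <- reduce_parallel(files, `+`, init = 0, chunk_size = 3L)
print(total)
```

Each worker still has to read its files from the same device, so if the file system is the bottleneck, the gain from this kind of chunking will be limited, as noted in the comments above.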