reduction of results in parallel

Open berndbischl opened this issue 9 years ago • 6 comments

In BatchExperiments (BE) we had a parallel reduction method, for good reason: for registries with a large number of results, or with a non-trivial reduction that does at least a little "computation" or transformation, you don't want to wait for hours (if you are on a parallel system).

Is this supported now? I don't think so.

berndbischl avatar Dec 14 '16 12:12 berndbischl

There is batchMapResults. What function are you missing?

mllg avatar Dec 14 '16 22:12 mllg

Can you please write down an example here of how you would do this?

berndbischl avatar Dec 14 '16 23:12 berndbischl

(Yes, it's not that I cannot; I just want both of us to see whether that gets complicated and lengthy.)

berndbischl avatar Dec 14 '16 23:12 berndbischl

To answer directly: I am missing something similar to this: https://github.com/tudo-r/BatchExperiments/blob/master/R/reduceResultsExperimentsParallel.R

We can talk about different ways to do this, but the motivation for that function is exactly the same as for my issue here.
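
For readers following along, here is a rough sketch of what such a chunked reduction could look like with the existing batchtools building blocks. It is not an existing batchtools function and not a port of the linked BE code: the helper `reduceGroup`, the registries `src` and `red`, the group count of 4 and the toy sum are all made up for illustration. With the default interactive cluster functions the jobs run sequentially, so this only shows the structure; on a real system `red` would need to be configured with proper cluster functions.

```r
library(batchtools)

# Toy source registry (temporary directory, not made the default registry).
src = makeRegistry(file.dir = NA, make.default = FALSE)
batchMap(function(x) list(value = x^2), x = 1:20, reg = src)
submitJobs(reg = src)
waitForJobs(reg = src)

# Split the ids of finished jobs into 4 groups; one reduction job per group.
ids = findDone(reg = src)$job.id
groups = unname(split(ids, rep_len(1:4, length(ids))))

# Hypothetical helper: opens the source registry read-only on the worker
# and reduces only its own share of the results.
reduceGroup = function(job.ids, src.dir) {
  reg = batchtools::loadRegistry(src.dir, writeable = FALSE, make.default = FALSE)
  batchtools::reduceResults(function(aggr, res) aggr + res$value,
    ids = job.ids, init = 0, reg = reg)
}

# Second registry whose jobs perform the partial reductions.
red = makeRegistry(file.dir = NA, make.default = FALSE)
batchMap(reduceGroup, job.ids = groups,
  more.args = list(src.dir = src$file.dir), reg = red)
submitJobs(reg = red)
waitForJobs(reg = red)

# Combine the partial results on the master.
total = Reduce(`+`, reduceResultsList(reg = red))
```

The final combine step only touches one small result per group, so the expensive reads are spread over the reduction jobs.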

berndbischl avatar Dec 14 '16 23:12 berndbischl

Reducing 35k jobs takes around 40 minutes, that's pretty long...

ja-thomas avatar Dec 22 '16 15:12 ja-thomas

Some comments for registries with many jobs:

  • The file system or network device is usually the bottleneck. Parallelism helps, but don't expect miracles from something like reduceResultsParallel; it does not scale well.
  • It is usually faster to move the file directory to a local SSD first (i.e., do a sequential read on the slow file system), then reduce the results.
  • A way to store the results of a complete chunk in a single file would be best. But this requires quite some work.
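
A minimal base R sketch of the last two points, under heavy assumptions: the directories, file names, toy results and chunk size of 10 are invented for illustration, and the code does not reflect how batchtools actually lays out its result files.

```r
# Toy stand-in for a result directory on a slow file system:
# one small .rds file per job.
slow.dir = file.path(tempdir(), "slow-results")
dir.create(slow.dir, showWarnings = FALSE)
for (i in 1:100) {
  saveRDS(list(value = as.numeric(i)), file.path(slow.dir, sprintf("%i.rds", i)))
}

# Copy everything to local storage in one pass, then reduce from the copy.
local.dir = file.path(tempdir(), "local-results")
dir.create(local.dir, showWarnings = FALSE)
files = list.files(slow.dir, pattern = "\\.rds$", full.names = TRUE)
file.copy(files, local.dir, overwrite = TRUE)
local.files = file.path(local.dir, basename(files))
total = sum(vapply(local.files, function(f) readRDS(f)$value, numeric(1)))

# Bundle the results of each chunk (10 jobs) into a single file,
# so a later reduction needs 10 reads instead of 100.
chunk.id = ceiling(seq_along(local.files) / 10)
chunk.files = vapply(split(local.files, chunk.id), function(fs) {
  out = tempfile(fileext = ".rds")
  saveRDS(lapply(fs, readRDS), out)
  out
}, character(1))
total.chunked = sum(vapply(chunk.files, function(f) {
  sum(vapply(readRDS(f), function(res) res$value, numeric(1)))
}, numeric(1)))
```

Both totals agree; the difference is only in how many files have to be opened during the reduction.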

mllg avatar Dec 22 '16 22:12 mllg