
usage of sockets by MulticoreParam

Open lawremi opened this issue 7 years ago • 11 comments

It appears that on our cluster the port usage is too restrictive for the socket-based multicore backend. I think the base multicore uses raw connections or some sort of shared memory to communicate within the R process and its children, but I could be wrong. Is there any way to get back to a pure multicore implementation of BiocParallel? Why is the current one using sockets?

lawremi, Oct 24 '18 21:10

If it helps, as a quick workaround you can use a 'multicore' future backend, which uses parallel::mcparallel()/mccollect()-style forked processing without sockets. You can achieve this using:

library(BiocParallel)
register(DoparParam())
library(doFuture)
registerDoFuture()
plan(multicore)  ## this is where you control the actual backend

mu <- 1.0
sigma <- 2.0
x <- bplapply(1:3, mu = mu, sigma = sigma, function(i, mu, sigma) {
  rnorm(i, mean = mu, sd = sigma)
})

Or, you can use the more direct BiocParallel.FutureParam (only on GitHub):

## remotes::install_github("HenrikBengtsson/BiocParallel.FutureParam")
library("BiocParallel.FutureParam")
register(FutureParam())
plan(multicore)  ## this is where you control the actual backend

mu <- 1.0
sigma <- 2.0
x <- bplapply(1:3, mu = mu, sigma = sigma, function(i, mu, sigma) {
  rnorm(i, mean = mu, sd = sigma)
})

HenrikBengtsson, Oct 24 '18 23:10

I think base R creates a 'write once' pipe shared with the forked process and written to by the forked process before it exits. We'd like bi-directional communication persisting across multiple exchanges between the forked process and manager. The socket solution was adopted because it could be implemented in R, shared across several back-ends, and re-use existing code. But I agree that sockets cause problems, and I'd be up for exploring either pipe-based or shared memory solutions...this would take me a little time to work through.

mtmorgan, Oct 25 '18 02:10
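
A minimal sketch of the one-shot fork model described above, using only base R's parallel package (forking is available on Unix-alikes only). It is meant as an illustration of the contrast with persistent workers, not as a description of how BiocParallel is implemented: each child evaluates one expression, reports its result once, and exits, so a second task needs a second fork.

```r
library(parallel)  # ships with R; mcparallel() forks, so Unix-alikes only

# Fork a child that evaluates one expression and reports the result once
job <- mcparallel(sum(1:10))

# Block until the child has reported, then collect its single result
res <- mccollect(job)
res[[1]]  # 55

# A second task means a second fork; the first child is already gone
job2 <- mcparallel(mean(c(2, 4)))
res2 <- mccollect(job2)
res2[[1]]  # 3
```

A persistent worker, by contrast, would stay alive between the two tasks and exchange several messages with the manager, which is the bi-directional communication the comment above says a pipe written once cannot provide.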

I guess BiocParallel is a lot more complicated than it used to be. It's too bad there isn't some minimal abstraction that could be backed by mcmapply().

lawremi, Oct 25 '18 03:10

It's always been implemented to support persistent workers on all back-ends.

mtmorgan, Oct 25 '18 10:10

We have also been having problems with MulticoreParam in BiocParallel for the past few weeks. bplapply worked fine before but stopped working in both R-3.4.3 (v.1.12.0) and R-3.5.1 (1.16.0). This seems to be related to socket use, since a simple

library(BiocParallel)
a = list(A=1:10, B=2:200)
bplapply(a, mean, BPPARAM = MulticoreParam(workers=2))

hangs and using GDB reveals that it is blocking in sock_open(). Setting workers = 1 works fine.

I would try to reconfigure our server (Ubuntu 16.04, Dell 48-core) if I only knew what to change. Is there any documentation on how sockets should be configured to make BiocParallel work with the multicore backend?

Thanks!

There are two additional parameters, manager.hostname and manager.port, that you could try setting explicitly: the hostname of the computer that you're running on, and an open port on it.

I am working on an alternative that uses local sockets, which do not require open ports.

mtmorgan, Nov 26 '18 11:11

Thanks, that solved it. For some strange reason traffic from my machine to localhost was routed through an external firewall, while traffic to 127.0.0.1 or using the machine name is not. Setting manager.hostname to the latter two worked.
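
For anyone hitting the same hang, the fix reported above might look roughly like the following sketch. It assumes that "127.0.0.1" is reachable on the machine in question (the machine's own hostname is the other value reported to work) and it leaves manager.port at its default; manager.port can be passed to MulticoreParam() in the same way if a specific open port is known.

```r
library(BiocParallel)

a <- list(A = 1:10, B = 2:200)

# Assumption: the loopback address is reachable by the workers here;
# substitute the machine name if that is what works on your system.
param <- MulticoreParam(workers = 2, manager.hostname = "127.0.0.1")

# Pass the back-end by name so it is not swallowed by '...' and
# forwarded to mean()
res <- bplapply(a, mean, BPPARAM = param)
```

Naming the argument BPPARAM matters: bplapply() forwards extra unnamed arguments to the function being applied, so a back-end passed positionally after FUN would not be used as the back-end.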

The LocalParam branch implements the LocalParam() back-end to use 'local' (disk-based) sockets that do not require a port. This is intended to be a drop-in replacement for MulticoreParam.

BiocManager::install("Bioconductor/BiocParallel", ref = "LocalParam")

I'm aware of a speed regression in some circumstances, but would welcome any other comments.

mtmorgan, Jan 25 '19 22:01

Are file sockets really implemented on disk across the board? It would be great for it to be an in-memory stream.

lawremi, Jan 25 '19 23:01

Across which board? These are so-called 'local' sockets, and are file-based in the abstract Unix sense; my understanding is that the file system is used as a name space, rather than as the communication medium.

mtmorgan, Jan 26 '19 01:01

You had said the sockets were "disk-based" which made me a little worried, but it sounds like they are just file-based, so this is great news.

lawremi, Jan 26 '19 03:01