ResistanceGA

Error writing to connection - Linux/Ubuntu parallelization issue

Open · julian-wittische opened this issue on May 24, 2021 · 13 comments

We get an error on several brand-new Ubuntu servers, with everything installed and up to date, whenever we run the code with parallelization enabled. This is the error:

Error in serialize(data, node$con, xdr = FALSE) : ignoring SIGPIPE signal
Error in serialize(data, node$con, xdr = FALSE) : error writing to connection

Here is a reproducible example, copied from the tutorial, that triggers the issue on our setup:

write.dir <- "/full/path/to/results/"  # placeholder: please fill in the full path to your results directory
library(ResistanceGA)
library(raster)  # stack(); also attaches sp, which provides SpatialPoints()
data(resistance_surfaces)
data(samples)
sample.locales <- SpatialPoints(samples[, c(2, 3)])
r.stack <- stack(resistance_surfaces$categorical, resistance_surfaces$continuous, resistance_surfaces$feature)
GA.inputs <- GA.prep(ASCII.dir = r.stack, Results.dir = write.dir, method = "LL", max.cat = 500, max.cont = 500, seed = 555, parallel = 4)
gdist.inputs <- gdist.prep(length(sample.locales), samples = sample.locales, method = 'commuteDistance')
PARM <- c(1, 250, 75, 1, 3.5, 150, 1, 350)
Resist <- Combine_Surfaces(PARM = PARM, gdist.inputs = gdist.inputs, GA.inputs = GA.inputs, out = NULL, rescale = TRUE)
gdist.response <- Run_gdistance(gdist.inputs = gdist.inputs, r = Resist)
gdist.inputs <- gdist.prep(n.Pops = length(sample.locales), samples = sample.locales, response = as.vector(gdist.response), method = 'commuteDistance')
Multi.Surface_optim <- MS_optim(gdist.inputs = gdist.inputs, GA.inputs = GA.inputs)

Session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ResistanceGA_4.1-0.46 raster_3.4-10 sp_1.4-5

loaded via a namespace (and not attached):
[1] jsonlite_1.7.2 splines_4.0.5 foreach_1.5.1
[4] gtools_3.8.2 shiny_1.6.0 expm_0.999-6
[7] stats4_4.0.5 spatstat.geom_2.1-0 LearnBayes_2.15.1
[10] pillar_1.6.1 lattice_0.20-44 glue_1.4.2
[13] digest_0.6.27 promises_1.2.0.1 polyclip_1.10-0
[16] minqa_1.2.4 colorspace_2.0-1 MuMIn_1.43.17
[19] htmltools_0.5.1.1 httpuv_1.6.1 Matrix_1.3-3
[22] plyr_1.8.6 spatstat.sparse_2.0-0 JuliaCall_0.17.4
[25] pkgconfig_2.0.3 gmodels_2.18.1 purrr_0.3.4
[28] xtable_1.8-4 spatstat.core_2.1-2 scales_1.1.1
[31] gdata_2.18.0 tensor_1.5 XR_0.7.2
[34] later_1.2.0 spatstat.utils_2.1-0 lme4_1.1-27
[37] proxy_0.4-25 tibble_3.1.2 mgcv_1.8-35
[40] generics_0.1.0 ggplot2_3.3.3 ellipsis_0.3.2
[43] XRJulia_0.9.0 cli_2.5.0 magrittr_2.0.1
[46] crayon_1.4.1 mime_0.10 deldir_0.2-10
[49] fansi_0.4.2 doParallel_1.0.16 nlme_3.1-152
[52] MASS_7.3-54 class_7.3-19 tools_4.0.5
[55] lifecycle_1.0.0 munsell_0.5.0 e1071_1.7-6
[58] gdistance_1.3-6 akima_0.6-2.1 compiler_4.0.5
[61] rlang_0.4.11 units_0.7-1 classInt_0.4-3
[64] grid_4.0.5 nloptr_1.2.2.2 iterators_1.0.13
[67] goftest_1.2-2 igraph_1.2.6 miniUI_0.1.1.1
[70] boot_1.3-28 GA_3.2.1 gtable_0.3.0
[73] codetools_0.2-18 abind_1.4-5 DBI_1.1.1
[76] R6_2.5.0 knitr_1.33 dplyr_1.0.6
[79] fastmap_1.1.0 utf8_1.2.1 ggExtra_0.9
[82] spdep_1.1-7 KernSmooth_2.23-20 spatstat.data_2.1-0
[85] parallel_4.0.5 Rcpp_1.0.6 vctrs_0.3.8
[88] sf_0.9-8 rpart_4.1-15 coda_0.19-4
[91] spData_0.3.8 tidyselect_1.1.1 xfun_0.23

We have tried reinstalling everything with different versions, to no avail. Both servers have a very large amount of RAM. A simple parallelization with doParallel works:

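library(doParallel)  # also attaches parallel (detectCores, makeCluster, parLapply)

# Return the primes <= n using a simple sieve
getPrimeNumbers <- function(n) {
   n <- as.integer(n)
   if(n > 1e6) stop("n too large")
   primes <- rep(TRUE, n)
   primes[1] <- FALSE
   last.prime <- 2L
   for(i in last.prime:floor(sqrt(n)))
   {
      # strike out multiples of the current prime, then move on to the next prime
      primes[seq.int(2L*last.prime, n, last.prime)] <- FALSE
      last.prime <- last.prime + min(which(primes[(last.prime+1):n]))
   }
   which(primes)
}
no_cores <- detectCores() - 1
registerDoParallel(cores=no_cores)  # registers a foreach backend; parLapply() below does not use it
cl <- makeCluster(no_cores, type="FORK")  # FORK workers are forked copies of this R session
result <- parLapply(cl, 10:10000, getPrimeNumbers)
stopCluster(cl)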

julian-wittische, May 24, 2021
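One caveat about the doParallel test above: it uses a FORK cluster, whose workers are forked copies of the master R session and therefore inherit its loaded packages and library paths. A PSOCK cluster instead starts fresh R processes, which must find and load every package on their own. The sketch below, using only the base parallel package, asks each PSOCK worker which library paths it sees and whether it can load a given set of packages. The function name check_psock_workers is hypothetical, and whether ResistanceGA's failing code path uses PSOCK workers is an assumption this thread does not confirm.

```r
library(parallel)

# Start a PSOCK cluster (fresh R processes), then ask each worker which
# library paths it sees and whether it can load each package in `pkgs`.
check_psock_workers <- function(pkgs, n_workers = 2) {
  cl <- makePSOCKcluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  # Copy `pkgs` from this function's environment to every worker.
  clusterExport(cl, "pkgs", envir = environment())
  clusterEvalQ(cl, list(
    pid       = Sys.getpid(),
    lib_paths = .libPaths(),
    loadable  = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  ))
}

res <- check_psock_workers(c("stats", "utils"))
str(res)
```

On an affected server, passing c("ResistanceGA", "GA") instead would show whether fresh workers can locate those packages. A FALSE there would be consistent with, though not proof of, the library-path problem that the clusterEvalQ(.libPaths(...)) line in the later workaround addresses.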

It doesn’t look like you’ve specified the full path to the directory where you want results written.

-Bill-

wpeterman, May 25, 2021

I did store a valid path in the write.dir object; I only removed it so that anyone else running the script could fill in their own. The script works on Windows.

julian-wittische, May 26, 2021

Hello, I am receiving the same error, also on a server running Ubuntu.

Did you find a fix for this, @julian-wittische? Thanks. I have searched online for the general error, and some people suggest it is caused by the amount of memory used. However, I am running it on a server with many CPUs, so I don't think this should be a problem.

EveTC, May 29, 2021

Hello,

Unfortunately, we have been unable to solve this issue. Memory is not the cause in our case, because even using only 2 CPUs on a machine with a huge amount of RAM triggers the error.

julian-wittische, May 31, 2021

Unfortunately, I do not have convenient access to an Ubuntu/Linux machine to troubleshoot this issue. Have you confirmed that you can write results to your specified directory when running a simple example?


-Bill-

wpeterman, Jun 1, 2021
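The write check suggested above can also be run from inside the parallel workers rather than from the main session. The sketch below, using only the base parallel package, normalizes the results directory to a full absolute path (in line with the earlier full-path advice), starts a PSOCK cluster, and has each worker write a small file there. The names write_probe and check_worker_writes are hypothetical, and a temporary directory stands in for the real results directory.

```r
library(parallel)

# Runs on a worker: write a small file into `dir` and report whether it exists.
write_probe <- function(i, dir) {
  f <- file.path(dir, sprintf("worker_%d_pid_%d.txt", i, Sys.getpid()))
  writeLines("ResistanceGA write test", f)
  file.exists(f)
}

# Check that every PSOCK worker can write into `results_dir`, passed to the
# workers as a full absolute path.
check_worker_writes <- function(results_dir, n_workers = 2) {
  results_dir <- normalizePath(results_dir, mustWork = TRUE)
  cl <- makePSOCKcluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)
  # Extra arguments to parLapply() are forwarded to write_probe().
  unlist(parLapply(cl, seq_len(n_workers), write_probe, dir = results_dir))
}

test_dir <- file.path(tempdir(), "rga_write_test")
dir.create(test_dir, showWarnings = FALSE)
ok <- check_worker_writes(test_dir)
print(ok)
```

Each element of ok corresponds to one worker; a FALSE or an error here, while the same directory works without parallelization, would point at the worker environment rather than the path itself.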

Yes, everything is written to the correct, specified directory when I run a small example without the parallel setting.

EveTC, Jun 7, 2021

Since I cannot seem to solve the parallel issue, is it possible to run SS_optim for each raster separately (i.e., on separate cores) and then combine the results for the pseudo-bootstrapping method? I am running it over a large area, so I am trying to find any way to speed up the process.

Alternatively, can we wrap the function itself with doParallel somehow? Sorry, I am very new to parallel computing in R.

Thank you

EveTC, Jun 18, 2021
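The "one surface per core" idea raised above follows a generic pattern: hand each independent job to a separate worker and bind the per-job results together afterwards. The sketch below shows only that pattern with base R's parallel package. fit_one_surface is a hypothetical stand-in that returns a dummy score; it is not how ResistanceGA's SS_optim works, and whether separately produced SS_optim results can be combined for the pseudo-bootstrapping step is a question for the package author.

```r
library(parallel)

# HYPOTHETICAL stand-in for optimizing one surface. In a real run this would
# call a single-surface routine for that raster, presumably with its own
# internal parallelism turned off so the cores are not oversubscribed.
fit_one_surface <- function(surface_name) {
  Sys.sleep(0.1)  # placeholder for the expensive optimization
  data.frame(surface = surface_name, dummy_score = nchar(surface_name))
}

surfaces <- c("categorical", "continuous", "feature")

# One worker per surface; each job runs independently.
cl <- makePSOCKcluster(length(surfaces))
# The function object itself is serialized and sent to the workers.
results <- parLapply(cl, surfaces, fit_one_surface)
stopCluster(cl)

# Concatenate the per-surface results into a single table.
combined <- do.call(rbind, results)
print(combined)
```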

Running in parallel reduces the time needed to optimize a single surface. It is entirely possible to optimize each surface without parallelization, but this will be extremely time-consuming.


-Bill-

wpeterman, Jun 18, 2021

OK, thanks @wpeterman. I found this thread on the GA GitHub (https://github.com/luca-scr/GA/issues/50; see the answer at the end), which may help us debug the issue (@julian-wittische).

I have never used parallel computing in R, but once I set up the cluster like this:

library(doParallel)  # also attaches parallel, which provides makePSOCKcluster()
cl <- makePSOCKcluster(8) # this is how I defined cl
registerDoParallel(cl)

I no longer get the previous error, but I also no longer see the normal iteration output, so I am unsure whether it is working.

I am going to experiment with this today and will let you know how it goes. If you have any success with this approach, please let me know.

EveTC, Jun 21, 2021
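On the missing iteration output: by default, anything a PSOCK worker prints is discarded (makeCluster's outfile argument defaults to /dev/null on Unix-alikes). If the progress messages in question are printed from inside the worker processes, which is an assumption about ResistanceGA's internals, they would silently disappear. A minimal sketch with the base parallel package, using outfile = "" so worker output is not redirected:

```r
library(parallel)

# outfile = "" leaves worker stdout/stderr unredirected, so when R is run from
# a terminal the workers' printed lines usually appear there as well.
cl <- makePSOCKcluster(2, outfile = "")
res <- parLapply(cl, 1:4, function(i) {
  cat("worker", Sys.getpid(), "processing item", i, "\n")
  i^2
})
stopCluster(cl)
print(unlist(res))
```

The same argument could be passed as makePSOCKcluster(8, outfile = "") in the setup above. Inside a GUI such as RStudio, the worker lines may go to the terminal that launched it rather than to the console.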

I believe it now works for me. I run the code below:

library(parallel)
library(doParallel)

cl <- makePSOCKcluster(32)
registerDoParallel(cl)

# Set variables for ResistanceGA
# (raster, write.dir, sample.sp and fst must already be defined at this point)
GA.inputs_All <- GA.prep(method="AIC", ASCII.dir=raster, Results.dir = write.dir, min.cat=1, seed=111, parallel=cl)
# Inputs for the resistance-distance calculation
gdist.inputs <- gdist.prep(length(sample.sp), samples=sample.sp, response= lower(fst), method='costDistance')

# Export info to cluster
clusterExport(cl=cl,varlist=c("GA.inputs_All","gdist.inputs","raster","sample.sp","fst")) # list every object used in GA.inputs and gdist.inputs
clusterEvalQ(cl=cl, .libPaths("/R")) # set this to the path of your R library
clusterCall(cl=cl, library, package = "ResistanceGA", character.only = TRUE)

# Run SS_optim
run1_SSoptim <- SS_optim(gdist.inputs = gdist.inputs, GA.inputs = GA.inputs_All, diagnostic_plots=FALSE)

# Stop cluster once it has finished
stopCluster(cl)

EveTC, Jun 21, 2021
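A plausible reason the clusterExport(), clusterEvalQ() and clusterCall() lines in the workaround above matter is that each PSOCK worker is a separate, freshly started R session: it sees none of the master's objects and has none of its packages loaded until told otherwise. The thread does not confirm this as the root cause. The sketch below, which uses only the base parallel package and a hypothetical object my_input, shows the empty worker environment before and after an export.

```r
library(parallel)

cl <- makePSOCKcluster(2)
my_input <- 42  # exists only in the master session so far

# PSOCK workers start with an empty global environment ...
before <- unlist(clusterEvalQ(cl, exists("my_input")))
# ... so objects have to be copied over explicitly.
clusterExport(cl, varlist = "my_input")
after <- unlist(clusterEvalQ(cl, exists("my_input")))

stopCluster(cl)
print(before)
print(after)
```

Packages follow the same rule, which is why the workaround loads ResistanceGA on every worker with clusterCall() after pointing each worker at the right library with .libPaths().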

Has this issue been officially resolved? I'm running into the same kind of error when I run my code on an Ubuntu EC2 instance:

Error in unserialize(socklist[[n]]) : error reading from connection

cmu002, Apr 18, 2022
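The messages reported in this thread ("error writing to connection", "ignoring SIGPIPE signal", "error reading from connection") are the kind of errors the master R session typically raises when the socket to a worker has gone away, for example because the worker process crashed or was killed. The sketch below, using only the base parallel package, provokes that situation on purpose by making a single worker exit mid-call, so the resulting message can be compared with the ones above. It does not show why workers died on the affected servers, and the exact wording may vary between R versions.

```r
library(parallel)

# Start a one-worker PSOCK cluster, then make the worker process exit while
# the master is waiting for its reply, and capture the master's error.
cl <- makePSOCKcluster(1)
err <- tryCatch(
  clusterEvalQ(cl, quit(save = "no")),
  error = function(e) conditionMessage(e)
)
print(err)

# The dead worker can no longer be shut down cleanly; ignore that failure.
try(stopCluster(cl), silent = TRUE)
```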

This was an idiosyncratic error that I could never reproduce on any cluster or computer I had access to. If you're receiving an error when running ResistanceGA with Julia, try following the suggestions from Julian and Eve above.

wpeterman, Apr 20, 2022