BiocParallel

Memory-efficient parallel computing

Open ArthurPERE opened this issue 9 months ago • 4 comments

Hello,

I have a question about memory use in parallel computation. I am on Linux, so the parallel backend is MulticoreParam.

Each time a worker is spawned, the whole memory of the session is copied to it (at least that is what I understand). I don't want the whole memory of the session to be copied. Is there a trick to avoid that, for example spawning a whole new session, detached from the initial session (even if it means giving up a little performance)?

Thank you for your answer

ArthurPERE, Apr 16 '25 08:04

And is there a workaround for the problem described here: https://support.bioconductor.org/p/70196/#70509

I tried calling gc() at the end of the function, but to no avail: there is still too much data in the session after the loop has finished.

ArthurPERE, Apr 16 '25 08:04

See the note on copy-on-change.

SharedObject may also be relevant.

Apropos the support ticket you reference, please try to be more specific. Also indicate the result of sessionInfo() and whether you are working within Rstudio or in some other IDE.

vjcitn, Apr 16 '25 09:04

I am working in RStudio, but I also tried it from the command line with Rscript, and my computer was struggling for memory (96 GB of RAM and 48 CPUs), just as in RStudio.

And this is my code:

  Results.1.all =
    BiocParallel::bplapply(
      # One task per row of the experiment table
      split(all_expermentation, seq(nrow(all_expermentation))),
      function(value, raw_counts, sub_set) {
        # Stays NULL if coseq() fails, so a failed task does not abort the run
        experiment = NULL
        try(experiment <- coseq::coseq(
          raw_counts, K = value$k, model = "Normal", normFactors = "TMM",
          transformation = "arcsin", subset = sub_set,
          seed = 6798 + value$k + value$a - 1))
        gc()
        return(experiment)
      },
      raw_counts = Raw_Counts, sub_set = subset,
      BPOPTIONS = BiocParallel::bpoptions(packages = "coseq")
    )

And this is my sessionInfo():

sessionInfo()
R version 4.4.3 (2025-02-28)
Platform: x86_64-pc-linux-gnu
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.11.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.11.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C               LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8     LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Paris
tzcode source: system (glibc)

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] devtools_2.4.5              usethis_3.1.0               ggplotify_0.1.2             pdftools_3.5.0              UpSetR_1.4.0                data.table_1.16.4          
 [7] coseq_1.30.0                SummarizedExperiment_1.34.0 Biobase_2.64.0              GenomicRanges_1.58.0        GenomeInfoDb_1.40.1         IRanges_2.38.1             
[13] S4Vectors_0.42.1            BiocGenerics_0.50.0         MatrixGenerics_1.16.0       matrixStats_1.5.0           RColorBrewer_1.1-3          dplyr_1.1.4                
[19] plyr_1.8.9                  ggpubr_0.6.0                reshape2_1.4.4              gplots_3.2.0                FactoMineR_2.11             ggplot2_3.5.1              
[25] edgeR_4.2.2                 limma_3.60.6                tictoc_1.2.1               

loaded via a namespace (and not attached):
  [1] tensorA_0.36.2.1        capushe_1.1.2           rstudioapi_0.17.1       jsonlite_1.8.9          magrittr_2.0.3          estimability_1.5.1     
  [7] farver_2.1.2            corrplot_0.95           ragg_1.3.3              fs_1.6.5                zlibbioc_1.50.0         vctrs_0.6.5            
 [13] memoise_2.0.1           askpass_1.2.1           rstatix_0.7.2           htmltools_0.5.8.1       S4Arrays_1.4.1          plotrix_3.8-4          
 [19] broom_1.0.7             SparseArray_1.4.8       Formula_1.2-5           gridGraphics_0.5-1      KernSmooth_2.23-26      htmlwidgets_1.6.4      
 [25] emmeans_1.11.0          cachem_1.1.0            mime_0.12               lifecycle_1.0.4         pkgconfig_2.0.3         Matrix_1.7-3           
 [31] R6_2.5.1                fastmap_1.2.0           GenomeInfoDbData_1.2.12 shiny_1.10.0            digest_0.6.37           HTSFilter_1.44.0       
 [37] colorspace_2.1-1        DESeq2_1.44.0           pkgload_1.4.0           HTSCluster_2.0.11       textshaping_1.0.0       ellipse_0.5.0          
 [43] labeling_0.4.3          httr_1.4.7              abind_1.4-8             compiler_4.5.0          remotes_2.5.0           proxy_0.4-27           
 [49] withr_3.0.2             backports_1.5.0         BiocParallel_1.38.0     carData_3.0-5           pkgbuild_1.4.6          ggsignif_0.6.4         
 [55] MASS_7.3-65             bayesm_3.1-6            sessioninfo_1.2.3       DelayedArray_0.30.1     scatterplot3d_0.3-44    gtools_3.9.5           
 [61] caTools_1.18.3          flashClust_1.01-2       tools_4.5.0             httpuv_1.6.15           glue_1.8.0              promises_1.3.2         
 [67] grid_4.5.0              cluster_2.1.8.1         generics_0.1.3          gtable_0.3.6            class_7.3-23            tidyr_1.3.1            
 [73] car_3.1-3               XVector_0.44.0          ggrepel_0.9.6           pillar_1.10.1           stringr_1.5.1           yulab.utils_0.2.0      
 [79] later_1.4.1             splines_4.5.0           robustbase_0.99-4-1     lattice_0.22-7          compositions_2.0-8      tidyselect_1.2.1       
 [85] locfit_1.5-9.12         miniUI_0.1.1.1          gridExtra_2.3           statmod_1.5.0           DEoptimR_1.1-3-1        DT_0.33                
 [91] stringi_1.8.4           UCSC.utils_1.0.0        qpdf_1.3.5              codetools_0.2-20        tibble_3.2.1            multcompView_0.1-10    
 [97] cli_3.6.3               systemfonts_1.2.1       xtable_1.8-4            munsell_0.5.1           Rcpp_1.0.14             leaps_3.2              
[103] ellipsis_0.3.2          Rmixmod_2.1.10          profvis_0.4.0           urlchecker_1.0.1        bitops_1.0-9            mvtnorm_1.3-3          
[109] scales_1.3.0            e1071_1.7-16            purrr_1.0.4             crayon_1.5.3            rlang_1.1.5

ArthurPERE, Apr 16 '25 10:04

Have you looked at ?multicoreWorkers

 workers: 'integer(1)' Number of workers. Defaults to the maximum of 1
          or the number of cores determined by 'detectCores' minus 2
          unless environment variables
          'R_PARALLELLY_AVAILABLECORES_FALLBACK' or
          'BIOCPARALLEL_WORKER_NUMBER' are set otherwise.

With 48 cores, it is possible that 46 workers are activated by default. Even if your base process is modest in size, you will replicate it by the number of workers, and 96 GB will not be enough. You should see if cutting the worker number to 10 or so is tolerable from an overall RAM perspective. If the throughput is not acceptable, then other strategies can be considered. Let us know. (If you already tried cutting down from the default number of workers assigned, then this suggestion is nugatory.)
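
As a minimal sketch of the suggestion above, assuming BiocParallel is installed and the code runs on a Unix-like system where MulticoreParam can fork: the worker count can be capped by passing an explicit parameter object through the BPPARAM argument of bplapply(). The function toy_task and the input 1:20 are hypothetical stand-ins for the real coseq call, and 10 is only the figure mentioned above, not a tuned value.

```r
library(BiocParallel)

# Cap the number of forked workers explicitly instead of relying on the
# default described in ?multicoreWorkers.
param <- MulticoreParam(workers = 10)

# Hypothetical stand-in for the real per-task computation.
toy_task <- function(k) k^2

# BPPARAM selects the backend and, with it, the number of workers.
res <- bplapply(1:20, toy_task, BPPARAM = param)

length(res)
```

The same `BPPARAM = param` argument could be added to the bplapply() call posted earlier in the thread. Whether 10 workers fit in the available RAM would still need to be checked on the actual data.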

vjcitn, Apr 16 '25 11:04