nf-core/methylseq

Pipeline requires 72 GB of RAM, even to test.

Open: mpiersonsmela opened this issue (3 comments)

Description of the bug

On my university's cluster, users are penalized (with priority reduction) for requesting more RAM than they actually use. So the fact that the pipeline requires at least 72 GB of RAM is an issue for me, given that I'm just trying to test it with the example samplesheet.csv from https://nf-co.re/methylseq/2.6.0/

This is the relevant portion of the output. Does Bismark genome preparation really need that much RAM?

```
ERROR ~ Error executing process > 'NFCORE_METHYLSEQ:METHYLSEQ:PREPARE_GENOME:BISMARK_GENOMEPREPARATION (BismarkIndex/grch38_core+bs_controls.fa)'

Caused by: Process requirement exceeds available memory -- req: 72 GB; avail: 32 GB

Command executed:

  bismark_genome_preparation \
      --bowtie2 \
      BismarkIndex
```

Command used and terminal output

```bash
nextflow run nf-core/methylseq \
    --input test_samplesheet.csv \
    --outdir Output \
    --fasta grch38_core+bs_controls.fa \
    -w /n/scratch/users/m/NF_MiSeq \
    -ansi-log false
```

Relevant files

No response

System information

No response

mpiersonsmela, Jul 26 '24 00:07

@mpiersonsmela

I've tested the pipeline, and my Nextflow report shows high RAM usage, particularly in the deduplication step. I'm not sure whether that's optimal, but I hope it helps.

[Screenshot 2024-08-07 at 14 12 26]

imdanique, Aug 07 '24 11:08

It's true that it requires 72 GB of memory: the process is labelled `process_high`, and the resources for that label are set in `base.config`.

I can limit the maximum memory for the `test_full` profile. Any other changes for your resource availability you'll need to make yourself, by setting up an institutional, cluster-specific config. Does that sound OK to you?
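A minimal sketch of such a cluster-specific config follows. It assumes that methylseq 2.6.0 still uses the pre-3.0 nf-core template. In that template, `check_max()` in `base.config` caps every request at `params.max_memory`. The file name `cluster.config` is hypothetical, and the 32 GB value is taken from the "avail: 32 GB" in the error above.

```shell
# Hypothetical file name; any path passed to `nextflow run -c` works.
# Assumption: the pipeline version in use caps requests via params.max_memory
# (pre-3.0 nf-core template with check_max()).
cat > cluster.config <<'EOF'
params {
    // Cap every request at the 32 GB the executor reported as available
    max_memory = '32.GB'
}
EOF
```

Add `-c cluster.config` to the `nextflow run nf-core/methylseq` command above. Capping the request does not reduce what a task actually uses, so a step that genuinely needs more than 32 GB may still fail.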

sateeshperi, Sep 17 '24 17:09

Hi @mpiersonsmela, it’s true that the `test_full` profile needs 72 GB of RAM, since it tests real-life samples. However, the `test` profile requires only 4 GB of RAM. So if you’re just testing the pipeline setup, use `-profile test`. If you want to test with a real-sized dataset, you can try `-profile test_full`, which does require high memory to process those samples.

If this answers your question, please close this issue. Thank you!

sateeshperi, Oct 27 '24 14:10

@mpiersonsmela, you can also customize the CPU and memory requirements using the `withName` selector in your institutional config. Also check out the new `resourceLimits` directive in Nextflow.
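A hedged sketch combining both suggestions follows. `resourceLimits` requires Nextflow 24.04 or later and caps what any task may request, whatever its label asks for. The `withName` block targets the genome-preparation process from the error log by its simple name. The file name and the 32 GB values are illustrative assumptions, not tested minimums for a human index.

```shell
# Hypothetical file name for an institutional/custom config.
cat > custom_resources.config <<'EOF'
process {
    // Nextflow >= 24.04: upper bound on every task's memory request
    resourceLimits = [ memory: 32.GB ]

    // Per-process override by process name (example value only)
    withName: 'BISMARK_GENOMEPREPARATION' {
        memory = 32.GB
    }
}
EOF
```

Pass it with `-c custom_resources.config`. Using `resourceLimits` avoids editing each label or process by hand. The `withName` override is useful when a single step, such as genome preparation or deduplication, should get a different amount than the rest of its label.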

sateeshperi, Dec 14 '24 09:12