
[question] Errors running bactopia singularity container in slurm cluster (ComputeCanada)

Open azmigueldario opened this issue 3 years ago • 5 comments

Hello

I have been trying to run the Bactopia pipeline from the Singularity container pulled from quay.io (Singularity is supported on my cluster).

I managed to download the datasets, which only worked when I ran the container with a clean environment (singularity exec -e ...).

My first attempt at the main Bactopia pipeline failed because Singularity could not be executed from inside the container to pull the necessary tool images from the registry. After failing to fix that, I pulled the images outside the pipeline. The file-of-filenames (FOFN) check did not report any issues.

Now I am trying to run the main pipeline again, and there seems to be an error when submitting jobs.

#!/bin/bash
#SBATCH --account=XXXXXXXXX
#SBATCH --mem-per-cpu=4G #  GB of memory per cpu core
#SBATCH --time=01:00:00
#SBATCH --ntasks=4 # tasks in parallel / number of cores
#SBATCH --cpus-per-task=1 # number of cores per task
#SBATCH --job-name="main_workflow_bactopia"
#SBATCH --chdir=/scratch/mdprieto/
#SBATCH --output=./temp_results/%j_main_bactopia_nov26.out

################################## preparation #########################################

# load singularity
module load singularity
module load nextflow

# mount my filesystem inside container, localscratch allows job to use compute node temp folder
BIND_MOUNT="-B /home,/project,/scratch,/localscratch,/localscratch:/temp,/opt,/cvmfs"

# git directory with input variables
kleb_git="/home/mdprieto/git/klebsiella/input"

# make output directory if necessary
mkdir -p /scratch/mdprieto/temp_results/bactopia_output/

# define new temp folders for singularity
mkdir -p /scratch/$USER/singularity/tmp
export SINGULARITY_CACHEDIR="/project/6007413/cidgoh_share/singularity_imgs"
export SINGULARITY_TMPDIR="/scratch/$USER/singularity/tmp"

# append the host PATH to the container's PATH so host tools stay reachable inside it
export SINGULARITYENV_APPEND_PATH="$PATH"

################################## BACTOPIA  #########################################

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \
    --samples $kleb_git/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --outdir /scratch/mdprieto/temp_results/bactopia_output/ \
    --species "Klebsiella pneumoniae" \
    --genome_size median \
    --singularity_cache $SINGULARITY_CACHEDIR \
    --max_cpus 2 \
    --verbose \
    -profile slurm,singularity \
    -resume

I have also tried setting -profile to slurm or singularity alone, with the same results, and running with more memory just in case.

The error is that the command repeatedly fails to run BACTOPIA:GATHER_SAMPLES. I am not that knowledgeable in Nextflow yet, so I am struggling to troubleshoot. Any suggestions are welcome.

Thanks

2022-11-27 13:32:48:root:STDERR - 
2022-11-27 13:32:48:root:INFO - Checking if environment pre-builds are needed
2022-11-27 13:32:48:root:DEBUG - Working on bactopia
2022-11-27 13:32:48:root:INFO - Found Singularity images in /project/6007413/cidgoh_share/singularity_imgs, if a complete rebuild is needed please use --force_rebuild
2022-11-27 13:32:48:root:DEBUG - Existing image (/project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-annotate_genome-2.1.1.img) found, skipping unless --force is used
2022-11-27 13:32:48:root:DEBUG - Existing image (/project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-assemble_genome-2.1.1.img) found, skipping unless --force is used
2022-11-27 13:32:48:root:DEBUG - Existing image (/project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-assembly_qc-2.1.1.img) found, skipping unless --force is used
2022-11-27 13:32:49:root:DEBUG - Existing image (/project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-call_variants-2.1.1.img) found, skipping unless --force is used
2022-11-27 13:32:49:root:DEBUG - Existing image (/project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-gather_samples-2.1.1.img) found, skipping unless --force is used
2022-11-27 13:32:49:root:DEBUG - Existing image (/project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-minmers-2.1.1.img) found, skipping unless --force is used
2022-11-27 13:32:49:executor.process:DEBUG - Executing external command: bash -c 'date > /project/6007413/cidgoh_share/singularity_imgs/quay.io-images-built-2.1.1.txt'
2022-11-27 13:32:49:executor.process:DEBUG - Constructing subprocess.Popen object ..
2022-11-27 13:32:49:executor.process:DEBUG - Joining synchronous process using subprocess.Popen.communicate() ..
2022-11-27 13:32:50:executor.process:DEBUG - Got return code 0 from synchronous process (bash -c 'date > /project/6007413/cidgoh_share/singularity_imgs/quay.io-images-built-2.1.1.txt').
2022-11-27 13:32:50:root:STDOUT - 
2022-11-27 13:32:50:root:STDERR - 
2022-11-27 13:32:50:root:DEBUG - Working on bactopia
2022-11-27 13:32:50:root:DEBUG - Found Singularity image /project/6007413/cidgoh_share/singularity_imgs/depot.galaxyproject.org-singularity-multiqc-1.11--pyhdfd78af_0.img, if a complete rebuild is needed please use --force_rebuild
2022-11-27 13:32:50:root:DEBUG - Working on bactopia
2022-11-27 13:32:50:root:DEBUG - Found Singularity image /project/6007413/cidgoh_share/singularity_imgs/depot.galaxyproject.org-singularity-csvtk-0.23.0--h9ee0642_0.img, if a complete rebuild is needed please use --force_rebuild
N E X T F L O W  ~  version 22.04.0
Launching `/usr/local/share/bactopia-2.1.x/main.nf` [insane_wescoff] DSL2 - revision: 145bb11899


---------------------------------------------
   _                _              _             
  | |__   __ _  ___| |_ ___  _ __ (_) __ _       
  | '_ \ / _` |/ __| __/ _ \| '_ \| |/ _` |   
  | |_) | (_| | (__| || (_) | |_) | | (_| |      
  |_.__/ \__,_|\___|\__\___/| .__/|_|\__,_| 
                            |_|                  
  bactopia v2.1.1
  Bactopia is a flexible pipeline for complete analysis of bacterial genomes. 
---------------------------------------------
Core Nextflow options
  runName          : insane_wescoff
  containerEngine  : singularity
  container        : quay.io/bactopia/bactopia:2.1.1
  launchDir        : /scratch/mdprieto
  workDir          : /scratch/mdprieto/work
  projectDir       : /usr/local/share/bactopia-2.1.x
  userName         : mdprieto
  profile          : slurm,singularity
  configFiles      : /usr/local/share/bactopia-2.1.x/nextflow.config

Required Parameters
  samples          : /home/mdprieto/git/klebsiella_Qatar_2022/input/kleb_qatar_fofn.txt

Dataset Parameters
  datasets         : /scratch/mdprieto/datasets
  species          : Klebsiella pneumoniae
  genome_size      : median

Optional Parameters
  outdir           : /scratch/mdprieto/temp_results/bactopia_output/

Max Job Request Parameters
  max_cpus         : 2

Nextflow Profile Parameters
  condadir         : /usr/local/share/bactopia-2.1.x/conda/envs
  registry         : quay
  singularity_cache: /project/6007413/cidgoh_share/singularity_imgs

!! Only displaying parameters that differ from the pipeline defaults !!
--------------------------------------------------------------------
If you use bactopia for your analysis please cite:

* Bactopia
  https://doi.org/10.1128/mSystems.00190-20

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://bactopia.github.io/acknowledgements/
--------------------------------------------------------------------
Found 1 Antimicrobial resistance datasets
	/scratch/mdprieto/datasets/antimicrobial-resistance/amrfinderdb.tar.gz
Found 4 minmer sketches/signatures
	/scratch/mdprieto/datasets/minmer/mash-refseq-k21.msh
	/scratch/mdprieto/datasets/minmer/sourmash-genbank-k21.json.gz
	/scratch/mdprieto/datasets/minmer/sourmash-genbank-k31.json.gz
	/scratch/mdprieto/datasets/minmer/sourmash-genbank-k51.json.gz
Found Prokka proteins file
	/scratch/mdprieto/datasets/species-specific/klebsiella-pneumoniae/annotation/klebsiella-pneumoniae.faa
Found Mash Sketch of auto variant calling
	/scratch/mdprieto/datasets/species-specific/klebsiella-pneumoniae/minmer/refseq-genomes.msh
Found 1 MLST datasets
	/scratch/mdprieto/datasets/species-specific/klebsiella-pneumoniae/mlst/default.tar.gz
Found 1 reference genomes
	/scratch/mdprieto/datasets/species-specific/klebsiella-pneumoniae/minmer/refseq-genomes.msh
Will use 5650579 bp for genome size

If something looks wrong, now's your chance to back out (CTRL+C 3 times). 
Sleeping for 5 seconds...
--------------------------------------------------------------------
[-        ] process > BACTOPIA:GATHER_SAMPLES -
...TRUNCATED ...
(CP19_S19_L001)' for execution -- Execution is retried (1)
slurmstepd: error: *** JOB 51447211 ON cdr535 CANCELLED AT 2022-11-27T14:33:02 DUE TO TIME LIMIT ***
[-        ] process > BACTOPIA:GATHER_SAMPLES -

[-        ] process > BACTOPIA:GATHER_SAMPLES        -
[-        ] process > BACTOPIA:QC_READS              -
[-        ] process > BACTOPIA:ASSEMBLE_GENOME       -
[-        ] process > BACTOPIA:ASSEMBLY_QC           -
[-        ] process > BACTOPIA:ANNOTATE_GENOME       -
[-        ] process > BACTOPIA:MINMER_SKETCH         -
[-        ] process > BACTOPIA:ANTIMICROBIAL_RESI... -
[-        ] process > BACTOPIA:MINMER_QUERY          -
[-        ] process > BACTOPIA:BLAST                 -
[-        ] process > BACTOPIA:CALL_VARIANTS         -
[-        ] process > BACTOPIA:MAPPING_QUERY         -
[-        ] process > BACTOPIA:SEQUENCE_TYPE         -
[-        ] process > BACTOPIA:CUSTOM_DUMPSOFTWAR... -

[-        ] process > BACTOPIA:GATHER_SAMPLES        [  0%] 0 of 4
[-        ] process > BACTOPIA:QC_READS              -
[-        ] process > BACTOPIA:ASSEMBLE_GENOME       -
[-        ] process > BACTOPIA:ASSEMBLY_QC           -
[-        ] process > BACTOPIA:ANNOTATE_GENOME       -
[-        ] process > BACTOPIA:MINMER_SKETCH         -
[-        ] process > BACTOPIA:ANTIMICROBIAL_RESI... -
[-        ] process > BACTOPIA:MINMER_QUERY          -
[-        ] process > BACTOPIA:BLAST                 -
[-        ] process > BACTOPIA:CALL_VARIANTS         -
[-        ] process > BACTOPIA:MAPPING_QUERY         -
[-        ] process > BACTOPIA:SEQUENCE_TYPE         -
[-        ] process > BACTOPIA:CUSTOM_DUMPSOFTWAR... -

[66/237942] process > BACTOPIA:GATHER_SAMPLES (C1... [ 22%] 2 of 9, failed: 2...
[-        ] process > BACTOPIA:QC_READS              -
[-        ] process > BACTOPIA:ASSEMBLE_GENOME       -
[-        ] process > BACTOPIA:ASSEMBLY_QC           -
[-        ] process > BACTOPIA:ANNOTATE_GENOME       -
[-        ] process > BACTOPIA:MINMER_SKETCH         -
[-        ] process > BACTOPIA:ANTIMICROBIAL_RESI... -
[-        ] process > BACTOPIA:MINMER_QUERY          -
[-        ] process > BACTOPIA:BLAST                 -
[-        ] process > BACTOPIA:CALL_VARIANTS         -
[-        ] process > BACTOPIA:MAPPING_QUERY         -
[-        ] process > BACTOPIA:SEQUENCE_TYPE         -
[-        ] process > BACTOPIA:CUSTOM_DUMPSOFTWAR... -
[50/00f728] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C12_S22_L001)' for execution -- Execution is retried (1)
[66/237942] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C19_S18_L001)' for execution -- Execution is retried (1)

[37/f4ee7c] process > BACTOPIA:GATHER_SAMPLES (CP... [ 17%] 3 of 17, failed: ...
[-        ] process > BACTOPIA:QC_READS              -
[-        ] process > BACTOPIA:ASSEMBLE_GENOME       -
[-        ] process > BACTOPIA:ASSEMBLY_QC           -
[-        ] process > BACTOPIA:ANNOTATE_GENOME       -
[-        ] process > BACTOPIA:MINMER_SKETCH         -
[-        ] process > BACTOPIA:ANTIMICROBIAL_RESI... -
[-        ] process > BACTOPIA:MINMER_QUERY          -
[-        ] process > BACTOPIA:BLAST                 -
[-        ] process > BACTOPIA:CALL_VARIANTS         -
[-        ] process > BACTOPIA:MAPPING_QUERY         -
[-        ] process > BACTOPIA:SEQUENCE_TYPE         -
[-        ] process > BACTOPIA:CUSTOM_DUMPSOFTWAR... -
[50/00f728] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C12_S22_L001)' for execution -- Execution is retried (1)
[66/237942] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C19_S18_L001)' for execution -- Execution is retried (1)
[37/f4ee7c] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (CP19_S19_L001)' for execution -- Execution is retried (1)

[37/f4ee7c] process > BACTOPIA:GATHER_SAMPLES (CP... [ 13%] 3 of 23, failed: ...
[-        ] process > BACTOPIA:QC_READS              -
[-        ] process > BACTOPIA:ASSEMBLE_GENOME       -
[-        ] process > BACTOPIA:ASSEMBLY_QC           -
[-        ] process > BACTOPIA:ANNOTATE_GENOME       -
[-        ] process > BACTOPIA:MINMER_SKETCH         -
[-        ] process > BACTOPIA:ANTIMICROBIAL_RESI... -
[-        ] process > BACTOPIA:MINMER_QUERY          -
[-        ] process > BACTOPIA:BLAST                 -
[-        ] process > BACTOPIA:CALL_VARIANTS         -
[-        ] process > BACTOPIA:MAPPING_QUERY         -
[-        ] process > BACTOPIA:SEQUENCE_TYPE         -
[-        ] process > BACTOPIA:CUSTOM_DUMPSOFTWAR... -
[50/00f728] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C12_S22_L001)' for execution -- Execution is retried (1)
[66/237942] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C19_S18_L001)' for execution -- Execution is retried (1)
[37/f4ee7c] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (CP19_S19_L001)' for execution -- Execution is retried (1)

azmigueldario, Nov 29 '22 19:11
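
The repeated NOTE lines in the log above can be tallied per sample to see which submissions Nextflow keeps retrying. This is a minimal sketch, assuming the console output was saved to a text file; the function name `count_submit_errors` and the demo file are hypothetical and not part of Bactopia or Nextflow.

```shell
#!/bin/sh
# Tally "Error submitting process" notes per sample in a saved console log.
# Hypothetical helper: the function name and the log path are assumptions.
count_submit_errors() {
    log_file="$1"
    # Keep only the sample name found in parentheses after the process name.
    sed -n "s/.*Error submitting process '[^(]*(\([^)]*\))'.*/\1/p" "$log_file" \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}

# Demo on two NOTE lines copied from the log above.
demo_log=$(mktemp)
cat > "$demo_log" <<'EOF'
[50/00f728] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C12_S22_L001)' for execution -- Execution is retried (1)
[66/237942] NOTE: Error submitting process 'BACTOPIA:GATHER_SAMPLES (C19_S18_L001)' for execution -- Execution is retried (1)
EOF
count_submit_errors "$demo_log"
rm -f "$demo_log"
```

Because the progress display is redrawn, the same NOTE can appear several times in captured output, so the counts are an upper bound. The underlying submission error is usually recorded in the `.nextflow.log` file in the launch directory, which is likely the more useful file to attach to a report like this one.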

@azmigueldario let's see if we can get this figured out! I think our issue might be singularity within singularity here.

By chance can you install Bactopia through Conda? We will not use Conda, but use it to let Nextflow handle the job submissions to your SLURM cluster.

Here's what I'm thinking:

# Head Node
bactopia -profile test,slurm \
    --slurm_opts="--account=XXXXXXXXX" \
    --slurm_queue "YOUR_QUEUE_NAME"

# Nextflow then submits jobs to the SLURM cluster

If you are interested, I think it's worth considering the creation of a profile config file. Here's an example of one I use for a cluster here in Wyoming: https://github.com/bactopia/bactopia/blob/master/conf/profiles/arcc.config

This allows me to just add -profile arcc and Nextflow handles all the job submissions to the cluster.

rpetit3, Nov 29 '22 20:11
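
A site-specific profile of the kind described above could look roughly like the following. This is a sketch only, not the contents of arcc.config: the profile name `mycluster`, the output file name, and the queue and account placeholders are assumptions, while the settings themselves (`process.executor`, `process.queue`, `process.clusterOptions`, and the `singularity` scope) are standard Nextflow configuration options.

```shell
#!/bin/sh
# Write a site-specific Nextflow config containing one named profile.
# ASSUMPTIONS: the profile name "mycluster", the file name, the queue name and
# the account placeholder are hypothetical; adjust them for the actual cluster.
write_profile_config() {
    config_path="$1"
    cat > "$config_path" <<'EOF'
profiles {
    mycluster {
        process {
            executor       = 'slurm'
            queue          = 'YOUR_QUEUE_NAME'
            clusterOptions = '--account=XXXXXXXXX'
        }
        singularity {
            enabled    = true
            autoMounts = true
            cacheDir   = '/project/6007413/cidgoh_share/singularity_imgs'
        }
    }
}
EOF
}

write_profile_config "${TMPDIR:-/tmp}/mycluster.config"
echo "wrote ${TMPDIR:-/tmp}/mycluster.config"
```

Nextflow loads an extra config file with its `-c` option, after which `-profile mycluster` selects the block by the name given inside `profiles { }`.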

Thank you @rpetit3.

I cannot use Conda on this cluster, although I have access to another one where it can be used. That is one of my alternatives.

I will try your code, dig into the config file, and follow up.

azmigueldario, Nov 30 '22 01:11

No problem. I noticed in your sbatch above there was a module load nextflow.

One thing you can try is replacing

singularity exec -e $BIND_MOUNT bactopia_2.1.1.sif bactopia \

with

nextflow run bactopia/bactopia

This might get you past the Singularity-in-Singularity problem.

rpetit3, Nov 30 '22 01:11

Hello again @rpetit3 ,

If I run it from nextflow directly and specify my container:

nextflow run bactopia/bactopia -with-singularity bactopia_2.1.1.sif ...

it runs the new version (2.2.0) of the pipeline from the repo and ignores my container. It then runs into an error while downloading the modules required by the updated Bactopia.

Pulling Singularity image docker://quay.io/bactopia/gather_samples:2.2.0 [cache /project/6007413/cidgoh_share/singularity_imgs/quay.io-bactopia-gather_samples-2.2.0.img]


    Bactopia Execution Summary
    ---------------------------
    Bactopia Version : 2.2.0
    Nextflow Version : 22.04.3
    Command Line     : nextflow run bactopia/bactopia -with-singularity bactopia_2.1.1.sif --samples /home/mdprieto/git/klebsiella_Qatar_2022/input/kleb_qatar_fofn.txt --datasets /scratch/mdprieto/datasets --outdir /scratch/mdprieto/temp_results/bactopia_output/ --species 'Klebsiella pneumoniae' --genome_size median --singularity_cache /project/6007413/cidgoh_share/singularity_imgs --max_cpus 2 --verbose -profile slurm,singularity -resume
    Resumed          : true
    Completed At     : 2022-12-01T16:45:00.201158-08:00
    Duration         : 4m 13s
    Success          : false
    Exit Code        : null
    Error Report     : Error executing process > 'BACTOPIA:GATHER_SAMPLES (C12_S22_L001)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name quay.io-bactopia-gather_samples-2.2.0.img.pulling.1669941682580 docker://quay.io/bactopia/gather_samples:2.2.0 > /dev/null
  status : 255
  message:
    INFO:    Converting OCI blobs to SIF format
    WARNING: 'nodev' mount option set on /scratch, it could be a source of failure during build process
    INFO:    Starting build...
    Getting image source signatures
    FATAL:   While making image from oci registry: error fetching image to cache: while building SIF from layers: conveyor failed to get: initializing source oci:/project/6007413/cidgoh_share/singularity_imgs/cache/blob:fa8a02b24c2e8e6f2326c1a63d535a3a58d5261c0ead5afc97c05950a0dd38aa: reading blob sha256:eaead16dc43bb8811d4ff450935d607f9ba4baffda4fc110cc402fa43f601d83: Get "https://cdn02.quay.io/sha256/ea/eaead16dc43bb8811d4ff450935d607f9ba4baffda4fc110cc402fa43f601d83?username=None&namespace=bactopia&Expires=1669942286&Signature=PASLAAFLQt~oW2Qphf6pSl0PK8kTFLhoStmSOf8KxXI6nhW1AkTpGXEvK1~7-wtFupkdPIKKyf4xkcq~asHbY8uENOkLHK4ov5bDFbOs6hqe6-yiJEnrX-GlT0CA06T3vUQvLLIj0JCNhqhyX4W5kU8B8qly15GOT~84R1dXG3WVzSXX2nZyKGq5RYLQWVqpVak7GkfMJp1MHMNHupoO3urVKZuJ6Fb8U61WOLGETuOEzXwLwgVOsmu~xp-BOzmR5qSVfzRPH~Ha2dcQZ2NmJ5K7wsBPPM6G~tFdxSUxTKTmeNC-i305~P4d2v55DX1qKilKzmGuMLo-0yNVfuo2ZA__&Key-Pair-Id=APKAJ67PQLWGCSP66DGA": dial tcp 18.65.229.125:443: i/o timeout

I am not knowledgeable in Nextflow yet, so I was wondering if I can somehow run the pipeline inside the container I have (2.1.1) to see if it recognizes the modules I have already downloaded.

azmigueldario, Dec 02 '22 19:12
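
One way to keep `nextflow run bactopia/bactopia` on the version that matches the already-downloaded 2.1.1 images is to pin the pipeline revision with Nextflow's `-r` option. `-with-singularity` only sets the container used for processes and does not choose which pipeline code is fetched. The sketch below only prints the command; the tag name `v2.1.1` is an assumption about how the Bactopia repository names its releases, and `build_pinned_command` is a hypothetical helper.

```shell
#!/bin/sh
# Build (and only print) a Nextflow command pinned to one pipeline revision.
# ASSUMPTION: the Bactopia repository has a Git tag named "v2.1.1".
build_pinned_command() {
    revision="$1"
    shift
    # Remaining arguments are passed through as pipeline parameters.
    printf '%s\n' "nextflow run bactopia/bactopia -r $revision -profile slurm,singularity $*"
}

build_pinned_command "v2.1.1" \
    --samples /home/mdprieto/git/klebsiella_Qatar_2022/input/kleb_qatar_fofn.txt \
    --datasets /scratch/mdprieto/datasets \
    --singularity_cache /project/6007413/cidgoh_share/singularity_imgs
```

If the pinned revision requests the 2.1.1 images, the files already present in the cache directory should be reused, which would avoid the pull that timed out above.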

@azmigueldario let's see if we can get this figured out! I think our issue might be singularity within singularity here.

By chance can you install Bactopia through Conda? We will not use Conda, but use it to let Nextflow handle the job submissions to your SLURM cluster.

Here's what I'm thinking:

# Head Node
bactopia -profile test,slurm \
    --slurm_opts="--account=XXXXXXXXX" \
    --slurm_queue "YOUR_QUEUE_NAME"

# Nextflow then submits jobs to the SLURM cluster

If you are interested, I think it's worth considering the creation of a profile config file. Here's an example of one I use for a cluster here in Wyoming: https://github.com/bactopia/bactopia/blob/master/conf/profiles/arcc.config

This allows me to just add -profile arcc and Nextflow handles all the job submissions to the cluster.

I tried to follow this suggestion so I could easily set up Bactopia with SLURM on my institution's HPC. I am using a Conda install of Bactopia 2.2.0. My question is: where do I put the config file so that Bactopia knows where to find it? When you say to add -profile arcc, is the program looking at the file name or at the name of the profile defined within the config? This is the command I ran, followed by the error I get:

bactopia \
    --samples FOFN_test.tsv \
    --datasets /ceph/db/bactopia_2.0 \
    --maxcpus 4 \
    --max_time 480 \
    --outdir bactopia_output \
    -qs 2 \
    -profile ctmr.config \
    -resume

N E X T F L O W  ~  version 22.10.6
Unknown configuration profile: 'ctmr'

I switched to using the --nfconfig option and that seems to have worked. I'm just curious about the operations using the -profile flag as well. Thanks!

shigdon, May 09 '23 15:05
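
On the -profile question: in Nextflow, `-profile` takes the name of a block defined inside `profiles { }` in a config file that has already been loaded, not a file name. The sketch below is a crude text scan that lists those names for a given file. It is not a Groovy parser, `list_profiles` is a hypothetical helper, and the demo config with a `ctmr` profile is invented for illustration.

```shell
#!/bin/sh
# List the profile names declared directly inside a `profiles { ... }` block.
# Crude brace counting; assumes one opening brace per line and no braces
# hidden in strings or comments.
list_profiles() {
    awk '
        /^[ \t]*profiles[ \t]*\{/ { inside = 1; depth = 1; next }
        inside {
            if (depth == 1 && $0 ~ /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*\{/) {
                name = $0
                sub(/^[ \t]*/, "", name)
                sub(/[ \t]*\{.*/, "", name)
                print name
            }
            opens = gsub(/\{/, "{")
            closes = gsub(/\}/, "}")
            depth += opens - closes
            if (depth <= 0) inside = 0
        }
    ' "$1"
}

# Demo with a hypothetical config that defines a single profile named "ctmr".
demo_config=$(mktemp)
cat > "$demo_config" <<'EOF'
profiles {
    ctmr {
        process {
            executor = 'slurm'
        }
    }
}
EOF
list_profiles "$demo_config"
rm -f "$demo_config"
```

With a file like this loaded (for example through the --nfconfig option mentioned above, or Nextflow's `-c`), the profile would be selected as `-profile ctmr` rather than `-profile ctmr.config`.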

Hi @azmigueldario

I'm cleaning up old issues and since this is related to v2, I'm going to go ahead and close this with the recommendation to give v3 a try.

Please reach out if you have any questions or issues!

Cheers, Robert

rpetit3, Mar 25 '24 20:03