
.sub file environment variables do not have a value

Open aaron-gu opened this issue 5 years ago • 7 comments

I saw that divvy is being used to generate the .sub files for a looper job submission. However, I could not easily find anywhere in the vignettes describing how to set the environment variables such as {MEM} and {CORES}. It would be nice if these variables were set to a default value without any configuration, or if there was extra description in the vignettes on how to set them.

aaron-gu avatar May 27 '20 17:05 aaron-gu

how are you trying to use them? using divvy you set them with -c mem=8000 cores=1, for example

http://divvy.databio.org/en/latest/cli/

nsheff avatar May 27 '20 17:05 nsheff

I am just using looper run project_config.yaml

aaron-gu avatar May 27 '20 17:05 aaron-gu

ok -- looper should default to using the localhost template which doesn't have those variables... so, that doesn't make sense to me... can you be more specific about what you're trying to do? also, try the above

nsheff avatar May 27 '20 17:05 nsheff

I set up a PEP project for my bedshift code to generate the 100 samples for every parameter combination. I followed the PEP and looper tutorials pretty smoothly until it came to running the looper job, where I got the error sbatch: error: invalid memory constraint {MEM}

Also, I'm not sure how to run the divvy command with looper, since there are many .sub files generated.

aaron-gu avatar May 27 '20 18:05 aaron-gu

Here's an example of a .sub file:

#!/bin/bash
#SBATCH --job-name='bedshift_run_add1'
#SBATCH --output='looper_output/submission/bedshift_run_add1.log'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='standard'
#SBATCH -m block
#SBATCH --ntasks=1
#SBATCH --open-mode=append

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

cmd="/sfs/qumulo/qhome/ag5ym/databio/bedshift_paper/pep_project/bedshift.sh /project/shefflab/resources/regions/LOLACore/hg19/encode_tfbs/regions/wgEncodeAwgTfbsUwHek293CtcfUniPk.narrowPeak 0.1 0.0 0.0 100 "

y=`echo "$cmd" | sed -e 's/^/srun /'`
eval "$y"
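
Placeholders left unfilled, like the ones in the script above, can be spotted before anything is handed to sbatch. This is a minimal sketch, assuming placeholders are always uppercase names in braces as in this file; `find_unfilled` is a hypothetical helper name, not part of looper or divvy.

```shell
#!/bin/bash
# Hypothetical helper: print each line of the given files that still
# contains an unfilled {UPPERCASE} placeholder, prefixed with file:line.
# Returns non-zero when no placeholder is found (grep's exit status).
find_unfilled() {
    grep -nH -E '\{[A-Z_]+\}' "$@"
}

# Example: check the generated submission scripts before submitting.
# find_unfilled looper_output/submission/*.sub
```

For the script above, this would flag the --mem, --cpus-per-task and --time lines.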

aaron-gu avatar May 27 '20 18:05 aaron-gu

ah, I see. you're on rivanna -- so we set the looper default to submit jobs to slurm.

there's lots of things you can do.

  1. try using looper --package to run using a local template, to test. divvy list shows available templates
  2. if you want to use the slurm template, then of course you must provide all the variables for that template. you can do it like I mentioned above: looper run -c cores=1 mem=4000
  3. really, you should provide these variables in your pipeline interface, using the compute section. http://looper.databio.org/en/latest/pipeline-interface-specification/#compute

you could just add this to your interface:

compute:
  mem: 4000
  cores: 1
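
To see why these values matter, here is a rough illustration of the substitution, not divvy's actual implementation. `fill_template` is a hypothetical name, and the sketch assumes the values simply replace {MEM} and {CORES} in the template text:

```shell
#!/bin/bash
# Illustration only: mimic how template placeholders get filled.
# This is NOT divvy's code; fill_template is a hypothetical name.
fill_template() {
    local mem="$1" cores="$2"
    # Replace each placeholder with the supplied value; any placeholder
    # without a value (e.g. {TIME}) is left in the output unchanged.
    sed -e "s/{MEM}/${mem}/g" -e "s/{CORES}/${cores}/g"
}

printf "%s\n" "#SBATCH --mem='{MEM}'" "#SBATCH --cpus-per-task='{CORES}'" \
    | fill_template 4000 1
```

A template variable that gets no value, such as {TIME} in the .sub file above, would presumably stay in the script as a literal placeholder in the same way.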

nsheff avatar May 27 '20 18:05 nsheff

Got it, thanks! Is there a way to make it easier to find that section of documentation? The order I went through the docs was Introduction > Defining a Project > Running on a cluster, and then I followed the links to divvy to try to solve the issue.

aaron-gu avatar May 27 '20 18:05 aaron-gu

I added a bit of clarification on this to the docs for the upcoming release.

donaldcampbelljr avatar Jun 06 '24 13:06 donaldcampbelljr

Solved with v1.8.1 Release.

donaldcampbelljr avatar Jun 06 '24 14:06 donaldcampbelljr