Submit tasks as job arrays and fix RNAs in distill summaries
Changes
- Added array_size = 10 parameter to nextflow.config and array to conf/base.config for more efficient cluster execution.
- Fix inclusion of rRNA, tRNA, and quast summaries to genome_stats.tsv and metabolism_summary.xlsx in bin/distill.py script.
- Refactor channel usage (Channel to channel) for consistency across workflows, and improve readability and usage of the implicit variable within closures (e.g. it.name to it -> it.name)
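The first change above could be wired roughly as follows. This is only a sketch, not the actual diff: it assumes the new parameter is passed straight to Nextflow's process-level array directive, and the exact placement inside conf/base.config is a guess.

```groovy
// nextflow.config (sketch): default array size, overridable with --array_size
params {
    array_size = 10
}

// conf/base.config (sketch, assumed wiring): submit tasks of each process
// as job arrays of this size. The array directive needs Nextflow >= 24.04.0
// and an executor that supports job arrays, such as slurm.
process {
    array = params.array_size
}
```

With this wiring, passing --array_size 10 on the command line (as in the command below) would set the array size for every process.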
Computing environment and command
- nextflow version 25.10.0.10289
- openjdk 22.0.1-internal 2024-04-16
- singularity 3.8.5
- slurm
- x86_64 GNU/Linux
nextflow run tpall/DRAM -r dev --input_fasta ./DRAM/input_fasta --outdir ./DRAM/call-annotate-distill --threads 8 --summarize --qc --use_kofam --use_dbcan --use_merops --use_viral --use_methyl --use_sulfur -profile singularity --slurm --partition main -with-report -with-trace -with-timeline --array_size 10 --queue_size 10 -resume --annotate
Hey @tpall, thanks for this. The job array addition is nice. There is a larger planned update to batch a lot of the inputs into single jobs to reduce the burden on the queue, since running DRAM with lots of inputs can overwhelm a SLURM scheduler. Job arrays weren't supported on the version of Nextflow we initially developed DRAM2 on, but we recently moved to >=24 (if we add in job arrays, we should lock in >=24.04.0, since there are early 24.* prereleases out there). I will have to do some testing on job arrays and their implications, because from my initial testing it seems like a job array stops the next step from proceeding until there are enough inputs to fill an array. That might be OK. But if we are going to be doing batching anyway, job arrays might not be that important and might not be worth it.
Also thanks for some of the other QoL updates like updating some of the syntax to DSL2 (Channel -> channel, etc.).
I will have to review the code more fully, which I can get to in a couple of weeks. I have a deadline next week, and probably won't be able to review much before then.
But I will leave just a couple of quick thoughts.
Thanks again