
LSF executor does not respect LSF_UNIT_FOR_LIMITS in lsf.conf

Open d-callan opened this issue 1 year ago • 7 comments

Bug report

Expected behavior and actual behavior

Jobs submitted to an LSF cluster should respect the value of LSF_UNIT_FOR_LIMITS in lsf.conf, per #1124. However, running on a cluster where this unit is set to MB, a task requesting 80 MB produces a header in its .command.run file like the following:

#BSUB -M 81920
#BSUB -R "select[mem>=81920] rusage[mem=80]"
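
For context, my read of the numbers (an interpretation, not confirmed against the Nextflow source): the rusage[mem=80] part matches the MB unit, but the -M value looks like it was computed assuming KB units, so the 80 MB request becomes 81920:

```shell
# Sketch of the apparent mismatch: Nextflow seems to convert the 80 MB
# request to KB for the -M flag, but a cluster with LSF_UNIT_FOR_LIMITS=MB
# reads that bare number as MB (i.e. 81920 MB = 80 GB).
req_mb=80
m_flag_value=$(( req_mb * 1024 ))   # value written after "#BSUB -M"
echo "$m_flag_value"                # prints 81920
```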

Steps to reproduce the problem

On an LSF cluster with a non-default setting for LSF_UNIT_FOR_LIMITS, I attempted to run an nf-core pipeline:

nextflow run nf-core/metatdenovo -profile singularity,test --outdir out

Program output

The cluster fails to start jobs, saying I've requested more resources than the queue allows.

Environment

  • Nextflow version: I've tried 23.10.1 and 24.04.3
  • Java version: 11.0.1
  • Operating system: Linux
  • Bash version: GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu)

d-callan avatar Jul 29 '24 13:07 d-callan

Possibly a crazy question, but is there a way I can work around this while waiting for a fix? I'm kind of stuck as things are.

d-callan avatar Jul 29 '24 14:07 d-callan

As I investigate more, it seems this is due to some odd configuration on my cluster. I can't run Nextflow directly on the head node, where the correct lsf.conf exists, and for whatever reason the lsf.conf file on the worker nodes is not consistent with the head node. I've tried to ask the admins about it, and they are... something less than helpful. I think I'd like to amend this ticket to a feature request:

to be able to explicitly override this unit

d-callan avatar Jul 29 '24 20:07 d-callan

This LSF config setting is read here: https://github.com/nextflow-io/nextflow/blob/2fb5bc07f2ad1309c9743b8675bb8003892e3eb7/modules/nextflow/src/main/groovy/nextflow/executor/LsfExecutor.groovy#L315-L320

And the memory options are defined here: https://github.com/nextflow-io/nextflow/blob/2fb5bc07f2ad1309c9743b8675bb8003892e3eb7/modules/nextflow/src/main/groovy/nextflow/executor/LsfExecutor.groovy#L92-L103

So you can see how the various config options affect the final submit options. Maybe you can use the executor.perJobMemLimit or executor.perTaskReserve options to get what you need.
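
If it helps, a minimal nextflow.config sketch with those options (they are real Nextflow executor options, but whether they avoid the unit mismatch depends on your cluster's lsf.conf; the values below are only illustrative):

```groovy
// Sketch: executor options that change how Nextflow emits LSF memory flags.
// See the Nextflow executor docs for the exact semantics on your version.
executor {
    perJobMemLimit = true    // interpret the -M memory limit per job rather than per slot
    perTaskReserve = false   // per-task memory reserve mode for the rusage request
}
```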

bentsherman avatar Jul 30 '24 18:07 bentsherman

Thanks @bentsherman for the info. I had another thought recently: what do you think of explicitly adding units to the submission string, so that Nextflow produces something like bsub -M 50000KB rather than bsub -M 50000? If doable, that seems like it would make this more robust, make my problem go away, and add clarity without changing existing behavior or features.

d-callan avatar Aug 09 '24 14:08 d-callan

I didn't realize that was an option. It would make things much simpler. Can a unit be specified for all of those memory settings?

bentsherman avatar Aug 09 '24 15:08 bentsherman

Hmm, good question. I've just now tried to request an interactive node on my cluster with bsub -M 4GB -R "select[mem>=8GB] rusage[mem=8GB]" -Is bash, and nothing screamed at me or caught fire, so that seems promising.

d-callan avatar Aug 09 '24 16:08 d-callan

Okay I see it is documented here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=requirements-resource-requirement-strings#vnmbvn__title__3

Assuming this syntax has been supported for a while, it should be fine for Nextflow to use it. I will draft a PR.
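
For the 80 MB example above, a submit script header with explicit units might look like the following (illustrative only; the actual output will be whatever the PR settles on):

```shell
#BSUB -M 80MB
#BSUB -R "select[mem>=80MB] rusage[mem=80MB]"
```

With explicit unit suffixes, the meaning of the numbers no longer depends on the cluster's LSF_UNIT_FOR_LIMITS setting.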

bentsherman avatar Aug 09 '24 17:08 bentsherman

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Apr 26 '25 06:04 stale[bot]