Implement wall clock time resource requests in Toil and expose to CWL and WDL jobs
Dear developers, I am using Toil to run a pipeline in an HPC environment that uses Slurm for job scheduling.
Certain jobs in my pipeline are shorter than others, and it would help me a lot if there were an option for specifying the walltime for a job. I was not able to find this in the docs.
Is there a way to tell the scheduler how long a job is supposed to run? Any way other than using $TOIL_SLURM_ARGS?
Thanks, Daniel
┆Issue is synchronized with this Jira Story ┆Issue Number: TOIL-539
@ielis Unfortunately, that's a scheduler-specific parameter and there isn't another way of passing it through at the moment. We welcome contributions, though, if this is something you'd like to see.
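For reference, the `TOIL_SLURM_ARGS` workaround can be sketched roughly as below. This is only an illustration: `slurm_time` and `env_with_walltime` are hypothetical helper names, not part of Toil, and the sketch assumes Slurm's `--time` option with its `days-hours:minutes:seconds` format. Because the variable is set once for the whole workflow, the limit applies to every job, which is exactly the limitation described above.

```python
import os


def slurm_time(seconds: int) -> str:
    """Format a duration as Slurm's days-hours:minutes:seconds string."""
    if seconds <= 0:
        raise ValueError("walltime must be positive")
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}-{hours:02d}:{minutes:02d}:{secs:02d}"


def env_with_walltime(seconds: int, base_env=None) -> dict:
    """Return a copy of the environment with --time appended to TOIL_SLURM_ARGS.

    Hypothetical helper: any arguments already present in TOIL_SLURM_ARGS
    are kept, and the input mapping is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("TOIL_SLURM_ARGS", "")
    env["TOIL_SLURM_ARGS"] = f"{existing} --time={slurm_time(seconds)}".strip()
    return env
```

The returned mapping would then be handed to whatever process launches the workflow, for example through the `env` argument of `subprocess.run`.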
Hi @DailyDreaming, thanks for letting me know what the status is. I might look into that, and I'll create a PR if I am able to implement the feature reasonably well.
All the best!
This is needed for a CWL v1.1 feature: https://www.commonwl.org/v1.1/CommandLineTool.html#ToolTimeLimit
I don't think this would be the right way to implement CWL's ToolTimeLimit; that option specifies:
> The execution duration excludes external operations, such as staging of files, pulling a docker image etc, and only counts wall-time for the execution of the command line itself.
Since we do all that staging and pulling inside the Toil job, we can't pass the tool's limit through to the batch system and apply it to the whole Toil job. We'd have to time the tool ourselves after doing the setup.
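Timing the tool ourselves could look roughly like this. It is a minimal sketch, not Toil's actual code: `run_tool_with_limit` and the `setup` callable are hypothetical, with `setup` standing in for the staging and image pulling, and only the command itself counts against the limit.

```python
import subprocess
import time


def run_tool_with_limit(command, time_limit, setup=None):
    """Run setup untimed, then run command with a wall-clock limit in seconds.

    Returns (return_code, elapsed_seconds); return_code is None when the
    command was killed for exceeding the limit. A time_limit of None
    means no limit.
    """
    if setup is not None:
        setup()  # staging, image pulls, etc. are not counted
    start = time.monotonic()
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        result = subprocess.run(command, timeout=time_limit)
        return_code = result.returncode
    except subprocess.TimeoutExpired:
        return_code = None
    return return_code, time.monotonic() - start
```

The batch system would still see the whole job, setup included, so any limit passed down to it would need to be at least as generous as the tool's own.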
Kishwar in our lab has been complaining that we should use wall-clock time limits on our Kubernetes cluster; if Toil workflows could carry wall clock time information that would make it easier for us to actually do that.
@adamnovak the pulling of Docker containers could be moved outside of the CWL step invocations. There shouldn't be any staging for typical Slurm clusters using a shared filesystem.
Also you could add some slack time to the time limit to account for job setup. The intention of that language is to avoid penalizing the user for infrastructure actions that they can't control directly.
We could definitely pass it through or pass it through plus some slack, as an initial implementation.
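A pass-through-plus-slack calculation might look like the following sketch. `batch_time_limit` and the 300-second default slack are made up for illustration. The zero and negative handling follows the CWL v1.1 `ToolTimeLimit` text, where a `timelimit` of zero means no time limit and negative values are an error.

```python
from typing import Optional


def batch_time_limit(tool_time_limit: float, slack: float = 300.0) -> Optional[float]:
    """Turn a CWL tool time limit into a limit for the whole batch job.

    Returns the limit in seconds, or None when no limit should be requested.
    The slack covers job setup that the tool's own limit excludes.
    """
    if tool_time_limit < 0:
        raise ValueError("negative time limits are an error in CWL")
    if slack < 0:
        raise ValueError("slack must not be negative")
    if tool_time_limit == 0:
        return None  # CWL: zero means no time limit
    return tool_time_limit + slack
```

For example, a tool limited to one hour would be submitted with `batch_time_limit(3600)`, i.e. 3900 seconds under the default slack assumed here.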
Maybe it's time to implement time requests in Toil's backend.
This might block #4686.