[Feature]: Max duration for runtime
Problem
In addition to the current max duration logic, which is good for saving cost, it would be great to have an option where the duration only counts "running" time. For example, I need to run trainings for a very specific time window, and some of that window is taken up by provisioning.
Solution
No response
Workaround
No response
Would you like to help us implement this feature by sending a PR?
No
@diagonalge, hi
"I need it to run trainings for a very specific time window"
Can you elaborate on why you'd need that and how specific the window should be (e.g. within seconds or minutes of the specified duration)? Provisioning typically takes a few minutes, which is a small portion of the total runtime, so that should be fine for minute-level precision. If you need second-level precision, you should stop the training from within the run.
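To make "stop the training from within the run" concrete, here is a minimal Python sketch. It assumes your training can be driven step by step; `train_with_budget`, `train_step`, and `budget_seconds` are hypothetical names for illustration, not part of dstack.

```python
import time


def train_with_budget(train_step, budget_seconds, max_steps=None, clock=time.monotonic):
    """Call train_step(step_index) repeatedly until the time budget is used up.

    The clock starts when this function is called, i.e. after provisioning
    and the image pull have finished, so only training time is counted.
    Returns the number of completed steps.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        # Optional hard cap on the number of steps, independent of the budget.
        if max_steps is not None and steps >= max_steps:
            break
        train_step(steps)
        steps += 1
    return steps
```

The deadline is only checked between steps, so the run can overshoot the budget by up to one step. If that matters, keep steps short or subtract a safety margin from `budget_seconds`.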
I am trying to run organic jobs from https://gradients.io. The hours_to_complete value is a very important factor there. Provisioning takes a long time in some cases, as the Docker image is ~30 GB and I've encountered slower internet on some pods than others. So it's all RunPod.
I have managed it in the script for now, but it would be a great dstack feature to allow customizing your run.
Got it. Looking at the code now, I recall that max_duration applies to the job execution time, so that should already be what you want. The job is stopped on max_duration by the runner that runs the job: https://github.com/dstackai/dstack/blob/f21a9896144893d34d347748d789af18e36a0664/runner/internal/executor/executor.go#L258
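To illustrate the distinction, here is a rough Python sketch of a timer that starts only when the job process starts, so time spent beforehand (provisioning, pulling the image) is not counted. This is not the runner's Go code; `run_with_max_duration` and `grace_s` are hypothetical names, and the terminate-then-kill sequence is an assumption made for the example.

```python
import subprocess
import sys
import time


def run_with_max_duration(cmd, max_duration_s, grace_s=5.0):
    """Start cmd and stop it if it runs longer than max_duration_s.

    Returns (exit_code, timed_out, elapsed_seconds).
    """
    # The timer starts with the job itself, not with provisioning.
    started = time.monotonic()
    proc = subprocess.Popen(cmd)
    timed_out = False
    try:
        proc.wait(timeout=max_duration_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        proc.terminate()  # ask the job to stop
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # force a stop if the request was ignored
            proc.wait()
    return proc.returncode, timed_out, time.monotonic() - started


if __name__ == "__main__":
    # A job that would sleep for 30 s is stopped after about 1 s.
    job = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_max_duration(job, max_duration_s=1.0))
```

With this shape, a slow image pull delays the start of the job but does not eat into its duration budget.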
That should probably be clarified in the docs as well.
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.