[Feature]: Num retries
Problem
Right now the retry_policy works with a time window. It would be great to also have a maximum number of retries, applied either within that window or independently of it.
Solution
No response
Workaround
No response
The retry duration is calculated as the run age for no-capacity events, and as the time passed since the last interruption or error for interruption and error events. @diagonalge does it cover your use case? If not, please expand.
Updated the docs to clarify retry duration: https://github.com/dstackai/dstack/commit/70271963760f42fe21026eaef98b0812b66aabed
@r4victor Thanks, it's still a bit ambiguous. From what I understand, with a 4h retry window:
- In case of no capacity, it will keep trying for 4h?
- In case of errors, how exactly does it work? Sorry, that's not clear.
> In case of no capacity, it will keep trying for 4h?
Yes, counting from the moment you submitted the run.
> In case of errors, how exactly does it work? Sorry, that's not clear.
dstack will keep retrying for 4h, counting from the latest failure.
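
To make the two cases concrete, here is a minimal Python sketch of the rule as described in this thread. It is not dstack's actual implementation: the function names `retry_window_start` and `should_retry` are hypothetical, and treating the window boundary as exclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def retry_window_start(
    event: str,
    submitted_at: datetime,
    last_failure_at: Optional[datetime],
) -> datetime:
    """Return the moment from which the retry duration is counted.

    Hypothetical helper, based only on the rule described in this thread:
    - no-capacity: counted from run submission (the run age);
    - interruption / error: counted from the latest failure.
    """
    if event == "no-capacity":
        return submitted_at
    if event in ("interruption", "error"):
        if last_failure_at is None:
            raise ValueError("a failure time is required for this event")
        return last_failure_at
    raise ValueError(f"unknown event: {event}")


def should_retry(
    event: str,
    now: datetime,
    submitted_at: datetime,
    last_failure_at: Optional[datetime],
    duration: timedelta,
) -> bool:
    """True while `now` is still inside the retry window.

    Assumption: the boundary is exclusive (strictly less than `duration`).
    """
    start = retry_window_start(event, submitted_at, last_failure_at)
    return now - start < duration


submitted = datetime(2024, 1, 1, 0, 0)
window = timedelta(hours=4)

# No capacity: the window is anchored at submission time.
print(should_retry("no-capacity", submitted + timedelta(hours=3), submitted, None, window))  # True
print(should_retry("no-capacity", submitted + timedelta(hours=5), submitted, None, window))  # False

# Error 10h into the run: the window is anchored at the failure, not at submission.
failed = submitted + timedelta(hours=10)
print(should_retry("error", submitted + timedelta(hours=12), submitted, failed, window))  # True
print(should_retry("error", submitted + timedelta(hours=15), submitted, failed, window))  # False
```

Under this reading, a run that fails 10h after submission can still be retried, because the 4h window restarts at the failure rather than at submission.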
This issue is stale because it has been open for 30 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.