Add timeout to RunSettings
Description
A timeout should be added to the Experiment.start call to limit the maximum execution time of a manifest. The timeout should be specified in seconds, possibly as an int value of the block argument. Specifying True should result in uncapped execution time, whereas False would retain its current meaning, with block=0 as a synonym. Values < 0 should not be accepted.
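The semantics proposed above could be sketched roughly as follows. This is only an illustration: `resolve_block` is a hypothetical helper, not part of the existing API, and it assumes that the current meaning of False is "do not block".

```python
from typing import Optional, Tuple, Union


def resolve_block(block: Union[bool, int]) -> Tuple[bool, Optional[int]]:
    """Hypothetical helper: map ``block`` to (should_block, timeout_seconds).

    A timeout of None means the execution time is uncapped.
    """
    # bool is a subclass of int in Python, so it must be checked first;
    # otherwise True would be read as a 1-second timeout.
    if isinstance(block, bool):
        return (block, None)
    if not isinstance(block, int):
        raise TypeError(f"block must be bool or int, got {type(block).__name__}")
    if block < 0:
        raise ValueError("block must be >= 0 when given as an int")
    if block == 0:
        # Proposed synonym of block=False (assumed to mean: do not block).
        return (False, None)
    # Positive int: block, but for at most this many seconds.
    return (True, block)
```

One design consequence worth noting: because `True == 1` in Python, the bool check has to come before the int check, or `block=True` and `block=1` become indistinguishable.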
Justification
The blocking start call is normally used to run applications and wait for their outcome before moving to the next script line. Sometimes, though, applications can get stuck, or the workload manager could take too long to respond to a batch submission. This results in wasted compute hours for the user, or, sometimes, CI/CD runs timing out without exporting results.
Implementation Strategy
- [ ] Make block accept both bool and int values
- [ ] Add a test which checks that a Model is killed when time expires
- [ ] Add a (very large but not infinite) timeout to current blocking calls which could hang in tests (such as those interacting with the WLM)
- [ ] Document the new API
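As a rough idea of what the "killed when time expires" test could look like, here is a minimal sketch that uses a plain subprocess as a stand-in for a Model. `run_with_timeout` is a hypothetical helper written for this example, not the actual launcher code, and the 1-second timeout and 10-second bound are arbitrary.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout):
    """Hypothetical stand-in for a blocking start with a timeout.

    Returns the exit code, or None if the process had to be killed.
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return None


def test_process_killed_when_time_expires():
    # A child process that would run far longer than the timeout.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    start = time.monotonic()
    result = run_with_timeout(sleeper, timeout=1)
    elapsed = time.monotonic() - start
    assert result is None  # the process was killed, it did not finish
    assert elapsed < 10    # and we did not wait for the full sleep
```

A real test would go through Experiment.start and check the Model status afterwards; the subprocess version only shows the shape of the check.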
Matt E: As an alternative, maybe consider whether RunSettings should have this.
Matt D: Have block be an integer so we don't have two optional parameters.
@al-rigazzi Rework ticket to include discussion of ideal solution.
@mellis13 RunSettings could in principle have a timeout parameter, but I'm afraid it might confuse users, who could interpret it as a synonym of walltime or time in BatchSettings-derived classes.