tpalashki
If a job stops executing, its termination status remains unchanged, i.e. equal to the status of its last execution (let's assume it was a User Error). Consequently, the...
Good point. This will be helpful, but it will not cover all use cases. For example, jobs that execute rarely (e.g., once a month) or jobs that never execute (with schedule...
Good idea. We can introduce a configuration option similar to Prometheus Alertmanager's `resolve_timeout` to specify the time after which an alert is automatically resolved.
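A rough sketch of how the auto-resolve check could work, assuming each alert records when it was last refreshed. `Alert`, `is_resolved`, and `last_updated` are hypothetical names for illustration only, not part of the existing code; the rule is modeled loosely on the idea behind Alertmanager's `resolve_timeout`, not on its actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    """Hypothetical alert raised for a job's failed termination status."""
    job_name: str
    status: str             # last reported termination status, e.g. "User Error"
    last_updated: datetime  # when the alert was last (re)fired for this job


def is_resolved(alert: Alert, now: datetime, resolve_timeout: timedelta) -> bool:
    """Consider the alert resolved once it has gone un-refreshed for resolve_timeout.

    This lets a stale alert clear itself even if the job never reports
    a new (successful) status.
    """
    return now - alert.last_updated >= resolve_timeout


if __name__ == "__main__":
    timeout = timedelta(hours=24)  # hypothetical per-deployment setting
    alert = Alert("nightly-export", "User Error", datetime(2024, 1, 1, 0, 0))
    print(is_resolved(alert, datetime(2024, 1, 1, 12, 0), timeout))  # False
    print(is_resolved(alert, datetime(2024, 1, 2, 0, 0), timeout))   # True
```

A single global timeout keeps the configuration simple, but as noted above it would not suit jobs that run rarely; a per-job override could be layered on top if needed.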
It is not possible to root-cause this particular error. It happens very rarely (I have seen it only once), and searching for the above error yields no results. However,...