Linear or constant backoff time
Feature description
Add the ability to configure the backoff time between re-attempts.
- Constant backoff time. A job re-attempt is triggered every [configured duration], i.e. it works as an interval.
- Linear backoff time. The delay before a job re-attempt grows linearly with the number of attempts.
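To make the two requested modes concrete, here is a minimal sketch of how the delay could be computed. The names `BackoffStrategy`, `backoffSeconds` and `baseSeconds` are hypothetical and chosen only for illustration; nothing like them is confirmed to exist in Graphile Worker.

```typescript
// Hypothetical sketch: neither these names nor a "strategy" option exist in Graphile Worker.
type BackoffStrategy = "constant" | "linear";

/**
 * Seconds to wait before the next re-attempt.
 * `attempts` is the number of attempts already made (1 after the first failure).
 */
function backoffSeconds(
  strategy: BackoffStrategy,
  attempts: number,
  baseSeconds: number,
): number {
  switch (strategy) {
    case "constant":
      // Same delay every time, so re-attempts behave like an interval.
      return baseSeconds;
    case "linear":
      // Delay grows by `baseSeconds` with each failed attempt.
      return baseSeconds * attempts;
  }
}

// Delays after attempts 1..4 with a 30-second base.
const constantDelays = [1, 2, 3, 4].map((n) => backoffSeconds("constant", n, 30));
const linearDelays = [1, 2, 3, 4].map((n) => backoffSeconds("linear", n, 30));
console.log(constantDelays); // [ 30, 30, 30, 30 ]
console.log(linearDelays); // [ 30, 60, 90, 120 ]
```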
Motivating example
The exponential backoff curve is not well suited to tasks with a faster rhythm.
Breaking changes
None, additional feature.
Supporting development
- [x] am interested in building this feature myself
- [x] am interested in collaborating on building this feature
- [x] am willing to help testing this feature before it's released
- [ ] am willing to write a test-driven test suite for this feature (before it exists)
- [ ] am a Graphile sponsor ❤️
- [ ] have an active support or consultancy contract with Graphile
Could you give examples of the "faster rhythm tasks" to which you allude?
In my case, I was trying to implement a kind of printing queue. In case of a device malfunction (a paper jam, running out of paper or ink), the exponential retry delay grows beyond a reasonable time quite fast. For example, if it takes the first 5 minutes to fix the malfunction, you still need to wait another 5 minutes until the next re-attempt occurs.
The best way to handle this currently is to catch the error within the job, queue another job, and exit successfully.
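A minimal sketch of that workaround, assuming a task named `print` and a fixed 30-second delay. The `Helpers` and `AddJobSpec` interfaces are local stand-ins that cover only the `addJob` call and the `runAt` option used here, and `sendToPrinter` is a hypothetical function that throws when the device is unavailable.

```typescript
// Minimal stand-ins for the task helper types, declared locally so the sketch
// compiles on its own. In a real project these would come from graphile-worker.
interface AddJobSpec {
  runAt?: Date;
  maxAttempts?: number;
}
interface Helpers {
  addJob(identifier: string, payload: unknown, spec?: AddJobSpec): Promise<unknown>;
}

const RETRY_DELAY_MS = 30_000; // assumed fixed delay between re-attempts

// `sendToPrinter` is hypothetical: it should throw on a jam, an empty tray, etc.
function makePrintTask(sendToPrinter: (payload: unknown) => Promise<void>) {
  return async function printTask(payload: unknown, helpers: Helpers): Promise<void> {
    try {
      await sendToPrinter(payload);
    } catch {
      // Swallow the error and queue a fresh job, instead of letting the worker
      // apply its exponential backoff to this one.
      await helpers.addJob("print", payload, {
        runAt: new Date(Date.now() + RETRY_DELAY_MS),
      });
      // Returning normally marks the current job as successful.
    }
  };
}
```

Because every re-queued job starts with a fresh attempt count, this sketch retries indefinitely; a retry counter carried in the payload would be one way to bound it.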
It feels a bit awkward to move task re-scheduling logic into the application code. Therefore I'm still looking for a solution that executes re-attempts faster (than now()+exp(attempts)) from the worker side.
What do you think about making the maximum delay/backoff time between attempts configurable per job, in the same way as max attempts?
The current implementation already limits it to exp(10) seconds, which is a bit over six hours.
https://github.com/graphile/worker/blob/e3176eab42ada8f4f3718192bada776c22946583/sql/000003.sql#L136
The limit could also be made configurable via job options.
run_at = greatest(now()) + (least(exp(least(attempts,10)), max_backoff_delay)::text || ' seconds')::interval
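The arithmetic in that expression can be checked outside the database. The sketch below mirrors it in TypeScript; `retryDelaySeconds` is a hypothetical name, and `maxBackoffDelay` stands for the proposed `max_backoff_delay` option, which is not an existing setting.

```typescript
const MAX_EXPONENT = 10; // cap taken from the expression above: least(attempts, 10)

/**
 * Seconds added to run_at after a failure.
 * `maxBackoffDelay` is the hypothetical per-job limit proposed in this thread.
 */
function retryDelaySeconds(attempts: number, maxBackoffDelay: number = Infinity): number {
  const exponential = Math.exp(Math.min(attempts, MAX_EXPONENT));
  return Math.min(exponential, maxBackoffDelay);
}

// Without a limit, the delay tops out at exp(10) seconds, a bit over six hours.
const ceilingHours = retryDelaySeconds(25) / 3600;
console.log(ceilingHours.toFixed(2)); // 6.12

// With a 60-second limit, the delay stops growing from the fifth attempt on.
const capped = [1, 2, 3, 4, 5, 6].map((n) => Math.round(retryDelaySeconds(n, 60)));
console.log(capped); // [ 3, 7, 20, 55, 60, 60 ]
```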
Every column we add to jobs has a performance overhead, and Worker is performance focussed. Therefore I like to ensure that a significant number of users would need this functionality before considering adding that kind of thing. So far there has not been much demand for this functionality, and I am also not keen on it myself.