Linear or constant backoff time

Open nnaku opened this issue 4 years ago • 3 comments

Feature description

Add the ability to configure the backoff time between re-attempts.

  • Constant backoff time. A job re-attempt is triggered every [configured duration], i.e. it works as an interval.
  • Linear backoff time. The delay before a job re-attempt grows linearly with the number of attempts.
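
To make the two requested schedules concrete, here is a minimal sketch of how the delays could be computed. The function names and the 30-second value are hypothetical and chosen only for illustration; nothing here is part of Graphile Worker.

```typescript
// Hypothetical helpers illustrating the two requested schedules.
// None of these names exist in Graphile Worker.

/** Constant backoff: every re-attempt waits the same number of seconds. */
function constantDelaySeconds(baseSeconds: number): number {
  return baseSeconds;
}

/** Linear backoff: the delay grows by a fixed step with each failed attempt. */
function linearDelaySeconds(attempts: number, stepSeconds: number): number {
  return attempts * stepSeconds;
}

// Print both schedules side by side for the first five attempts,
// assuming a 30-second base/step.
for (let attempts = 1; attempts <= 5; attempts++) {
  console.log(
    `attempt ${attempts}: constant=${constantDelaySeconds(30)}s, ` +
      `linear=${linearDelaySeconds(attempts, 30)}s`
  );
}
```

With a constant schedule the job is retried at a fixed interval, while the linear schedule adds one more step of waiting after each failure.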

Motivating example

The exponential backoff curve is not well suited to faster rhythm tasks.

Breaking changes

None, additional feature.

Supporting development

  • [x] am interested in building this feature myself
  • [x] am interested in collaborating on building this feature
  • [x] am willing to help testing this feature before it's released
  • [ ] am willing to write a test-driven test suite for this feature (before it exists)
  • [ ] am a Graphile sponsor ❤️
  • [ ] have an active support or consultancy contract with Graphile

nnaku avatar Feb 04 '22 12:02 nnaku

Could you give examples of the "faster rhythm tasks" to which you allude?

benjie avatar Feb 04 '22 13:02 benjie

In my case, I was trying to implement a kind of printing queue. In case of a device malfunction (a jam, running out of paper or ink), the exponential retry time grows beyond a reasonable wait quite fast. For example, if it takes 5 minutes to fix the malfunction, you may still need to wait another 5 minutes before the next re-attempt occurs.

nnaku avatar Feb 04 '22 15:02 nnaku

The best way to handle this currently is to catch the error within the job, queue another job, and exit successfully.
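
A rough sketch of that workaround might look like the following. The `Helpers` and `JobSpec` interfaces are local stand-ins, loosely modelled on an `addJob(identifier, payload, spec)` style helper; the task name `"print"`, the `retries` payload field, `sendToPrinter`, and the delay and cap values are all assumptions made for illustration. Check the library's documentation for the exact API before relying on any of it.

```typescript
// Local stand-ins for the worker's task helpers (assumed shapes, not the
// library's real type definitions).
interface JobSpec {
  runAt?: Date;
}
interface Helpers {
  addJob(identifier: string, payload: unknown, spec?: JobSpec): Promise<void>;
}

interface PrintPayload {
  documentId: string;
  retries?: number; // hypothetical counter carried in the payload
}

const RETRY_DELAY_MS = 30_000; // constant 30-second delay (assumed value)
const MAX_RETRIES = 20; // assumed cap so a job cannot requeue itself forever

// Hypothetical stub that talks to the printer and throws on a jam, etc.
async function sendToPrinter(documentId: string): Promise<void> {
  throw new Error(`printer unavailable for ${documentId}`);
}

async function printTask(
  payload: PrintPayload,
  helpers: Helpers,
  print: (documentId: string) => Promise<void> = sendToPrinter
): Promise<void> {
  const retries = payload.retries ?? 0;
  try {
    await print(payload.documentId);
  } catch (err) {
    // Past the cap, rethrow so the worker records a real failure.
    if (retries >= MAX_RETRIES) throw err;
    // Otherwise queue a fresh job at a fixed delay...
    await helpers.addJob(
      "print",
      { ...payload, retries: retries + 1 },
      { runAt: new Date(Date.now() + RETRY_DELAY_MS) }
    );
    // ...and return normally, so this job counts as successful.
  }
}
```

Because each retry is a new job, the worker's own attempt counter and exponential delay never come into play; the trade-off is that the retry bookkeeping lives in the task code.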

benjie avatar Feb 15 '22 21:02 benjie

It feels a bit awkward to move task re-scheduling logic into the application code. Therefore I'm still looking for a solution to execute re-attempts faster (than now()+exp(attempts)) from the worker side.

What would you think if the maximum delay/backoff time between attempts could be configured per job, the same as max attempts?

The current implementation already limits it to exp(10) seconds, which is a bit over six hours. https://github.com/graphile/worker/blob/e3176eab42ada8f4f3718192bada776c22946583/sql/000003.sql#L136

The limit could also be made configurable via job options: run_at = greatest(now()) + (least(exp(least(attempts,10)), max_backoff_delay)::text || ' seconds')::interval
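
To show the effect of such a cap, here is a small sketch that mirrors the delay part of the expression above. `backoffSeconds` and `maxBackoffDelaySeconds` are hypothetical names standing in for the proposed `max_backoff_delay` option, and the 300-second cap is just an example value.

```typescript
// Mirrors the delay part of the proposed SQL expression:
//   least(exp(least(attempts, 10)), max_backoff_delay) seconds
// maxBackoffDelaySeconds stands in for the proposed (hypothetical) per-job
// option; leaving it at Infinity reproduces the existing exp(10) ceiling only.
function backoffSeconds(
  attempts: number,
  maxBackoffDelaySeconds: number = Infinity
): number {
  const exponential = Math.exp(Math.min(attempts, 10));
  return Math.min(exponential, maxBackoffDelaySeconds);
}

// Compare the uncapped schedule with one capped at 5 minutes (300 s).
for (const attempts of [1, 3, 5, 7, 10, 25]) {
  console.log(
    `attempt ${attempts}: uncapped=${Math.round(backoffSeconds(attempts))}s, ` +
      `capped=${Math.round(backoffSeconds(attempts, 300))}s`
  );
}
```

Without a cap the delay stops growing at exp(10) seconds, roughly 22,026 seconds or a bit over six hours; with a 300-second cap, every attempt from the sixth onward would wait five minutes.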

nnaku avatar Mar 22 '23 09:03 nnaku

Every column we add to jobs has a performance overhead, and Worker is performance focussed. Therefore I like to ensure that a significant number of users would need this functionality before considering adding that kind of thing. So far there has not been much demand for this functionality, and I am also not keen on it myself.

benjie avatar Mar 22 '23 14:03 benjie