worker icon indicating copy to clipboard operation
worker copied to clipboard

Adds logging when jobs fail with number of attempts and max attempts

Open nodesocket opened this issue 4 years ago • 1 comments

Description

Adds logging when jobs fail with number of attempts and max attempts.

attempt ($attempt of $max_attempts)

Performance impact

Unknown

Security impact

Unknown

Checklist

  • [X] My code matches the project's code style and yarn lint:fix passes.
  • [ ] I've added tests for the new feature, and yarn test passes.
  • [ ] I have detailed the new feature in the relevant documentation.
  • [ ] I have added this feature to 'Pending' in the RELEASE_NOTES.md file (if one exists).
  • [ ] If this is a breaking change I've explained why.

nodesocket avatar Nov 15 '21 22:11 nodesocket

I could use some help calculating the next fuzz delay though.

From the README, the attempts do backoff along these lines:

select attempt, exp(least(10, attempt)) * interval '1 second' as delay, sum(exp(least(10, attempt)) * interval '1 second') over (order by attempt asc) total_delay from generate_series(1, 24) as attempt;

So we can work out the delay until the next job as:

function delayInMs(attempt: number): number {
  return Math.exp(Math.min(10, attempt)) * 1000;
}

To make this fuzzy we just want the nearest human-readable size. I'd do something like:

const MILLISECOND = 1;
const SECOND = 1000 * MILLISECOND;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;
// We know the maximum attempt delay is about 6 hours, so no need to go any further.

function fuzzyDelay(attempt: number): string {
  const ms = delayInMs(attempt);
  if (ms > 1.5 * HOUR) {
    return `~${(ms/HOUR).toFixed(1)} hours`;
  } else if (ms >= 1.5 * MINUTE) {
    return `~${(ms/MINUTE).toFixed(1)} minutes`;
  } else {
    return `~${Math.round(ms/SECOND)} seconds`;
  }
}

This was all written direct into GitHub so you'll need to check it for errors!

benjie avatar Nov 16 '21 17:11 benjie

I'm a bit confused by my quote message above because it doesn't seem to be something you wrote here that I was replying to. Perhaps it was on chat or somewhere, but I failed to track it down. Either way, it looks like we won't be addressing that, so I've moved the error message about a bit and merged in the latest. This is good to go now; logging looks like:

[worker(worker-c3465d98e066eea4d0)] DEBUG: Found task 15 (charm)
[worker(worker-c3465d98e066eea4d0)] ERROR: Failed task 15 (charm, 0.32ms, attempt 1 of 25) with error 'Throwing error because attempt 1 < 3':
  Error: Throwing error because attempt 1 < 3
      at module.exports (.../perfTest/tasks/charm.js:3:11)
      at doNext (.../src/worker.ts:206:24)
      at processTicksAndRejections (node:internal/process/task_queues:95:5)
[worker(worker-6f2d6ccd8f1b45bd1c)] DEBUG: Found task 15 (charm)
[worker(worker-6f2d6ccd8f1b45bd1c)] ERROR: Failed task 15 (charm, 0.15ms, attempt 2 of 25) with error 'Throwing error because attempt 2 < 3':
  Error: Throwing error because attempt 2 < 3
      at module.exports (.../perfTest/tasks/charm.js:3:11)
      at doNext (.../src/worker.ts:206:24)
      at processTicksAndRejections (node:internal/process/task_queues:95:5)
[worker(worker-b04aa658a0c2821f4c)] DEBUG: Found task 15 (charm)
Third time's the charm!
[worker(worker-b04aa658a0c2821f4c)] INFO: Completed task 15 (charm, 0.17ms, attempt 3 of 25) with success
[worker(worker-c3465d98e066eea4d0)] DEBUG: Found task 17 (wait)
[worker(worker-c3465d98e066eea4d0)] INFO: Completed task 17 (wait, 30002.08ms) with success

benjie avatar Oct 24 '23 10:10 benjie

Ah; the quoted text came from https://github.com/graphile/worker/issues/235#issuecomment-969388627

benjie avatar Oct 24 '23 10:10 benjie