worker (Docs) Definitions of runner, worker, pool, task, job etc.

Feature description

The documentation should contain definitions of these terms.

Motivating example

I'm unable to figure out what the distinction between a runner and a worker is.

Breaking changes

None

Supporting development

I [tick all that apply]:

[ ] am interested in building this feature myself
[ ] am interested in collaborating on building this feature
[ ] am willing to help testing this feature before it's released
[ ] am willing to write a test-driven test suite for this feature (before it exists)
[ ] am a Graphile sponsor ❤️
[ ] have an active support or consultancy contract with Graphile

Nov 14 '21 02:11 aravindet

That'd be a great addition; fancy raising a PR? Cliffnotes:

Task: Something that can be executed, such as "send_email". Think of it like a function.

Job: a record in the jobs table which represents a single thing to do: which Task to execute and what parameters ("payload") to execute it with. Think of it like an object.

Worker: a process that is provided a list of Tasks it is capable of executing and then looks for a single Job that specifies one of these Tasks. It executes the Job, reports success or failure back to the database, then finds the next Job and repeats.

WorkerPool: manages a collection of Workers so that multiple Jobs can be executed in parallel (within the constraints of the Node.js event loop), and is responsible for listening for and dispatching new job events.

Cron: <see README>

Runner: manages a Cron instance and a WorkerPool instance - it is the main way that users would execute Graphile Worker programmatically. (Really small piece of code: https://github.com/graphile/worker/blob/a2038c6b130c8432cfef9a61c256478a8d608509/src/runner.ts#L68-L114)
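
To make the relationship between these terms concrete, here's a toy in-memory sketch. It is not graphile-worker's implementation: the Job field names, runWorker and runPool are all made up for illustration, the "jobs table" is just an array, and unlike the real thing it simply stops when there is nothing left to claim instead of listening for new jobs.

```typescript
// Toy in-memory model of the terms above. This is NOT graphile-worker's code;
// all names here (Job fields, runWorker, runPool) are hypothetical.
type Task = (payload: unknown) => Promise<void>;
type TaskList = Record<string, Task>;

interface Job {
  id: number;
  taskIdentifier: string; // which Task to execute
  payload: unknown; // parameters to execute it with
  status: "pending" | "running" | "done" | "failed";
}

// Worker: claims one Job at a time, runs its Task, records the outcome, repeats.
async function runWorker(jobs: Job[], tasks: TaskList): Promise<void> {
  for (;;) {
    const job = jobs.find(
      (j) =>
        j.status === "pending" &&
        Object.prototype.hasOwnProperty.call(tasks, j.taskIdentifier),
    );
    if (!job) return; // the toy stops when nothing is left to claim
    job.status = "running"; // claimed before any await, so no other Worker can pick it
    try {
      await tasks[job.taskIdentifier](job.payload);
      job.status = "done";
    } catch (_err) {
      job.status = "failed";
    }
  }
}

// WorkerPool: starts `concurrency` Workers over the same list of Jobs.
async function runPool(
  jobs: Job[],
  tasks: TaskList,
  concurrency: number,
): Promise<void> {
  await Promise.all(
    Array.from({ length: concurrency }, () => runWorker(jobs, tasks)),
  );
}
```

Calling runPool(jobs, { send_email: async (payload) => { /* ... */ } }, 2) would start two Workers that share the same list; a Job whose task name isn't in the list is left pending for some other Worker that knows that Task.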

Nov 15 '21 11:11 benjie

Thanks @benjie. I'll give it a shot. I still need a clarification though. When you say each Worker is a process, I am assuming this means a separate OS process created and managed (for example) with the child_process API.

I am using graphile-worker programmatically. When I call the run() function, I assume it:

creates a WorkerPool instance in the original Node.js process (where I called run), and
this WorkerPool starts up N separate Worker processes, where N is the concurrency value I pass to run.

Is my understanding correct?

Nov 16 '21 07:11 aravindet

> When you say each Worker is a process, I am assuming this means a separate OS process created and managed (for example) with the child_process API.

Ah, sorry for the imprecise wording. No, it's simply a JS routine inside the Node.js main thread - graphile-worker is entirely single-threaded.

Your understanding is correct, except that the "worker processes" (routines) are also within the original Node.js "OS process". It doesn't make sense for the workers themselves to run in separate threads because they're incredibly light; however, it may make sense for your task functions to defer to worker threads/child_process if they're expensive to run within Node.js.
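
As a rough sketch of what "defer to worker threads" could look like from inside a task function: the helper below (sumInThread, a made-up name) hands a CPU-bound loop to Node's worker_threads module so the main thread's event loop stays free for the other worker routines. Nothing in it is graphile-worker API, and the inline eval'd script is only there to keep the example in one file.

```typescript
import { Worker } from "worker_threads";

// Hypothetical helper: sum 1..n on a separate thread so the main event loop
// is not blocked while the loop runs.
function sumInThread(n: number): Promise<number> {
  // Inline CommonJS script executed by the thread (enabled by `eval: true`).
  const source = `
    const { parentPort, workerData } = require("worker_threads");
    let total = 0;
    for (let i = 1; i <= workerData; i++) total += i;
    parentPort.postMessage(total);
  `;
  return new Promise<number>((resolve, reject) => {
    const thread = new Worker(source, { eval: true, workerData: n });
    thread.once("message", resolve);
    thread.once("error", reject);
    thread.once("exit", (code) => {
      // If the thread already posted its result, this reject is a no-op.
      if (code !== 0) reject(new Error(`thread exited with code ${code}`));
    });
  });
}

// Hypothetical task function: the expensive part runs off the main thread.
async function heavySum(payload: unknown): Promise<number> {
  const n = Number(payload);
  return sumInThread(n);
}
```

For cheap, mostly I/O-bound tasks this is overhead for no gain; it only pays off when the work would otherwise hog the event loop.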

Nov 16 '21 17:11 benjie

> if they're expensive to run within Node.js

I'd actually expand on that. There's one more issue - isolation. If you have a long-running, mission-critical process like Graphile Worker, you want some kind of isolation so that one task crashing badly (i.e. with an uncaught exception) doesn't cause the whole thing to crash. Or, if the task is complex, it might leak some resources; to keep the programming model simple you might want to run it in a separate process and then just nuke it after it's done. Otherwise you have to be extremely meticulous about handling that. I'm currently writing this as I'm debugging a problem related to this in my application.
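
A minimal sketch of the kind of isolation I mean, using Node's child_process: the task body runs in a throwaway Node.js process, so an uncaught exception or a leak dies with that process and the parent only sees an exit code. runIsolated and its timeout are names I made up for this example, not anything graphile-worker provides.

```typescript
import { spawn } from "child_process";

interface IsolatedResult {
  ok: boolean; // true only when the child exited with code 0
  code: number | null; // null if the child was killed or could not be spawned
}

// Hypothetical helper: run a script in a separate Node.js process and report
// how it ended. A crash in the script cannot take the parent down with it.
function runIsolated(script: string, timeoutMs = 5000): Promise<IsolatedResult> {
  return new Promise<IsolatedResult>((resolve) => {
    const child = spawn(process.execPath, ["-e", script], { stdio: "ignore" });
    // "Nuke it" if it runs for too long.
    const timer = setTimeout(() => child.kill("SIGKILL"), timeoutMs);
    child.once("error", () => {
      clearTimeout(timer);
      resolve({ ok: false, code: null });
    });
    child.once("exit", (code) => {
      clearTimeout(timer);
      resolve({ ok: code === 0, code });
    });
  });
}
```

A task function could then await runIsolated(...) and throw when ok is false, so the failure is reported as an ordinary failed job rather than crashing the process that runs all the workers.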

Nov 08 '22 10:11 wokalski