(Docs) Definitions of runner, worker, pool, task, job etc.
Feature description
The documentation should contain definitions of these terms.
Motivating example
I'm unable to figure out what the distinction between a runner and a worker is.
Breaking changes
None
Supporting development
I [tick all that apply]:
- [ ] am interested in building this feature myself
- [ ] am interested in collaborating on building this feature
- [ ] am willing to help testing this feature before it's released
- [ ] am willing to write a test-driven test suite for this feature (before it exists)
- [ ] am a Graphile sponsor ❤️
- [ ] have an active support or consultancy contract with Graphile
That'd be a great addition; fancy raising a PR? CliffsNotes:
Task: Something that can be executed, such as "send_email". Think of it like a function.
Job: a record in the jobs table which represents a single thing to do: which Task to execute and what parameters ("payload") to execute it with. Think of it like an object.
Worker: a process that is given a list of Tasks it is capable of executing and then looks for a single Job that specifies one of these Tasks. It executes that Job and reports success or failure back to the database, then finds the next Job and repeats.
WorkerPool: manages a collection of Workers so that multiple Jobs can be executed concurrently (within the constraints of the Node.js event loop), and is responsible for listening for new-job events and dispatching them to Workers.
Cron: <see README>
Runner: manages a Cron instance and a WorkerPool instance; it is the main way that users would execute Graphile Worker programmatically. (It's a really small piece of code: https://github.com/graphile/worker/blob/a2038c6b130c8432cfef9a61c256478a8d608509/src/runner.ts#L68-L114)
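To make the vocabulary concrete, here is a toy, in-memory model in TypeScript. It is a sketch under stated assumptions, not graphile-worker's code: the "jobs table" is a plain array rather than a PostgreSQL table, and `insertJob`, `worker` and `workerPool` are hypothetical names chosen for this illustration.

```typescript
// Toy in-memory model of the terms above. NOT graphile-worker's implementation:
// the real jobs table lives in PostgreSQL. All names here are hypothetical.

// Task: something that can be executed, like a function.
type Task = (payload: unknown) => Promise<void>;

// Job: a record saying which Task to run and with what payload.
interface Job {
  id: number;
  taskIdentifier: string;
  payload: unknown;
  status: "pending" | "running" | "done" | "failed";
}

// Stand-in for the jobs table.
const jobs: Job[] = [];

function insertJob(taskIdentifier: string, payload: unknown): Job {
  const job: Job = { id: jobs.length + 1, taskIdentifier, payload, status: "pending" };
  jobs.push(job);
  return job;
}

// Worker: knows a list of Tasks; repeatedly claims one pending Job for one of
// those Tasks, runs it, and records success or failure.
async function worker(tasks: Record<string, Task>): Promise<void> {
  for (;;) {
    const job = jobs.find(
      (j) => j.status === "pending" && Object.prototype.hasOwnProperty.call(tasks, j.taskIdentifier),
    );
    if (!job) return; // a real Worker would wait for new jobs instead of exiting
    job.status = "running"; // claimed with no `await` in between, so no other routine can take it
    try {
      await tasks[job.taskIdentifier](job.payload);
      job.status = "done";
    } catch {
      job.status = "failed";
    }
  }
}

// WorkerPool: `concurrency` Worker routines on ONE thread. They interleave at
// each `await`, so jobs run concurrently, never on separate CPU cores.
function workerPool(tasks: Record<string, Task>, concurrency: number): Promise<void[]> {
  return Promise.all(Array.from({ length: concurrency }, () => worker(tasks)));
}

// Usage
const sent: string[] = [];
const taskList: Record<string, Task> = {
  send_email: async (payload) => {
    await new Promise<void>((resolve) => setTimeout(resolve, 10)); // simulated network call
    sent.push((payload as { to: string }).to);
  },
};

insertJob("send_email", { to: "a@example.com" });
insertJob("send_email", { to: "b@example.com" });
insertJob("unknown_task", {});

const finished = workerPool(taskList, 2).then(() => {
  console.log(jobs.map((j) => `${j.taskIdentifier}:${j.status}`).join(", "));
  // send_email:done, send_email:done, unknown_task:pending
});
```

With a concurrency of 2, each routine claims one `send_email` Job and the two run concurrently; the `unknown_task` Job stays pending because no Worker was given a Task by that name.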
Thanks @benjie. I'll give it a shot. I still need a clarification though. When you say each Worker is a process, I am assuming this means a separate OS process created and managed (for example) with the child_process API.
I am using graphile-worker programmatically. When I call the run() function, I assume it:
- creates a WorkerPool instance in the original Node.js process (where I called run),
- this WorkerPool starts up N separate Worker processes, where N is the concurrency value I pass to run
Is my understanding correct?
> When you say each Worker is a process, I am assuming this means a separate OS process created and managed (for example) with the child_process API.
Ah, sorry for the imprecise wording. No, it's simply a JS routine running on the Node.js main thread; graphile-worker is entirely single-threaded.
Your understanding is correct, except that the "worker processes" (routines) also live within the original Node.js OS process. It doesn't make sense for the workers themselves to run in separate threads because they're incredibly light; however, it may make sense for your task functions to defer to worker threads or child_process if they're expensive to run within Node.js.
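A small sketch of what "single-threaded" means in practice: two job functions started together, as two Workers in a pool would run them. The job bodies and the busy loop are illustrative assumptions, not graphile-worker code. The I/O-style job yields at its `await`, but the CPU-bound job never yields, so the I/O job cannot resume until the CPU-bound job has finished, even though its 0 ms timer was due almost immediately.

```typescript
// Two "workers" sharing one Node.js thread. Job bodies are illustrative only.
const log: string[] = [];

// I/O-style job: yields to the event loop while it waits.
async function ioJob(): Promise<void> {
  log.push("io:start");
  await new Promise<void>((resolve) => setTimeout(resolve, 0)); // simulated I/O wait
  log.push("io:end");
}

// CPU-bound job: synchronous work with no `await`, so it never yields.
async function cpuJob(): Promise<number> {
  log.push("cpu:start");
  let sum = 0;
  for (let i = 0; i < 10_000_000; i++) sum += i;
  log.push("cpu:end");
  return sum;
}

// Start both, as a pool with concurrency 2 would.
const both = Promise.all([ioJob(), cpuJob()]).then(() => {
  console.log(log.join(" -> "));
  // io:start -> cpu:start -> cpu:end -> io:end
});
```

While the CPU-bound job runs on the main thread, every other Worker in the pool is stalled, which is why handing expensive work to `worker_threads` or `child_process` can pay off.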
> if they're expensive to run within Node.js
I'd actually expand on that. There's one more issue: isolation. If you have a long-running, mission-critical process like graphile-worker, you want some kind of isolation so that one task crashing badly (i.e., with an uncaught exception) doesn't bring down the whole thing. Or, if a task is complex, it might leak resources; to keep the programming model simple, you might want to run it in a separate process and just nuke that process once it's done. Otherwise you have to be extremely meticulous about cleaning up after it. I'm writing this while debugging a problem related to exactly this in my application.
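A minimal sketch of the isolation approach described above, using only Node's built-in `child_process` module: the risky work runs in a fresh Node.js process that is killed if it exceeds a timeout, so an uncaught exception or a leak dies with the child rather than with the long-running parent. `runIsolated` is a hypothetical helper, not part of graphile-worker, and passing the work as an inline `-e` script is a simplification; a real setup would more likely spawn a dedicated script file.

```typescript
// Hypothetical helper (not part of graphile-worker): run risky work in a
// throwaway child process so crashes and leaks cannot affect the parent.
import { spawn } from "node:child_process";

interface IsolatedResult {
  code: number | null; // exit code, or null if the child was killed by a signal
  signal: NodeJS.Signals | null;
  timedOut: boolean;
}

function runIsolated(script: string, timeoutMs: number): Promise<IsolatedResult> {
  return new Promise((resolve, reject) => {
    // Fresh Node.js process: its memory, open handles and any uncaught
    // exception are confined to it.
    const child = spawn(process.execPath, ["-e", script], { stdio: "ignore" });
    let timedOut = false;
    // "Nuke it" if it runs too long.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("exit", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}

// Usage: both children misbehave, the parent keeps running.
const crashing = runIsolated('throw new Error("boom")', 5000);
const hanging = runIsolated("setInterval(() => {}, 1000)", 200);

Promise.all([crashing, hanging]).then(([c, h]) => {
  console.log("crashing:", c); // non-zero exit code, not timed out
  console.log("hanging:", h); // killed by SIGKILL after the timeout
});
```

Inside a task function, you would throw when the result's `code` is not `0`, so that the failure is reported back to the database like any other failed Job.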