documentation Add tips on structuring your pipeline

In particular,

- ensure you pull out repetitive bits into their own actions
- change expensive long-running loops so they can be run as individual actions with different parameters

The former saves you time by letting dependent actions reuse previously computed outputs.

The latter gives you easy parallelisation for your loops.
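To make the loop tip concrete, here is a minimal sketch of a script that does one iteration per invocation, so that each pipeline action can call it with a different parameter. The `fit_model` function and the `--region` flag are hypothetical placeholders, not taken from any real study.

```python
import argparse


def fit_model(region):
    """Stand-in for one expensive iteration of the original loop (hypothetical)."""
    return f"fitted model for {region}"


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run one iteration of the analysis")
    # The value that the original script looped over is now passed in.
    parser.add_argument("--region", required=True, help="region to analyse")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    result = fit_model(args.region)
    print(result)
    return result


# Demonstration: this is what a single action would execute.
main(["--region", "London"])
```

In a real script you would call `main()` with no arguments so that it reads the command line, and each action would then pass its own `--region` value.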

Nov 30 '20 08:11 sebbacon

By extracting analytical choices out of an R or Stata script and into the pipeline, we're meddling with users' code-writing and organisation preferences. Not necessarily in a bad way, but something to be aware of.

For example, with loops, it might be annoying if the parameters you want to iterate over are derived from the analysis itself.

In general, it would be useful to know more about how to "program" with YAML (or at least make it feel that way, e.g. by passing vectors / lists output by a script into a single action). Eventually this info will live in documentation, but I need to understand more about it first! This is related to questions @angelwong121 has had previously.

Nov 30 '20 15:11 wjchulme

Yes.

1. We should explain it so that they can make the choices. There are benefits to breaking out loops, but also drawbacks.
2. Eventually our YAML config format will grow the ability to repeat things with a range of parameters, but not yet, so we need an upgrade path.
3. Generating YAML from code is one such strategy, e.g. this script, which only generates part of the YAML for copy-and-pasting.
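As a rough sketch of the YAML-generating strategy, the snippet below prints one action per parameter value, ready to paste under the `actions:` key of the pipeline config. Everything specific here is an assumption for illustration: the region list, the action names, the `analysis/fit_model.py` path, the `generate_study_population` dependency and the output paths are all hypothetical, and the field layout should be checked against the config format you are actually using.

```python
# Hypothetical parameter values to iterate over; replace with your own.
REGIONS = ["london", "north_east", "south_west"]

# Template for a single action (layout and names are assumptions).
ACTION_TEMPLATE = """\
  fit_model_{region}:
    run: python:latest python analysis/fit_model.py --region {region}
    needs: [generate_study_population]
    outputs:
      moderately_sensitive:
        log: output/fit_model_{region}.log
"""


def generate_actions(regions):
    """Return a YAML fragment with one action per region."""
    return "".join(ACTION_TEMPLATE.format(region=region) for region in regions)


# Print the fragment so it can be copied into the config by hand.
print(generate_actions(REGIONS))
```

Plain string templating keeps this dependency-free. The trade-off is that nothing validates the result, so the pasted fragment still needs checking by eye.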

Nov 30 '20 16:11 sebbacon

Cool. Of course, by "meddling" I really meant "providing a larger menu of analytical options".

Option 3 is a simple approach that's easy to implement and document.

Nov 30 '20 17:11 wjchulme

Just to note there's a ticket about this: https://github.com/opensafely/job-runner/issues/28

Nov 30 '20 19:11 sebbacon