sdk Add `dependencies` and `data` properties to integration steps

We should keep improving our dependency execution graph. Today we use step IDs to define relationships between parent and child steps, but step IDs are only proxies: they indicate that some entity, relationship, or setData result created in a previous step is required for this step to run. We should consider deprecating dependsOn in step definitions and instead use a dependencies object that includes entities, relationships, data, or stepIds to build the dependency graph.

const step = {
  id: 'step-id',
  name: 'step-name',
  entities: [],
  relationships: [],
  data: [],
  dependencies: {
    entities: [],
    relationships: [],
    data: [],
    // prefer using `entities`, `relationships`, or `data`, but continue to support passing in `stepIds`.
    stepIds: ['parent-step-id'],
  },
  // deprecate the dependsOn property:
  // dependsOn: ['parent-step-id'],
  executionHandler: stepHandler,
}
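
A dependency graph could be derived from declarations like these roughly as follows. This is only a sketch under assumptions: the `StepSketch` shape, the `resolveParentStepIds` helper, and the `acme_user` type are hypothetical and not part of the SDK. It only handles entity `_type`s plus explicit `stepIds`; relationships and data would follow the same pattern.

```typescript
// Hypothetical, simplified step shape for illustration; not the SDK's real types.
interface StepSketch {
  id: string;
  // _types of the entities this step produces
  entities: { _type: string }[];
  dependencies?: {
    // _types of the entities this step needs before it can run
    entities?: { _type: string }[];
    // explicit parent step IDs, still supported
    stepIds?: string[];
  };
}

// Derive the parent step IDs of every step from the declared entity _types.
function resolveParentStepIds(steps: StepSketch[]): Map<string, string[]> {
  // Index producing steps by _type; several steps may produce the same _type.
  const producers = new Map<string, string[]>();
  for (const step of steps) {
    for (const { _type } of step.entities) {
      const ids = producers.get(_type) ?? [];
      ids.push(step.id);
      producers.set(_type, ids);
    }
  }

  const parents = new Map<string, string[]>();
  for (const step of steps) {
    // Start from explicit step IDs, then add every producer of a needed _type.
    const ids = new Set<string>(step.dependencies?.stepIds ?? []);
    for (const { _type } of step.dependencies?.entities ?? []) {
      for (const producerId of producers.get(_type) ?? []) {
        if (producerId !== step.id) ids.add(producerId);
      }
    }
    parents.set(step.id, Array.from(ids));
  }
  return parents;
}

const parents = resolveParentStepIds([
  { id: 'fetch-users-a', entities: [{ _type: 'acme_user' }] },
  { id: 'fetch-users-b', entities: [{ _type: 'acme_user' }] },
  {
    id: 'build-user-relationships',
    entities: [],
    dependencies: { entities: [{ _type: 'acme_user' }] },
  },
]);
```

Here `build-user-relationships` ends up depending on both `fetch-users-a` and `fetch-users-b` without naming either step.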

Mar 22 '21 16:03 ndowmon

I'm unsure if this will be more clear or not. I'm wondering about a case where we create entities in 2 different steps. Will both of them need to be created prior to running this step? I would be worried that it would make debugging more difficult.

Mar 22 '21 18:03 mknoedel

Yes, they would both need to run before this step. With step IDs, you as a developer need to remember that there are two steps creating an entity of the same _type, and add both of the step IDs to this step. However, if you simply pass the _type that this step requires, you don't need to know anything about the fact that it's generated in two separate steps.

It would also be great to build some type hinting. Say you defined step.dependencies.entities[0]._type = 'some-type' and step.dependencies.entities[1]._type = 'other-type', and then called await jobState.iterateEntities({ _type: 'some-type-not-in-dependencies' }, async () => {});. The context provided in the dependencies could type hint the options of _type ('some-type' | 'other-type') and cause a lint error if you've passed a type that isn't defined in your dependencies. We could do the same thing with getData().
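
One way this type hinting might look, as a sketch only: `EntitySketch`, `ScopedJobState`, and `scopeJobState` are hypothetical names, and the real jobState interface in the SDK may differ. The idea is to infer a string-literal union from the declared dependency `_type`s and use it to constrain the filter.

```typescript
// Hypothetical, simplified types for illustration; not the SDK's real JobState.
interface EntitySketch {
  _type: string;
  _key: string;
}

// A job state whose iterateEntities only accepts the declared _types.
interface ScopedJobState<T extends string> {
  iterateEntities(
    filter: { _type: T },
    iteratee: (entity: EntitySketch) => Promise<void> | void,
  ): Promise<void>;
}

// Build a scoped job state from the declared dependency _types. Passing the
// list `as const` lets TypeScript infer the union 'some-type' | 'other-type'.
function scopeJobState<T extends string>(
  dependencyTypes: readonly T[],
  entities: EntitySketch[],
): ScopedJobState<T> {
  const allowed = new Set<string>(dependencyTypes);
  return {
    async iterateEntities(filter, iteratee) {
      // Runtime guard that mirrors the compile-time check.
      if (!allowed.has(filter._type)) {
        throw new Error(`_type not declared in dependencies: ${filter._type}`);
      }
      for (const entity of entities) {
        if (entity._type === filter._type) await iteratee(entity);
      }
    },
  };
}

const jobState = scopeJobState(['some-type', 'other-type'] as const, [
  { _type: 'some-type', _key: 'a' },
  { _type: 'other-type', _key: 'b' },
]);

const seenKeys: string[] = [];
const done = jobState.iterateEntities({ _type: 'some-type' }, (entity) => {
  seenKeys.push(entity._key);
});

// The following would fail to compile, because the _type is not one of the
// declared dependencies:
// jobState.iterateEntities({ _type: 'some-type-not-in-dependencies' }, () => {});
```

The same generic parameter could constrain the keys accepted by getData().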

Mar 22 '21 18:03 ndowmon

I think we need to consider why steps are so interdependent, and work toward a more declarative way to express relationships that will allow something to run after the steps have collected entities that will make the relationships (a mini-mapper process 😁).

Mar 22 '21 21:03 aiwilliams

This can also help us add some great optimizations to the SDK, because we would know exactly which entities/relationships will be needed in future steps, and when they are no longer needed. It may even allow us to store entities that are either iterated (jobState.iterateEntities) or found (jobState.findEntity) in memory until dependent steps complete, and then free up that memory. cc @austinkelleher
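
A rough sketch of that idea, assuming we can count up front how many steps declare a dependency on each `_type`. The `DependencyAwareCache` class and the `acme_*` types are hypothetical and only illustrate in-memory retention, not how the SDK actually stores entities.

```typescript
// Hypothetical in-memory cache for illustration; not part of the SDK.
class DependencyAwareCache<E> {
  private readonly entitiesByType = new Map<string, E[]>();
  private readonly pendingConsumers = new Map<string, number>();

  // consumersByType: for each _type, how many steps declared a dependency on it.
  constructor(consumersByType: Record<string, number>) {
    for (const type of Object.keys(consumersByType)) {
      this.pendingConsumers.set(type, consumersByType[type]);
    }
  }

  // Keep an entity in memory only if some later step still needs its _type.
  add(type: string, entity: E): void {
    if ((this.pendingConsumers.get(type) ?? 0) === 0) return;
    const list = this.entitiesByType.get(type) ?? [];
    list.push(entity);
    this.entitiesByType.set(type, list);
  }

  get(type: string): readonly E[] {
    return this.entitiesByType.get(type) ?? [];
  }

  // Call when a dependent step finishes; frees _types no step is waiting on.
  completeStep(dependencyTypes: string[]): void {
    for (const type of dependencyTypes) {
      const remaining = (this.pendingConsumers.get(type) ?? 0) - 1;
      if (remaining <= 0) {
        this.pendingConsumers.delete(type);
        this.entitiesByType.delete(type);
      } else {
        this.pendingConsumers.set(type, remaining);
      }
    }
  }
}

// Two steps depend on acme_user; nothing depends on acme_group.
const cache = new DependencyAwareCache<{ _key: string }>({ acme_user: 2 });
cache.add('acme_user', { _key: 'user-1' });
cache.add('acme_group', { _key: 'group-1' }); // no consumers, so not retained

cache.completeStep(['acme_user']); // one consumer still pending
const retainedAfterFirst = cache.get('acme_user').length;
cache.completeStep(['acme_user']); // last consumer done, memory released
const retainedAfterSecond = cache.get('acme_user').length;
```

With explicit dependencies the consumer counts fall out of the step definitions, which is what step IDs alone cannot provide.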

Mar 23 '21 12:03 ndowmon