dvc.org start: Pipelines Trail

This is the third step in the Get Started (GS) restructuring we discussed in #2496 (which may be closed once this one is addressed).

See https://github.com/iterative/dvc.org/issues/2496#issuecomment-847598646, https://github.com/iterative/dvc.org/issues/2496#issuecomment-857021652, https://github.com/iterative/dvc.org/issues/2496#issuecomment-857080772

It will introduce creating pipelines, adding stages and running them with dvc repro.

Create a pipeline

Why do we use pipelines in DVC?
What are dependencies?

Add a stage

Introduce dvc stage add

Edit a stage

Introduce editing dvc.yaml
Mention dvc stage add --force?

Run the pipeline

Add another stage
Introduce dvc repro
Update an intermediate stage's dependency
Rerun the pipeline

Visualize the pipeline

List the stages
Show the DAG

Removing Stages

Introduce dvc remove

@shcheklein @jorgeorpinel @dberenbaum

Sep 27 '21 11:09 iesahin

I think we could probably skip "Removing stages," especially if we introduce editing dvc.yaml.

Sep 27 '21 19:09 dberenbaum

Agreed with Dave. Overall, Get Started should not be a comprehensive overview. It should be a quick happy path that presents the most important functionality and shows its value as fast as possible. Everything else is secondary to that.

In this case, it would be nice to start with dvc stage add, explain dvc.yaml, and then almost immediately (I would not even use subtitles for now) move on to dvc repro or dvc exp run (exp run is probably even better). Then mention that pipelines can get more advanced (templates) and show the pipeline.

That's pretty much it, to be honest. Do we need two subsections for this? I don't know.

Ideally we would rely on one of the existing projects. Maybe the example-get-started one since it makes at least some sense to use pipelines there.

Oct 12 '21 21:10 shcheklein

Ideally we would rely on one of the existing projects. Maybe the example-get-started one since it makes at least some sense to use pipelines there.

I can use example-get-started for this, but example-dvc-experiments also has a 2 stage pipeline, starting from extract (un-tar) and then training with train.py. That one is simpler; example-get-started is a bit more complex.

Oct 13 '21 11:10 iesahin

also has a 2 stage pipeline, starting from extract (un-tar)

This is an ugly, unfortunate hack that we need to remove eventually :) It's very sad that we have it in the project now. It's not sustainable, and it's not how DVC should be used.

Oct 14 '21 01:10 shcheklein

The fact that we had to use a hack may be a bit ugly, but explaining pipelines without resorting to Python or other code seems like a viable alternative to me. Users may have some difficulty bridging the gap between ordinary commands and an ML project, but the basic mechanism could be explained in a simpler way.

Anyway, no strong opinions here, I'll proceed with example-get-started.

Oct 18 '21 05:10 iesahin

We can probably repurpose the relevant info here for https://github.com/iterative/dvc.org/issues/2883 instead (i.e. close this issue) and leave https://dvc.org/doc/start/data-pipelines as-is. Or are there still major issues with that page, @iesahin?

Mar 30 '22 08:03 jorgeorpinel

Guys, do we still want a separate pipelines trail? Pipelining info is inside https://dvc.org/doc/start/data-management right now. I would personally like to see a separate one, but I remember there were opinions against that. I would put Experiments first, then Data Management, then Pipelines. WDYT @iesahin @dberenbaum @shcheklein? Thanks

Jun 20 '22 23:06 jorgeorpinel