ref: document `dvc queue` and `dvc exp` task-queuing changes

Open pmrowla opened this issue 3 years ago • 10 comments

Docs meta-issue for https://github.com/iterative/dvc/issues/7592

  • [x] Initial documentation (command refs) will be driven by the core team.
  • [x] Update existing exp commands/flags that get aliased/deprecated or otherwise changed.
  • [ ] User guides/other docs should be updated, but existing functionality is being preserved (so there is no special rush on this right now).

pmrowla avatar Jun 14 '22 12:06 pmrowla

Thanks @pmrowla ! Can you link to any relevant existing materials such as wiki, internal docs, etc? Thanks

jorgeorpinel avatar Jun 16 '22 06:06 jorgeorpinel

There is no existing wiki/documentation. Internal-only proposal/outline for the new CLI can be found in notion: https://www.notion.so/iterative/Queueing-Managing-Experiment-Execution-bb07bf856cd242bd98a2c87cfc6e75d7#3f91a12217f04550a7c95b4a4fe252c1 (final CLI does not completely match the initial proposal)

pmrowla avatar Jun 16 '22 07:06 pmrowla

I think we can at least start with the current help output to give an overview:

$ dvc queue --help
usage: dvc queue [-h] [-q | -v] {start,stop,status,logs,remove,kill} ...

Commands to manage experiments queue.
Documentation: <https://man.dvc.org/queue>

positional arguments:
  {start,stop,status,logs,remove,kill}
                        Use `dvc queue CMD --help` to display command-specific help.
    start               Start experiments queue workers.
    stop                Stop experiments queue workers.
    status              List the status of the queue tasks and workers.
    logs                Show output logs for a task in the experiments queue.
    remove              Remove tasks in experiments queue.
    kill                Kill tasks in experiments queue.

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Be quiet.
  -v, --verbose         Be verbose.

dberenbaum avatar Jun 16 '22 17:06 dberenbaum

ETA for release? Thanks

jorgeorpinel avatar Jun 16 '22 22:06 jorgeorpinel

We are hoping to have it released by the end of the month @jorgeorpinel

dberenbaum avatar Jun 17 '22 15:06 dberenbaum

I think it would be ideal to have drafts for the references sooner rather than later. Reviewing them may also serve as basic QA to improve the release (preferably in several smaller PRs).

p.s. is there a dev branch to play with this?

jorgeorpinel avatar Jun 27 '22 19:06 jorgeorpinel

p.s. is there a dev branch to play with this?

Yes, it's on dvc-task-dev.

dberenbaum avatar Jun 27 '22 19:06 dberenbaum

A few questions to consider for now.

usage: dvc queue

Shouldn't this be inside dvc exp? I get that it's too many subcommands though but maybe dvc exp-queue then?

start               Start experiments queue workers.

I understand that by default this goes into a background process. Is there a way to start it in the foreground? Say queue start --attached

Wait. What happened to queue attach? 🙂

stop                Stop experiments queue workers.

Does this kill the running exps? Do they then become failed? What about the rest of the queue, does it remain in Queued state? When you restart, where does it start from?

remove              Remove tasks in experiments queue.

Please confirm the difference or overlap between this and exp remove.

jorgeorpinel avatar Jun 27 '22 20:06 jorgeorpinel

Shouldn't this be inside dvc exp? I get that it's too many subcommands though but maybe dvc exp-queue then?

This was discussed in the initial planning. In the end, we went with dvc queue to avoid having too many nested commands.

I understand that by default this goes into a background process. Is there a way to start it in the foreground? Say queue start --attached

There is currently no flag for this. In the meantime you can use exp run --run-all to get the same behavior, but eventually this can be folded into a flag for queue start
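
For illustration, a rough sketch of the two modes mentioned above. The function names are hypothetical helpers, not DVC commands, and the sketch assumes experiments were already queued (e.g. with `dvc exp run --queue`).

```shell
# Hypothetical helper: process the queue with background workers.
run_queue_background() {
  dvc queue start     # start the queue workers (background by default)
  dvc queue status    # list the status of queue tasks and workers
}

# Hypothetical helper: process the queue in the foreground instead,
# using the interim workaround suggested in this thread.
run_queue_foreground() {
  dvc exp run --run-all
}
```

Calling `run_queue_background` leaves the shell free while workers process the queue; `run_queue_foreground` keeps the work attached to the current terminal.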

Wait. What happened to queue attach? 🙂

It was folded into queue logs -f/--follow
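
As a small sketch of that replacement, assuming a task name taken from the `dvc queue status` output (the helper name `follow_task_logs` is made up for this example):

```shell
# Hypothetical helper: stream the logs of one queue task.
# $1 is the task name as listed by `dvc queue status`.
follow_task_logs() {
  dvc queue logs --follow "$1"
}
```

For example, `follow_task_logs my-task` would follow the output of a task named `my-task` (a placeholder name).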

Does this kill the running exps? Do they then become failed? What about the rest of the queue, does it remain in Queued state? When you restart, where does it start from?

By default, queue stop will finish any currently executing experiments and then stop the queue worker. Any remaining queued experiments stay in the queue. queue stop --kill will kill any currently running experiments and stop queue processing immediately. (Killed experiments will be marked as failed unless the user's pipeline/stage command has special handling for SIGKILL/SIGTERM, which is unlikely in typical cases.)
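
The two stop modes could be sketched like this. `stop_queue` and its `now` argument are hypothetical names for this example; only the `dvc queue stop` and `dvc queue stop --kill` invocations come from the description above.

```shell
# Hypothetical helper: stop the queue gracefully, or immediately with "now".
stop_queue() {
  if [ "${1:-}" = "now" ]; then
    # Kill running experiments and stop processing immediately.
    dvc queue stop --kill
  else
    # Let running experiments finish, then stop the worker.
    # Remaining queued experiments stay in the queue.
    dvc queue stop
  fi
}
```

With this helper, `stop_queue` is the graceful path and `stop_queue now` is the immediate one.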

Please confirm the difference or overlap between this and exp remove.

The intended behavior is that there is no overlap between the two commands.

  • queue remove will specifically apply to queued experiments and queue artifacts (i.e. it removes queue entries and any saved logs for old queue entries that can be accessed with queue logs)
  • exp remove will specifically apply to successful experiments (i.e. it removes DVC exp git refs and any associated DVC cache data for those exp refs)

The existing --queue related flags for exp remove and gc will be deprecated and eventually removed to make this separation clearer.
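
To make the separation concrete, here is a minimal sketch with two hypothetical helpers. The task and experiment names passed in are placeholders that would come from `dvc queue status` and `dvc exp show` respectively.

```shell
# Hypothetical helper: remove a queue entry (and its saved logs).
# $1 is a queue task name.
cleanup_queue_entry() {
  dvc queue remove "$1"
}

# Hypothetical helper: remove a completed experiment (its exp ref).
# $1 is an experiment name.
cleanup_experiment() {
  dvc exp remove "$1"
}
```

The first helper only touches the queue; the second only touches experiment results.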

pmrowla avatar Jun 28 '22 07:06 pmrowla

I understand that by default this goes into a background process. Is there a way to start it in the foreground? Say queue start --attached

There is currently no flag for this. In the meantime you can use exp run --run-all to get the same behavior, but eventually this can be folded into a flag for queue start

queue logs -f/--follow without a task should automatically follow the currently running experiment in the future.

Please confirm the difference or overlap between this and exp remove.

While developing the queue-related features, I realized that queue tasks and experiments are different concepts. They are strongly related but still distinct: tasks are focused on execution, while experiments are focused on the result. One checkpoint task can generate dozens of experiments, and experiments can be run without using a queue worker. The difference also shows up in the status/show tables: we can delete a succeeded task message and leave the experiment result (revision) untouched.

karajan1001 avatar Jun 28 '22 11:06 karajan1001

Are there guides that still need to be updated here?

dberenbaum avatar Oct 17 '22 18:10 dberenbaum

User guides were updated. If there's anywhere we find the queue info missing, please open a new issue.

dberenbaum avatar Oct 18 '22 12:10 dberenbaum