
Delete task on completion

willferreira opened this issue 8 years ago • 6 comments

I have to manually delete a task via the portal each time it completes (successfully or otherwise). Is there an option to delete a task automatically on termination? Otherwise, if I submit a task again, I get the following error message:

-------------------------------------------
The specified task already exists.
RequestId:<whatever>
Time:<whatever>
-------------------------------------------

willferreira avatar Dec 18 '17 10:12 willferreira

The reason for this is that we require each "task" (the --name value) to be unique. I think we should solve this by allowing users to submit the same value for the --name parameter. Maybe append a datetime string to the end of the task name, or something like that.
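
A minimal sketch of that datetime-suffix idea (unique_task_name is a hypothetical helper, not part of aztk today):

  from datetime import datetime

  def unique_task_name(base_name):
      # Append a UTC timestamp so repeated submissions of the same
      # --name value no longer collide with an existing Batch task.
      suffix = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
      return "{}-{}".format(base_name, suffix)

  print(unique_task_name("my-spark-app"))  # e.g. my-spark-app-20171218-193045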

jiata avatar Dec 18 '17 19:12 jiata

I think deleting a task is an interesting possibility. Keeping the task around isn't strictly necessary, since the main reason we would reference the task (Spark app) again is to retrieve its logs, and we can get logs directly from storage -- the task plays no role in that. Issues could start to crop up in error cases, though.

If the task fails before the Spark output logs are created, deleting it would make debugging much harder: with no logs, there is no way to tell a task that failed from one that never existed. Likewise, if you wanted to retrieve the status of your Spark app, you couldn't distinguish an app that completed (and was deleted) from one that was never submitted at all.

As far as appending timestamps goes, it's unclear how we would reference logs in this model. We could write logs to storage in this format:

  • blob name: <job_id>
  • blob path: <app-name>/<timestamp>/output.log

But to get logs for any given app name, we would have to prompt users to select which timestamp they want. This is fine for the CLI, but for the SDK that makes a lot less sense.
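
For illustration, a rough sketch of how the CLI side might enumerate the available timestamps, assuming the layout above (with <job_id> as the container name) and the azure-storage package of this era:

  from azure.storage.blob import BlockBlobService

  blob_service = BlockBlobService(account_name="<account>", account_key="<key>")

  def list_log_timestamps(job_id, app_name):
      # Blobs are assumed to live at <app-name>/<timestamp>/output.log,
      # so listing by the app-name prefix yields one path per run.
      blobs = blob_service.list_blobs(job_id, prefix=app_name + "/")
      return sorted({blob.name.split("/")[1] for blob in blobs})

The CLI could then prompt the user to pick one of the returned timestamps; the SDK has no equivalent interaction point, which is exactly the problem.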

jafreck avatar Dec 18 '17 19:12 jafreck

I don't think the task should be deleted automatically on termination, since, as you say, it might not yet have written logs before failing, which would make it harder to debug. Perhaps a command like:

aztk spark task delete --id <task name>

is sufficient? It would save me from hunting through the portal for the task to delete it (unless there is some quick way to find it in the portal?)
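
Under the hood, such a command would presumably wrap the Azure Batch SDK's task delete call; a sketch against the era's azure-batch package, assuming the Batch job id matches the aztk cluster id, with placeholder credentials:

  import azure.batch.batch_auth as batch_auth
  from azure.batch import BatchServiceClient

  credentials = batch_auth.SharedKeyCredentials("<account-name>", "<account-key>")
  client = BatchServiceClient(
      credentials, base_url="https://<account>.<region>.batch.azure.com")

  # Remove the completed (or failed) task so the same name can be reused.
  client.task.delete(job_id="<cluster-id>", task_id="<task name>")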

willferreira avatar Dec 18 '17 23:12 willferreira

We can explore adding something like that, but I'm not sure what the timeline for that would be.

The current recommendation is just to use a different task name each time you submit a Spark application.
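
One way to script that from the shell is to bake a timestamp into the name, something like the following (the cluster id and script path are placeholders):

  aztk spark app submit --id <cluster-id> --name "my-app-$(date +%Y%m%d%H%M%S)" my_app.py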

In the meantime, if you find yourself deleting tasks regularly and don't want to use the portal, BatchLabs is a good application to look into; it should make interacting with the Batch service easier.

jafreck avatar Dec 20 '17 18:12 jafreck

Understood. But changing the task name for each run still leaves completed tasks lying around (which presumably ought to be deleted for good housekeeping), and it turns every code, run, observe results, code, ... cycle into an exercise in incrementing the task name/id. I'll check out BatchLabs in the meantime.

willferreira avatar Dec 20 '17 20:12 willferreira

I'm reopening because I think this issue deserves a little more discussion/thought around what the best approach is.

There are definitely issues with the way tasks exist today, so hopefully we can find a better solution that will allow:

  1. Spark app submission with the same name
  2. Spark app deletion
  3. Spark app listing (with status)
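
For reference, requirements 2 and 3 map fairly directly onto the Batch SDK today; a sketch, again assuming the Batch job id matches the cluster id:

  import azure.batch.batch_auth as batch_auth
  from azure.batch import BatchServiceClient

  credentials = batch_auth.SharedKeyCredentials("<account-name>", "<account-key>")
  client = BatchServiceClient(
      credentials, base_url="https://<account>.<region>.batch.azure.com")

  # List every task (Spark app) in the cluster's job along with its
  # current state (active, running, completed, ...).
  for task in client.task.list("<cluster-id>"):
      print(task.id, task.state)

Requirement 1 is the harder one, since Batch task ids must be unique within a job.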

jafreck avatar Dec 20 '17 21:12 jafreck