
[Core feature] Deleting executions

Open tomasz-sodzawiczny opened this issue 2 years ago • 4 comments

Motivation: Why do you think this is important?

When we process customer data in Flyte we need to have the ability to completely remove the related artifacts from the system - either directly after processing, or later at request of the customer.

It seems execution artifact deletion came up before:

  • a use case mentioned on Slack was removing an execution whose parameters had accidentally leaked sensitive information (thread).
  • https://github.com/flyteorg/flyte/issues/2832 mentions TTL for compliance reasons

Goal: What should the final outcome look like, ideally?

I would love to be able to:

  • delete executions one by one
  • ideally, delete groups of executions by something like a label or name prefix (e.g. to delete all executions related to a specific customer).

Describe alternatives you've considered

flytectl delete execution seems to only terminate executions; I haven't found any other delete option in the UI or APIs.

If the sensitive information is in the raw data storage, I can remove the storage objects directly (but this is cumbersome and prevents you from using primitives).

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

tomasz-sodzawiczny, Apr 14 '23 17:04


A few users have asked about this in the past, but it is not simple to do as part of core Flyte, not least because once this code is in Admin, the chance of accidentally deleting good data goes way up. Currently we avoid a whole class of bugs simply by not having that code.

Tagging executions is a request we've heard much more demand for, and it is on our medium-term horizon.

Of course, even now you can manually run delete queries against the database. There are fewer than 15 tables (our recent attempt at cleaning up migrations has a comprehensive list). The ones you'll want to focus on are:

  • executions
  • task_executions
  • node_executions
  • execution_events
  • node_execution_events
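To make the manual route above more concrete, here is a minimal Python sketch (not FlyteAdmin code) that removes one execution and its dependent rows in a single transaction, deleting from the child tables before the parent executions table. It runs against an in-memory SQLite database with a hypothetical, simplified schema: the key column names, the underscore spelling of the event table names, the delete order, and the sample identifiers (flytesnacks, development, abc123) are all assumptions, so check the real migrations before adapting it to an actual FlyteAdmin database.

```python
import sqlite3
from typing import Dict

# Hypothetical, heavily simplified schema for illustration only. The real
# FlyteAdmin tables have many more columns; the key columns used here
# (execution_project, execution_domain, execution_name) are assumptions.
SCHEMA = """
CREATE TABLE executions (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT,
    PRIMARY KEY (execution_project, execution_domain, execution_name));
CREATE TABLE execution_events (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT);
CREATE TABLE node_executions (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT, node_id TEXT);
CREATE TABLE node_execution_events (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT, node_id TEXT);
CREATE TABLE task_executions (
    execution_project TEXT, execution_domain TEXT, execution_name TEXT, node_id TEXT);
"""

# Dependent tables first and the parent executions row last, so no child row
# is ever left referring to an execution that no longer exists.
TABLES_IN_DELETE_ORDER = [
    "task_executions",
    "node_execution_events",
    "node_executions",
    "execution_events",
    "executions",
]


def delete_execution(conn: sqlite3.Connection, project: str, domain: str, name: str) -> Dict[str, int]:
    """Delete one execution and its dependent rows; return rows deleted per table."""
    deleted: Dict[str, int] = {}
    # Using the connection as a context manager commits if every DELETE
    # succeeds and rolls back all of them if any one raises.
    with conn:
        for table in TABLES_IN_DELETE_ORDER:
            # Table names come from the fixed list above, never from user input.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE execution_project = ? "
                "AND execution_domain = ? AND execution_name = ?",
                (project, domain, name),
            )
            deleted[table] = cur.rowcount
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO executions VALUES ('flytesnacks', 'development', 'abc123')")
    conn.execute("INSERT INTO node_executions VALUES ('flytesnacks', 'development', 'abc123', 'n0')")
    conn.commit()
    print(delete_execution(conn, "flytesnacks", "development", "abc123"))
```

Running all deletes in one transaction means a failure partway through leaves the database unchanged rather than half-cleaned. Note that this only touches database rows, not any objects the execution wrote to blob storage.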

Related: https://github.com/flyteorg/flyte/issues/3234

wild-endeavor, May 05 '23 22:05

Note that in addition to deleting the data in the database, the inputs/outputs associated with the execution (stored in S3/GCS) would also have to be removed.
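Removing the blob-storage side usually comes down to deleting every object under a prefix. The sketch below keeps the storage client abstract: list_keys and delete_batch are hypothetical callables you would back with your S3 or GCS client, and the key layout in the demo is invented for illustration, not Flyte's actual storage layout. The batch size of 1000 matches the per-request limit of S3's DeleteObjects API.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# S3's DeleteObjects API accepts at most 1000 keys per request.
MAX_BATCH = 1000


def batched(keys: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield keys in lists of at most `size` items."""
    batch: List[str] = []
    for key in keys:
        batch.append(key)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def delete_prefix(
    list_keys: Callable[[str], Iterable[str]],
    delete_batch: Callable[[List[str]], None],
    prefix: str,
    batch_size: int = MAX_BATCH,
) -> int:
    """Delete every object whose key starts with `prefix`; return how many were deleted."""
    total = 0
    for batch in batched(list_keys(prefix), batch_size):
        delete_batch(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    # In-memory stand-in for a bucket; the key layout is purely hypothetical.
    store: Dict[str, bytes] = {
        "data/flytesnacks/development/abc123/n0/inputs.pb": b"",
        "data/flytesnacks/development/abc123/n0/outputs.pb": b"",
        "data/flytesnacks/development/abc1234/n0/inputs.pb": b"",
    }

    def list_keys(prefix: str) -> List[str]:
        # Return a snapshot so deleting from `store` cannot disturb the listing.
        return [k for k in store if k.startswith(prefix)]

    def delete_batch(keys: List[str]) -> None:
        for k in keys:
            del store[k]

    # The trailing slash keeps "abc123/" from also matching "abc1234/".
    print(delete_prefix(list_keys, delete_batch, "data/flytesnacks/development/abc123/"))
    print(sorted(store))
```

With boto3, list_keys could wrap the list_objects_v2 paginator and delete_batch could call delete_objects with Delete={"Objects": [{"Key": k} for k in keys]}. If bucket versioning is enabled, deleting by key alone only adds delete markers, so older object versions would still hold the data.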

I am a bit surprised this hasn't come up more often, as the inability to delete data shuts out every user who has to comply with some kind of data retention policy.

danieldanciu, Apr 21 '24 07:04

@danieldanciu this can be mitigated with retention policies on the blob storage, which is far more efficient and cost-effective.
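A blob-storage retention policy of this kind can be expressed, for example, as an S3 lifecycle rule that expires objects under a prefix after a fixed number of days. The Python sketch below only builds the configuration dictionary in the shape accepted by boto3's put_bucket_lifecycle_configuration; the prefix, the 30-day period, the rule ID, and the bucket name in the comment are hypothetical. GCS offers the equivalent through Object Lifecycle Management with an age condition and a Delete action.

```python
from typing import Dict, List


def expiration_rule(rule_id: str, prefix: str, days: int) -> Dict:
    """Build one S3 lifecycle rule that expires objects under `prefix` after `days` days."""
    if days < 1:
        raise ValueError("expiration days must be a positive integer")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


def lifecycle_configuration(rules: List[Dict]) -> Dict:
    """Wrap rules in the top-level structure passed as LifecycleConfiguration."""
    return {"Rules": rules}


if __name__ == "__main__":
    # Hypothetical rule: expire everything under "data/" after 30 days.
    config = lifecycle_configuration(
        [expiration_rule("expire-flyte-data", "data/", days=30)]
    )
    print(config)
    # Applying it needs AWS credentials, so it is left as a comment here:
    #   import boto3
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="my-flyte-bucket", LifecycleConfiguration=config)
```

Keep in mind that put_bucket_lifecycle_configuration replaces the bucket's entire existing lifecycle configuration, that S3 removes expired objects asynchronously rather than at the exact moment they expire, and that such a rule covers only blob storage: the execution records in the database are untouched.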

kumare3, Apr 22 '24 03:04