Better support for package deletion
There are several issues with the package deletion flow that we want to address:
- Currently every revision of a package has to be deleted individually. We should allow for deletion of all package revision for the same package.
- Deletion should require review/approval.
Comment on this issue copied from different source:
The best way (that I can think of) to solve this would be with the aggregate Package resource.
When implementing deletion for PackageRevision resources I was taking into consideration k8s API style and decided to implement deletion simply as a DELETE verb. Given RBAC, deletion can require appropriate permission but aligning it with approval flow and possibly implementing it using other form than DELETE HTTP verb (i.e create new package revision, empty it out, mark it deleted, and signal that all other revisions should likewise disappear when this change is approved) may play better with the rollout mechanisms too.
This is also already captured in the Porch roadmap doc
Basically deletion via the UI doesn't fully delete, which is a problem for all GitOps setups that watch HEAD.
Do we need the package resource to make this work? #3213
cc @ChristopherFry
Basically deletion via the UI doesn't fully delete, which is a problem for all GitOps setups that watch HEAD.
Do we need the package resource to make this work? https://github.com/GoogleContainerTools/kpt/issues/3213
Yeah, I got some feedback from @mortent offline who suggested that the right way to approach deletion would be to look into creating a package resource, so I'm spending some time looking into that. I'll leave updates/discussion points on #3213 for that.
I set up config-sync to point at a test deployment repo that I have, and played around with trying to get a package deleted from the cluster.
Created and published a single instance of the ghost package.
Running kpt alpha rpkg get shows the following:
test-deployments-35184c578c5fe882c035d0d2629c17304d0383a4 ghost-clone main false Published test-deployments
test-deployments-34c616cc79198dacfefd9f03cdb779e3ed093c96 ghost-clone v1 true Published test-deployments
test-deployments-546fc2fe90977802c96302c824671ee2f6b80fd3 ghost-clone/ghost-app main false Published test-deployments
test-deployments-55767db3b8b1fb0ca41735cf2cee29a98979f0d1 ghost-clone/mariadb main false Published test-deployments
Waited for it to sync to the cluster, and I see the resources in my cluster.
In order to delete the ghost package, I have to both delete the v1 package revision and also delete the main package revision:
$ kpt alpha rpkg del test-deployments-34c616cc79198dacfefd9f03cdb779e3ed093c96 -ndefault
test-deployments-34c616cc79198dacfefd9f03cdb779e3ed093c96 deleted
$ kpt alpha rpkg del test-deployments-35184c578c5fe882c035d0d2629c17304d0383a4 -ndefault
test-deployments-35184c578c5fe882c035d0d2629c17304d0383a4 deleted
The ghost package appears to be completely removed from the git repository - it's no longer in the main branch and all tags associated with that revision were deleted. Running kpt alpha rpkg get no longer shows the ghost package or its two subpackages. The only remnants of the ghost package are in the commit history.
Went back to my cluster, waited a few minutes to let Config Sync sync, which it did. All the resources from the package were deleted.
I also experimented with seeing what happens if I leave the v1 ghost package in tact, but delete the main ghost package.
test-deployments-35184c578c5fe882c035d0d2629c17304d0383a4 ghost-clone main false Published test-deployments
test-deployments-34c616cc79198dacfefd9f03cdb779e3ed093c96 ghost-clone v1 true Published test-deployments
test-deployments-546fc2fe90977802c96302c824671ee2f6b80fd3 ghost-clone/ghost-app main false Published test-deployments
test-deployments-55767db3b8b1fb0ca41735cf2cee29a98979f0d1 ghost-clone/mariadb main false Published test-deployments
$ kpt alpha rpkg del test-deployments-35184c578c5fe882c035d0d2629c17304d0383a4 -ndefault
test-deployments-35184c578c5fe882c035d0d2629c17304d0383a4 deleted
This removed the package from the "main" branch but left the v1 package revision in tact (the ghost/v1 tag was still in the repo, and running kpt alpha rpkg get still showed the v1 revision but not the main). This also caused the resources to be deleted from the cluster, which is sort of a "soft" delete.
I didn't really run into any issues with deletion... I can see it being annoying to have to delete package revisions individually if you want to completely remove the package from the repo, but I don't know that this needs to be handled by a "package" resource. It should be pretty easy to implement removing the package from "main" in the CI/UI for a soft delete, and removing all package revisions for a hard delete. The only question mark with that seems to be how do we want to handle making package deletion go through a review/approval process if we implement deletion on the client-side.
@mortent I recall you mentioned having some problems with package deletion, what did you run into?
It's a significant UX problem to list all the revisions in one flat view, so the UI finds the latest revisions and lists those, then finds the related revisions and displays them in package-specific lists.
To delete, the client also has to search for all related revisions of a package, which could be thousands, and individually delete them.
I haven't tried to delete the "main" revision (is that always the branch used?) without deleting the version-specific revisions, but I wouldn't want it to leave a lot of dangling tags. What I do when practicing and running demos is create stuff, delete it, and create it again with the same names. Right now, I clean up completely via git scripts or the github UI.
Yes, I found the same, I had to delete my deployment repo via the GH UI. I tried deleting the revision via the GUI and while it disappeared from the GUI, the package remained in the repo. I guess it just deleted the tag (didn't check that).
I was thinking about how to implement an appropriate approval workflow for deletion. At first, I'd thought of implementing it with subresources (like the current lifecycle approval flow), but we are considering moving away from the aggregated apiserver to a CRD, and we won't be able to use subresources in that case.
From what I've heard, the main reason for using subresources for lifecycle is so we can have separate RBAC for approval. If we aren't concerned about having separate RBAC for deletion, then the deletion state can just be another field in the package revision spec. I'm thinking either a boolean field deletionProposed, or a string field deletion with legal values being the empty string "" or proposed.
If/when we switch to CRDs for package revisions, we will need to make sure that the package revision CR can only be deleted from the cluster if the resource has already been proposed for deletion. We can do that with a validating webhook to check the value of spec.deletion/spec.deletionProposed when someone tries to delete the CR. We might want to do a similar thing for spec.lifecycle as we migrate from the aggregated apiserver to CRDs.
The alternative here I think would be to stick with the aggregated apiserver (instead of a CRD) for package revisions, so that we can do the deletion validation ourselves without a webhook.
So for deletion we actually can support a separate RBAC permission, since deletion will be a DELETE operation against the APIServer rather than an UPDATE, which will be used for all other lifecycle changes. So what we loose is that we can no longer support separate RBAC permissions for publishing a package. If we think that preventing the same user from both proposing and publishing the same PackageRevision, then we could probably handle that in a webhook. It is not the same as separate RBAC rules, but it might be sufficient. Also, with the Package Conditions (https://github.com/GoogleContainerTools/kpt/issues/3455) we do have a solution that might overlap somewhat with the propose and publish steps.
In addition to the two alternatives listed, technically we could consider making validation of the lifecycle state an async operation so we can do it in a controller. But it seems like that would be an awkward API to work with.
Overall, I agree with the conclusion that the best alternative to an aggregated apiserver is to rely on a webhook for validation.
I'm not sure about adding a new deletionProposed or deletion field. Wouldn't it be easier to just extend the lifecycle with an additional value (maybe ProposeDelete)?
I'm not sure about adding a new deletionProposed or deletion field. Wouldn't it be easier to just extend the lifecycle with an additional value (maybe ProposeDelete)?
I think it's fine to extend the lifecycle field, but it's not clear to me how would we differentiate between a published package that has been proposed for deletion vs a package draft that has been proposed for deletion? Or a package that has been proposed for publishing by one client, but proposed for deletion by another? Is the distinction not important?
I'm in favor of going with a webhook (or at least trying one!). There's some movement to use CEL to replace webhooks, and some potential also to support additional RBAC verbs in apimachinery by defining certain fields as controlled by certain verbs (that's more in the discussion phase though, AIUI). So we wouldn't necessarily be stuck with a webhook forever, and it feels like the winds are blowing in this direction. (Though I suspect in practice there will always be "one thing" we're waiting to roll out)
@natasha41575 do we have some additional fields that indicate where a draft "started"? I imagine we'll need that to know how to "rebase" when the upstream changes before our draft is accepted. I think this is probably the kpt upstream package, but I'm not certain.
@natasha41575 I think we can close this one?
Yeah, we can close this. I think the rest of this issue will be covered by https://github.com/GoogleContainerTools/kpt/issues/3213 if/when we get back to it, so we can use that for tracking the remainder of the work.