-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design-proposal: VM delete protection #363
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Differences between options are negligible in any scenario analyzed. Moreover, **no performance degradation has been | ||
observed.** | ||
|
||
Given the pros and cons, **OPTION \#1 is preferred in this case.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we go with this option then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO, yes, we should go with this option.
The VM protection is implemented by adding | ||
a [ValidationAdmissionPolicy and ValidationAdmissionPolicyBinding](https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/) | ||
objects. On every delete action against a VM object, the ValidatingAdmissionPolicy will check for presence of the | ||
annotation. If it is present and its value equals “true” or “True” the request will be rejected. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to explain its behavior in scenarios where it is misconfigured?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might create a really verbose and noisy outputs IMO. Documentation should cover how to properly configure the feature, and it should make clear that other configurations/values, will allow the deletion.
/cc |
45dc46b
to
510b378
Compare
v2:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I'm not against this (as long as option 1 is used), I don't quite see the point.
Are we trying to protect PVCs that would have the VM as a parent? In that case, I think we should take steps to protect the PVCs directly instead.
The VM spec itself really isn't all that precious, and would cost just a few kilobytes of storage to backup somewhere if it really did contain special data.
This proposal sounds like there are interesting field stories associated with it, if so could you maybe share a concrete example of what happened?
No. We're trying to protect VMs from deletion by accident, e.g. by sysadmins that may run business critical applications or services (e.g. SQL server). |
e1c6873
to
355ae1a
Compare
Sure but what do you mean by "VMs"? What, if not disks, are we afraid to lose? |
As stated in the design document, what we are trying to protect is the VirtualMachine object from deletion. |
Thank you for the pointer. Please note that access to the comment you linked to is restricted. The important part of it is that |
Yes, this is a good summary. Thanks!
Just one note, both approaches mentioned in this design document does not allow the force deletion either. You could check it in the PoCs linked. We're also talking about protecting something that can easily be re-created (a VM spec). That being said, I don't feel strongly against this, as the maintenance burden seem minimal. @vladikr WDYT? I agree, but from the user point of view, they will have the guarantee that the VM will not be deleted no matter what. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I don't see why this is needed. I've seen reference to tekton pipelines as a justification, but I don't understand what the real problem is there.
|
||
### Enable the VM Object Protection | ||
|
||
To protect VMs against deletion, they need to be annotated with `kubevirt.io/vm-delete-protection` set to `true` or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'd much rather have a first-class setting in the VM. Documentation of annotations tends to be fragmented and doesn't lead to a very good user experience. Making a dedicated setting in the spec means it will be properly documented in swagger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for your feedback, much appreciated.
While I'm not against this idea, I can see potential challenges in the future. IMHO, a good documentation entry should be enough. The API field vs an annotation has the following advantages:
- Low maintenance, no need to maintain a new API field.
- Extensible, it would be just a matter of adding new
resourcesRules
to the existing VAP if we would like to protect other KubeVirt objects. Creating a dedicated API field will force us to complicate the VAP and even to move the API setting from different places (creating room for potential backwards compatibility challenges).
IMHO, Jed summed it up really well here: #363 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be an invalid use case but what about using status and a subresource API to control this behaviour to allow finer grain RBAC control?
|
||
## User Stories | ||
|
||
As a user who has the permissions to delete VMs, I want to protect my VMs from accidental deletion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd simplify this and just say As a VM Owner,
the fact the owner has permissions to delete the VM isn't relevant to the fact that they want to protect it from accidental deletion by them or a third party.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
1. Removing the `kubevirt.io/vm-delete-protection` annotation with the following patch operation: | ||
```bash | ||
$ kubectl patch vm <vm-name> --type=json -p='[{"op": "remove", "path": "/metadata/annotations/kubevirt.io~1vm-delete-protection"}]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ultimately means that anyone with the ability to update/patch the VM can enable or disable this functionality. Have we considered placing it in status controlled by a subresource with finer grain RBAC contol?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but I assume this is the same concern raised in the top comment of your review.
I didn't think about that TBH, but it feels a bit overkill IMHO. However, what would be the ultimate goal of doing that? I'm not sure if this particular use case requires it. Please, elaborate a bit if you think otherwise.
In any case, if we find out this is required, we could implement it in a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ultimate goal would be to allow a smaller subset of users (super admins?) to have access to the delete protection toggle than have the ability to update the VM object itself. It's extra work but given the use case here I'm not sure we want to give everyone access to this toggle right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ultimate goal would be to allow a smaller subset of users (super admins?) to have access to the delete protection toggle than have the ability to update the VM object itself.
@lyarwood this feature is only about the single persona, the VM owner with "fat fingers". An additional role would be not helpful in this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominikholler well it's any user with DELETE privileges against the VM surely so not just an owner but anyone in the same namespace or admin. It also doesn't protect against that same user accidentently removing the label and thus the protection before requesting the deletion.
Moving the toggle into the status would avoid this as users wouldn't be able to touch it without going through the sub resource API that itself could be limited to a specific owner and/or super set of admins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also doesn't protect against that same user accidentently removing the label and thus the protection before requesting the deletion.
This would require two steps, this feature is just protecting from a single unintended step, e.g. a `kubectl delete vm/xxx" .
355ae1a
to
f734d3f
Compare
Thank you for taking the time to write this proposal @jcanocan. At first, I thought that this might be a good idea. Then I became conflicted as it seems that ultimately we are protecting these VMs from the cluster admins. If we really need such a layer of protection, I would imagine this feature working from another direction. |
We could enable the protection by setting a label like: apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/vm-delete-protection: "true" In this way, as you mentioned, the cluster-admin could get a list of VMs with the protection enabled with something like: kubectl get vm --selector=kubevirt.io/vm-delete-protection=true WDYT? Did I get your idea right? Thanks for your input. |
@jcanocan What I had in mind is something closer to The idea is to separate the concerns of each persona. apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
kubevirt.io/please-avoid-deleting: "true" Then the admin would set the rule that would prevent the deletion of the associated VMs with the listed label : preventVMsDeletion:
selector: kubevirt.io/please-avoid-deleting |
perhaps the label/annotations can be protected with cel? @xpivarc |
It's an interesting idea. How would this work with newly created VMs? Doesn't it have to be only a secondary action? |
@vladikr Isn't this already working out-of-the-box using ValidatingAdmissionPolicy ? This feature is about having a fixed marking (e.g. label or annotation) so that the UI can configure a marking for a pre-defined rule. |
In which way would help this idea to help the admin with "fat-fingers" ? |
Please elaborate on what is working out of the box. :) I'm only trying to emphasize that we shouldn't now allow users to block admin operations without the admins' consent or/and a way to override this. We achieve this separation either with
This proposal is for KubeVirt which doesn't have a UI... but in all cases, I see a label or an annotation being used to request this feature. |
Not by itself, but this is just another way to limit the configuration of this feature to cluster admins and vm owners. |
I think we should get rid of mentioning "UI" in the design proposal. I've already opened discussion about it earlier this week: #363 (comment) |
Yeah I can't think of a clean way of providing it with the initial definition as you'd always have to remove it once populated in status to ensure a single source of truth.
Yup it makes the act more deliberate by forcing an admin or whomever it is with access to the subresource to first toggle delete protection there and then fire off the delete. Without this any user with the ability to update the VM object can toggle delete protection and then nuke the VM. |
According to my limited technical understanding, the cluster admin can already right now create a ValidatingAdmissionPolicy like this:
Now, if the VM owner adds the annotation
To reverse the operation and allow deletion, the following command can be used:
the same should work if a label is used instead of the annotation. This feature is about providing a easy to use, simple and testable interface to the already existing solution. But I see the point, that there should be a way to introduce a way for the admin to control the deletion protection.
I see, thanks for the hint! |
Where do you see the issue with this? Isn't it expected that if a user has the VM owners permissions this user is able to delete the VM? |
@dominikholler The policy you wrote is exactly what you want. Now why should this opinionated (subjective at least to me) policy be part of core Kubevirt? Especially when it is so easy to implement as you demonstrated? Maybe more opinionated component could introduce this? |
The proposal introduces a mechanism to protect VMs from being deleted. By adding the annotation `kubevirt.io/vm-delete-protection` and with a value of `true` or `True` to the VM object, the VM cannot be deleted by any means. This would prevent accidental deletions in VMs that users considers crucial for their applications/clusters. Signed-off-by: Javier Cano Cano <[email protected]>
f734d3f
to
0d826fd
Compare
Update:
|
I agree with others' comments from above. I'm not comfortable with this proposal. Kubernetes provides mechanisms to manage permissions regarding who's able to delete the different objects, like RBAC and VAP. By definition, admins are the trusted persona to perform dangerous actions, and the admin can easily add an opinionated VAP as demonstrated above if he wants to be extra paranoid. Thank you @jcanocan for your efforts on this one, but unless there's a compelling argument for it, I tend to advocate against it. |
Kubevirt has users, not customers :) |
What this PR does / why we need it:
The proposal introduces a mechanism to protect VMs from being deleted. By adding the annotation
kubevirt.io/vm-delete-protection
and with a value oftrue
orTrue
to the VM object, the VM cannot be deleted by any means. This would prevent accidental deletions in VMs that users considers crucial for their applications/clusters.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes # CNV-48696
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note: