Volumeattachment deletion in a detach operation should carry the resourceVersion #124749
Comments
/sig apps
/sig storage
@pandaamanda Which version of external-attacher are you using?
Not quite sure about the exact version. But I looked at the latest external-attacher code, and va's deletion event doesn't handle this concurrency.
/assign @jsafrane
/triage accepted
/unassign
sure.
What happened?
One of our flow tests creates a pod that uses a PVC, waits about 2 minutes, and then deletes the pod. We later discovered that the PV referenced by the pod was still attached to the node and had never been detached.
Combining the CSI plugin and Kubernetes component logs, we found that the CSI plugin took a long time to attach the volume. The attach succeeded very late, and only then were the finalizers patched onto the VolumeAttachment resource. At the same time, the deletion of the pod triggered the Kubernetes detach operation, which deletes the VolumeAttachment resource.
Because there are multiple apiserver instances, the VolumeAttachment delete request reached an apiserver that was not yet aware that finalizers had been patched onto the VA. As a result, the delete succeeded and removed the VA.
csi plugin log:
kube-controller-manager log:
What did you expect to happen?
The finalizer patch and the VA deletion should be safe to run concurrently, so that the VA is only deleted when it is safe to do so.
In pkg/volume/csi/csi_attacher.go, we can use c.plugin.volumeAttachmentLister.Get(attachID) to get the VA's resourceVersion and pass it to the Delete call to solve the problem.
How can we reproduce it (as minimally and precisely as possible)?
Hard to reproduce; the race occurs with low probability.
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)