Create a new cephfs PVC from snapshot fails #5060
@sle78 It's not a failure, it's an intermediate state. CephFS clones are full copies, and the time to create a new PVC from a snapshot (or a PVC-PVC clone) depends on the three things below.
---
Hi @Madhu-1, thanks for your response. I waited for more than 24 hours and the clone never happened; it looks like it's failing to create the new subvolume. This is a production cluster, and in general we don't have any issues creating ceph-block and ceph-filesystem volumes; they're provisioned quite fast. This only affects cloning from snapshots, i.e. specifying a dataSource in the PVC.
---
Yes, this is expected; as I mentioned above, clones are full copies in CephFS. Please run through the flow below to confirm that cloning works at the Ceph level; you may also find similar issues already in the repo where people have discussed this. Test that flow to make sure cephfs cloning works.
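A minimal sketch of that flow with the ceph CLI; `myfs` stands in for the filesystem volume name and `csi` for the subvolume group, so adjust both to your cluster:

```sh
# Create a throwaway subvolume, snapshot it, then clone the snapshot.
ceph fs subvolume create myfs testsubvol --group_name csi
ceph fs subvolume snapshot create myfs testsubvol testsnap --group_name csi
ceph fs subvolume snapshot clone myfs testsubvol testsnap testclone \
  --group_name csi --target_group_name csi

# Poll until the clone state reaches "complete" (or "failed").
ceph fs clone status myfs testclone --group_name csi
```

---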
Right, all that worked fine:

```
kc get volumesnapshot csi-test-snapshot
```

Where do I go from here? Thanks

---
Tried again using the failing PVC clone:
I found some interesting logs on the ceph-mgr:
Note this in the ceph mgr logs:
---
@sle78 CSI deletes the clone in only one case, not in any other: CSI creates a clone and waits for it to reach the completed state, and if it goes to a failed state, it deletes the failed clone and creates a new one. You can track this in the cephcsi logs.

---
@Madhu-1 have you tried to replicate this issue on your side?

---
I have not seen this issue.

---
Is there any debug option for this, or any more logs I can send to make some progress on this? Thanks
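If this is a Rook-managed deployment, one way to get more verbose Ceph-CSI logs is the operator ConfigMap; the `rook-ceph` namespace and deployment name below assume a default Rook install:

```sh
# CSI_LOG_LEVEL ranges from 0 to 5; 5 is the most verbose.
kubectl -n rook-ceph patch configmap rook-ceph-operator-config \
  --type merge -p '{"data":{"CSI_LOG_LEVEL":"5"}}'

# Then follow the provisioner logs while retrying the clone:
kubectl -n rook-ceph logs deploy/csi-cephfsplugin-provisioner \
  -c csi-cephfsplugin -f
```

---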
In my case, the following workflow does not work either:
```
❯ k get pvc
NAME                STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
pvc                 Bound     pvc-075473bc-cfbc-4c02-a2ea-d4b222901c25   1Gi        RWO            rook-ceph-fs   <unset>                 2m35s
pvc-from-snapshot   Pending                                                                        rook-ceph-fs   <unset>                 2m10s

❯ k get vs
NAME       READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS   SNAPSHOTCONTENT                                    CREATIONTIME   AGE
snapshot   true         pvc                                 1Gi           rook-ceph-fs    snapcontent-a1f8a1d8-c56a-4f63-ba11-b7ec6dedeef1   2m40s          2m40s
```
```
❯ k describe pvc pvc-from-snapshot
Name:          pvc-from-snapshot
Namespace:     default
StorageClass:  rook-ceph-fs
Status:        Pending
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-provisioner: storage.cephfs.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      snapshot
Used By:     <none>
Events:
  Type     Reason                Age                  From                                                                                                           Message
  ----     ------                ----                 ----                                                                                                           -------
  Normal   Provisioning          35s (x9 over 2m43s)  storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6cd594cb7-q8p8c_cd2f1172-83e9-443d-8fca-9c8206d19cf7  External provisioner is provisioning volume for claim "default/pvc-from-snapshot"
  Warning  ProvisioningFailed    35s (x9 over 2m43s)  storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-6cd594cb7-q8p8c_cd2f1172-83e9-443d-8fca-9c8206d19cf7  failed to provision volume with StorageClass "rook-ceph-fs": rpc error: code = Aborted desc = clone from snapshot is pending
  Normal   ExternalProvisioning  4s (x13 over 2m43s)  persistentvolume-controller                                                                                    Waiting for a volume to be created either by the external provisioner 'storage.cephfs.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
```
I see the following logs.

manager:

```
debug 2025-02-01T15:08:13.955+0000 ffff2620b800  0 [volumes INFO volumes.module] Starting _cmd_fs_clone_status(clone_name:csi-vol-ce62212c-6e5e-42b9-b038-f0b72d7485ab, format:json, group_name:csi, prefix:fs clone status, vol_name:rook-ceph-fs) < ""
debug 2025-02-01T15:08:13.960+0000 ffff2620b800  0 [volumes INFO volumes.module] Finishing _cmd_fs_clone_status(clone_name:csi-vol-ce62212c-6e5e-42b9-b038-f0b72d7485ab, format:json, group_name:csi, prefix:fs clone status, vol_name:rook-ceph-fs) < ""
```

provisioner:

```
E0201 15:41:20.310887       1 controller.go:974] "Unhandled Error" err="error syncing claim \"ca5c2ea0-dd14-446e-ae78-b422ecba7cff\": failed to provision volume with StorageClass \"rook-ceph-fs\": rpc error: code = Aborted desc = clone from snapshot is pending" logger="UnhandledError"
```

Ceph version:

---
You need to mount the filesystem and track the clone percentage manually. Ceph recently introduced a feature for this; if you are running the required Ceph version, the cephcsi logs will contain the status. #4813
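A sketch of both ways to track a clone, using the volume and group names from the logs above (`rook-ceph-fs`, `csi`); the clone name and mount point are placeholders, and the exact status fields vary by Ceph release:

```sh
# Poll the clone state: "in-progress" -> "complete" (or "failed").
# New enough Ceph releases also include a progress report here.
ceph fs clone status rook-ceph-fs <clone-name> --group_name csi

# Manual alternative: resolve the clone's path inside the filesystem,
# mount the filesystem, and watch the cloned data grow.
ceph fs subvolume getpath rook-ceph-fs <clone-name> --group_name csi
watch du -sh /mnt/cephfs/<returned-path>
```

---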
@mmontes11 A couple of questions:

Try creating the clone with the ceph CLI (the sketch earlier in the thread shows the flow) and see whether that works. If it does not, it's an issue with Ceph, not with CSI.

---
Describe the bug
Creating a new cephfs PVC from a snapshot fails.
Environment details

Mounter used for mounting PVC (for cephfs it's `fuse` or `kernel`; for rbd it's `krbd` or `rbd-nbd`): N/A, rook-ceph provisioner stack
Steps to reproduce
Create a new cephfs PVC from snapshot
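For reference, a minimal manifest sketch for this step, assuming the `rook-ceph-fs` StorageClass and a VolumeSnapshot named `snapshot` as in the outputs above (the PVC name and 1Gi size are placeholders):

```sh
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-from-snapshot
spec:
  storageClassName: rook-ceph-fs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  # dataSource is what turns this into a clone-from-snapshot request.
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: snapshot
EOF
```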
Logs
These errors keep coming out over and over again.
CSI-provisioner logs:
csi-cephfsplugin logs:
The snapshots are being taken successfully and it's failing at copying content into the new pvc.