Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use storage class from annotation for host-assisted PVC operation #3585

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

TheRealSibasishBehera
Copy link

What this PR does / why we need it:
modify the getStorageClassCorrespondingToSnapClass function to use the storage class from the annotation for host-assisted PVC operations.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3530

Special notes for your reviewer:

Release note:

NONE

modify the `getStorageClassCorrespondingToSnapClass` function to use the storage class from the annotation for host-assisted PVC operations.

Fixes: kubevirt#3530

Signed-off-by: Sibasish Behera <[email protected]>
@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/M labels Jan 8, 2025
@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign aglitke for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TheRealSibasishBehera
Copy link
Author

cc: @akalenyu

@akalenyu
Copy link
Collaborator

akalenyu commented Jan 8, 2025

Thank you for the PR! 🙏

What do you think about capturing the mapping between the volumesnapshotclass->storageclass
on the volumesnapshotclass annotations instead?
This way, the admin configures it once on a cluster level and subsequent restores should work seamlessly for any DV

@kubevirt-bot
Copy link
Contributor

@TheRealSibasishBehera: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cdi-linter dcf7cfb link false /test pull-cdi-linter
pull-cdi-generate-verify dcf7cfb link true /test pull-cdi-generate-verify
pull-containerized-data-importer-e2e-destructive dcf7cfb link true /test pull-containerized-data-importer-e2e-destructive

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@@ -450,7 +450,12 @@ func (r *SnapshotCloneReconciler) createTempHostAssistedSourcePvc(dv *cdiv1.Data
return nil
}

func (r *SnapshotCloneReconciler) getStorageClassCorrespondingToSnapClass(driver string) (string, error) {
func (r *SnapshotCloneReconciler) getStorageClassCorrespondingToSnapClass(dv *cdiv1.DataVolume, driver string) (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this only covers the non-CSI path (initLegacyClone), I assume you're more interested in the CSI one?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes I made are for cases where TempHostAssistedSourcePvc is created, i.e., the csi driver is not present, or the csi driver for the source and target is mismatched. IIRC both are triggered by initLegacyClone . the first one is my use case

happy to work on any similar cases. Are you referring to the remaining case, which is the csi driver for target pvc and snapshot match? It tries for a direct snapshot to target pvc restore , but it fails because of a csi incompatibility issue ?

Copy link
Collaborator

@akalenyu akalenyu Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, I forgot the issue details. Let me stress again that what you're describing is very unusual today

the csi driver is not present


It tries for a direct snapshot to target pvc restore , but it fails because of a csi incompatibility issue ?

Yes, here's a ref to this path

func getStorageClassNameForTempSourceClaim(ctx context.Context, vsc *snapshotv1.VolumeSnapshotContent, client client.Client) (string, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I will work on this

Copy link
Collaborator

@akalenyu akalenyu Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to poke at the setup a little bit again, the provisioner has VolumeSnapshots support (which I thought implies it's CSI) but there's no CSIDriver object?

Copy link
Author

@TheRealSibasishBehera TheRealSibasishBehera Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the case. The provisioner is openebs mayastor The deployment manifest does not contain a CSIDriver packaged within

That sounds unintended, how would you then configure things like fsGroupPolicy? https://kubernetes-csi.github.io/docs/support-fsgroup.html

These are filled in as parameters for the storage class (not exactly the same names but similar functionality that made it possible before GA, I believe). I have confirmed with their community that they do not have a CSIDriver.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt think about also using pv.spec.CSI since it is directly populated in when using a csi driver

I don't think it's possible since a PV does not exist at the time CDI is interested in this

so its not evaluated on fly for each dv/pvc ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the motivation to avoid a CSIDriver object.. AFAIK it's a requirement https://kubernetes-csi.github.io/docs/csi-driver-object.html

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt think about also using pv.spec.CSI since it is directly populated in when using a csi driver

I don't think it's possible since a PV does not exist at the time CDI is interested in this

so its not evaluated on fly for each dv/pvc ?

It's evaluated prior to PVC creation to understand if we can use populators

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the motivation to avoid a CSIDriver object.. AFAIK it's a requirement https://kubernetes-csi.github.io/docs/csi-driver-object.html

Yeah I agree it’s a standard and should be supported 😞
This seems like a case of delayed adoption to me, as it used to work without it

@TheRealSibasishBehera
Copy link
Author

Thank you for the PR! 🙏

What do you think about capturing the mapping between the volumesnapshotclass->storageclass on the volumesnapshotclass annotations instead? This way, the admin configures it once on a cluster level and subsequent restores should work seamlessly for any DV

Yes it sounds good that the configuration can be set once .

If I understand correctly, we would refer to the VolumeSnapshotClass from the source snapshot during restores. However, I am wondering about the scenario where the VolumeSnapshotClass is deleted after the snapshot is created. In such case we cant refer to the appropriate storageclass used in VolumeSnapshotClass annotation

@akalenyu
Copy link
Collaborator

akalenyu commented Jan 8, 2025

Thank you for the PR! 🙏
What do you think about capturing the mapping between the volumesnapshotclass->storageclass on the volumesnapshotclass annotations instead? This way, the admin configures it once on a cluster level and subsequent restores should work seamlessly for any DV

Yes it sounds good that the configuration can be set once .

If I understand correctly, we would refer to the VolumeSnapshotClass from the source snapshot during restores. However, I am wondering about the scenario where the VolumeSnapshotClass is deleted after the snapshot is created. In such case we cant refer to the appropriate storageclass used in VolumeSnapshotClass annotation

I think that is either unlikely or impossible when a snapshot that was provisioned by it still exists
(that snapshot cannot be deleted forever in these cases)

@TheRealSibasishBehera
Copy link
Author

I think that is either unlikely or impossible when a snapshot that was provisioned by it still exists (that snapshot cannot be deleted forever in these cases)

Got it, I’ll update the implementation

Also, do you think this ability needs to be documented anywhere?

@akalenyu
Copy link
Collaborator

akalenyu commented Jan 9, 2025

I think that is either unlikely or impossible when a snapshot that was provisioned by it still exists (that snapshot cannot be deleted forever in these cases)

Got it, I’ll update the implementation

Also, do you think this ability needs to be documented anywhere?

Yes, feel free to drop a hint at

# Data Volume cloning from a VolumeSnapshot source
## Introduction

@akalenyu
Copy link
Collaborator

Thank you for the PR! 🙏

What do you think about capturing the mapping between the volumesnapshotclass->storageclass on the volumesnapshotclass annotations instead? This way, the admin configures it once on a cluster level and subsequent restores should work seamlessly for any DV

Just realized we already have a mechanism to tie storageclass->snapclass via #2898 so we could just use this instead of an annotation?
(Loop over storageprofiles with vsc == sourceSnap.vsc)
Not completely sure what happens if 2 storage classes point at the same snapshot class though

@TheRealSibasishBehera
Copy link
Author

Thank you for the PR! 🙏
What do you think about capturing the mapping between the volumesnapshotclass->storageclass on the volumesnapshotclass annotations instead? This way, the admin configures it once on a cluster level and subsequent restores should work seamlessly for any DV

Just realized we already have a mechanism to tie storageclass->snapclass via #2898 so we could just use this instead of an annotation? (Loop over storageprofiles with vsc == sourceSnap.vsc) Not completely sure what happens if 2 storage classes point at the same snapshot class though

This would not work if we check the status. If multiple StorageProfiles have the same snapshot class (e.g., because it is set as the default or is the first in a sorted list), we might end up picking a storage class(say the first one in loop over storageprofiles with vsc == sourceSnap.vsc ) which is not intentionally set by the admin and is incompatible

we can check the spec for this, as it is user-defined. This way, we can trust that the storage class and snapshot class are compatible as a pair, and the selection is not arbitrary

@akalenyu
Copy link
Collaborator

akalenyu commented Jan 20, 2025

Thank you for the PR! 🙏
What do you think about capturing the mapping between the volumesnapshotclass->storageclass on the volumesnapshotclass annotations instead? This way, the admin configures it once on a cluster level and subsequent restores should work seamlessly for any DV

Just realized we already have a mechanism to tie storageclass->snapclass via #2898 so we could just use this instead of an annotation? (Loop over storageprofiles with vsc == sourceSnap.vsc) Not completely sure what happens if 2 storage classes point at the same snapshot class though

This would not work if we check the status. If multiple StorageProfiles have the same snapshot class (e.g., because it is set as the default or is the first in a sorted list), we might end up picking a storage class(say the first one in loop over storageprofiles with vsc == sourceSnap.vsc ) which is not intentionally set by the admin and is incompatible

we can check the spec for this, as it is user-defined. This way, we can trust that the storage class and snapshot class are compatible as a pair, and the selection is not arbitrary

I'm afraid that would suffer from the same issue where two profiles point to the same volumesnapshotclass
(though in this case this was explicitly a choice on the user's side)

@TheRealSibasishBehera
Copy link
Author

I'm afraid that would suffer from the same issue where two profiles point to the same volumesnapshotclass (though in this case this was explicitly a choice on the user's side)

I was considering a solution somewhat like this :

func GetUserSpecifiedStorageClassFromSnapshotClass(ctx context.Context, c client.Client, snapClassName string) (*string, error) {
	storageProfiles := &cdiv1.StorageProfileList{}
	err := c.List(ctx, storageProfiles)
	if err != nil {
		return nil, err
	}
	for _, storageProfile := range storageProfiles.Items {
		if storageProfile.Spec.SnapshotClass != nil && 
		   *storageProfile.Spec.SnapshotClass == snapClassName && 
		   *storageProfile.Status.SnapshotClass == snapClassName {
			return storageProfile.Status.StorageClass, nil
		}
	}
	return nil, nil
}
  1. The user attempts to restore from a snapshot, triggering a host-assisted pod.
  2. It fails with an error from the CSI driver.
  3. The user identifies the compatible StorageClass and updates the StorageProfile of the StorageClass with the VolumeSnapshotClass of the snapshot.
  4. The StorageProfile reconciler ensures that the update is valid and only takes effect if the drivers match.
  5. Case 1:
    Only one StorageClass and VolumeSnapshotClass combination explicitly set by the user exists,
    This combination is used (validated by:
    *storageProfile.Spec.SnapshotClass == snapClassName && *storageProfile.Status.SnapshotClass == snapClassName).
  6. Case 2:
    More than one such combination exists, but all are explicitly set by the user.
    This is not a problem, as we can infer that the user is aware of which combination would work. So we can return the first match
  7. Case 3:
    More than one combination exists, but none are not explicitly set by the user.
    In this case, we just do not use the StorageProfile recommendation and instead resort to the old default ann or sorted list.

Of course the user cant be sure if the selected storage class would work in both storage profile approach or annotaion approach . But this way feels more organic since we have StorageProfile as a separate CRD taking care of storage choices

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dco-signoff: yes Indicates the PR's author has DCO signed all their commits. release-note-none Denotes a PR that doesn't merit a release note. size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random Storage Class Selection During Host-Assisted Snapshot Restore
3 participants