STOR-1838: add test for vsphere driver snapshot configuration #28717
base: master
Conversation
/test ?
@RomanBednar: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from a66f1f6 to 2327793
/test e2e-vsphere-ovn-techpreview-serial
@RomanBednar: The specified target(s) for
The following commands are available to trigger optional jobs:
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 2327793 to 6b248e2
@RomanBednar: This pull request references STOR-1838, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/hold I think we need to add the vsphere techpreview job first so we can trigger it here before merging: openshift/release#51039
Job Failure Risk Analysis for sha: 6b248e2
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c632ad20-fcc9-11ee-81ee-fbc318190246-0
})

func setClusterCSIDriverSnapshotOptions(oc *exutil.CLI, snapshotOptions string, value int) {
    patch := []byte(fmt.Sprintf("{\"spec\":{\"driverConfig\":{\"vSphere\":{\"%s\": %d}}}}", snapshotOptions, value))
When you want to use the patch as a string, at least use backticks to avoid escaping \".
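A minimal sketch of that suggestion, assuming the same surrounding function: a Go raw string literal (backticks) needs no escaping for the inner quotes.

    patch := []byte(fmt.Sprintf(`{"spec":{"driverConfig":{"vSphere":{"%s": %d}}}}`, snapshotOptions, value))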
Removed, using Update() instead.
clusterCSIDriverOptions: map[string]int{
    snapshotOptions["vsan"]["clusterCSIDriver"]: 4,
},
cloudConfigOptions: map[string]int{
    snapshotOptions["vsan"]["cloudConf"]: 4,
},
Why so complicated?

    clusterCSIDriverOptions: opv1.VSphereCSIDriverConfigSpec{
        GranularMaxSnapshotsPerBlockVolumeInVSAN: ptr.To(uint32(4)),
    },
    cloudConfigOptions: map[string]string{
        "granular-max-snapshots-per-block-volume-vsan": "4",
    },

(note the change of the field types) It will be slightly harder to compute the ClusterCSIDriver patch, but you can use Update() instead, or encode VSphereCSIDriverConfigSpec to JSON. On the positive side, it's crystal clear which field is set and what ini file content is expected, without reading a separate map.
Changed and bumping openshift/api to pull in the new clusterCSIDriver fields.
pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
o.Expect(err).NotTo(o.HaveOccurred())

_, err = createTestPod(oc, pvc.Name, oc.Namespace())
o.Expect(err).NotTo(o.HaveOccurred())

// Wait for pvc to be bound.
o.Eventually(func() error {
    pvc, err := oc.AdminKubeClient().CoreV1().PersistentVolumeClaims(oc.Namespace()).Get(context.Background(), "test-pvc", metav1.GetOptions{})
    if err != nil {
        return err
    }
    if pvc.Status.Phase != v1.ClaimBound {
        return fmt.Errorf("PVC not bound")
    }
    return nil
})

for i := 0; i < t.successfulSnapshotsCreated; i++ {
    err := createSnapshot(oc, oc.Namespace(), fmt.Sprintf("test-snapshot-%d", i), "test-pvc")
    o.Expect(err).NotTo(o.HaveOccurred())
}

// Next snapshot creation should be over the set limit and fail.
err = createSnapshot(oc, oc.Namespace(), "test-snapshot-failed", "test-pvc")
o.Expect(err).NotTo(o.HaveOccurred())

readyToUse, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.readyToUse}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

errMsg, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("VolumeSnapshot error message: %s readyToUse %s", errMsg, readyToUse)
if !strings.Contains(errMsg, "failed to take snapshot of the volume") && readyToUse != "false" {
    e2e.Failf("VolumeSnapshot \"test-snapshot-failed\" should have failed and should not be ready to use")
}
}
This should go to its own function, just like loadAndCheckCloudConf.
Moved to a separate function.
for option, value := range t.cloudConfigOptions {
    o.Eventually(func() error {
        return loadAndCheckCloudConf(oc, "Snapshot", option, value)
    }, time.Minute, time.Second).Should(o.Succeed())
}
To check all three ini file keys you Get the same object from the API server three times? loadAndCheckCloudConf can take a map[string]string and check all of them with a single Get.
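One possible signature for that change (a sketch; the exact parameters are an assumption):

    // loadAndCheckCloudConf fetches cloud.conf once and verifies every expected
    // key of the given section in a single pass.
    func loadAndCheckCloudConf(oc *exutil.CLI, sectionName string, expected map[string]string) error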
Yeah, I missed that, changed.
section, err := cfg.GetSection(sectionName)
if err != nil {
    return fmt.Errorf("section %s not found in cloud.conf: %v", sectionName, err)
}

key, err := section.GetKey(keyName)
if err != nil {
    return fmt.Errorf("key %s not found in section %s: %v", keyName, sectionName, err)
}

o.Expect(key.String()).To(o.Equal(fmt.Sprintf("%d", expectedValue)))
You should check that the other keys are not set. I wonder what section.KeysHash() returns. Is it the same map as t.cloudConfigOptions, if you used map[string]string instead of int as I suggest? Will DeepEqual work then?
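A minimal sketch of that comparison, assuming cfg is parsed with gopkg.in/ini.v1 and expected mirrors t.cloudConfigOptions as a map[string]string:

    section, err := cfg.GetSection("Snapshot")
    if err != nil {
        return fmt.Errorf("section Snapshot not found in cloud.conf: %v", err)
    }
    // KeysHash() returns a map[string]string of all keys in the section, so one
    // reflect.DeepEqual also catches keys that should not be set.
    if !reflect.DeepEqual(section.KeysHash(), expected) {
        return fmt.Errorf("unexpected Snapshot section: got %v, want %v", section.KeysHash(), expected)
    }
    return nil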
DeepEqual() seems to work well in this case, thanks for the suggestion.
}

if t.successfulSnapshotsCreated > 0 {
    pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
Where is the PVC deleted?
Added removal. The cleanup slows down the test a bit - how bad is it to rely on the project teardown (in oc from util.CLI)?
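For context, a minimal sketch of such a cleanup (the deferred form is an assumption about how the removal was added):

    // Delete the PVC at the end of the test rather than waiting for project teardown.
    defer func() {
        _ = oc.AdminKubeClient().CoreV1().PersistentVolumeClaims(oc.Namespace()).Delete(
            context.Background(), "test-pvc", metav1.DeleteOptions{})
    }()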
pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
o.Expect(err).NotTo(o.HaveOccurred())

_, err = createTestPod(oc, pvc.Name, oc.Namespace())
Where is the Pod deleted?
Same as PVC.
})

for i := 0; i < t.successfulSnapshotsCreated; i++ {
    err := createSnapshot(oc, oc.Namespace(), fmt.Sprintf("test-snapshot-%d", i), "test-pvc")
Where are the snapshots deleted?
Same as PVC - snapshot removal takes the longest.
}

// Next snapshot creation should be over the set limit and fail.
err = createSnapshot(oc, oc.Namespace(), "test-snapshot-failed", "test-pvc")
It's not guaranteed that it's the last snapshot that is going to fail. The test creates several VolumeSnapshots in quick succession, and the controller + CSI driver may process them in any order. You should wait until the "good" snapshots are ready to use.
You're right, adding a simple check with o.Eventually() counting ready snapshots.
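A hedged sketch of such a check, reusing the oc.Run pattern from the diff (the timeout and poll interval are placeholders):

    // Wait until every expected snapshot reports readyToUse=true before
    // creating the one that should exceed the limit.
    o.Eventually(func() int {
        ready := 0
        for i := 0; i < t.successfulSnapshotsCreated; i++ {
            out, err := oc.Run("get").Args(fmt.Sprintf("volumesnapshot/test-snapshot-%d", i), "-o", "jsonpath={.status.readyToUse}").Output()
            if err == nil && out == "true" {
                ready++
            }
        }
        return ready
    }, 5*time.Minute, 10*time.Second).Should(o.Equal(t.successfulSnapshotsCreated))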
readyToUse, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.readyToUse}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

errMsg, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("VolumeSnapshot error message: %s readyToUse %s", errMsg, readyToUse)
if !strings.Contains(errMsg, "failed to take snapshot of the volume") && readyToUse != "false" {
    e2e.Failf("VolumeSnapshot \"test-snapshot-failed\" should have failed and should not be ready to use")
}
I'm missing a loop waiting for the snapshot to reach the failed state.
Adding.
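A sketch of what that wait could look like (the matcher and timeouts are assumptions):

    // Poll until the over-limit snapshot reports the expected error message.
    o.Eventually(func() string {
        errMsg, _ := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
        return errMsg
    }, 5*time.Minute, 10*time.Second).Should(o.ContainSubstring("failed to take snapshot of the volume"))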
Force-pushed from 6b248e2 to 1dfd605
/unhold
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3f876ab0-ff06-11ee-9af2-ff5ed051b6b0-0
Job Failure Risk Analysis for sha: 1dfd605
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d157a940-ff25-11ee-8c2f-58a38c674904-0
Job Failure Risk Analysis for sha: e98e2da
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/47ac7580-02dc-11ef-82c7-51bd1731bc57-0
Force-pushed from e98e2da to 97ac991
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2e806bf0-0306-11ef-92be-1329447f7f86-0
Rebased to include: #28733
/assign @deads2k for approval
@RomanBednar: GitHub didn't allow me to assign the following users: for, approval. Note that only openshift members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Job Failure Risk Analysis for sha: 97ac991
periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial failed and the failure looks related:
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/76a96420-03a3-11ef-8ce6-881a7444b5f4-0
Tried to run the snapshot test in isolation again and observed operator status changes - the DaemonSet redeploys its pods when the cloud.conf ConfigMap is changed, but that's expected, along with the status changing once:
EDIT: CSO generating too many condition events is a known bug, currently being investigated - https://issues.redhat.com/browse/OCPBUGS-24061
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2893b4d0-03d9-11ef-8f9e-a06f21de842c-0
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cdad4690-0c65-11ef-8463-344af23f030f-0
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: jsafrane, RomanBednar. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@RomanBednar: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Job Failure Risk Analysis for sha: 15db814
openshift/vmware-vsphere-csi-driver-operator#230 has merged
/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial
@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/32e1d770-0ed1-11ef-97f2-5abf97a9ff70-0
registry.AddPathologicalEventMatcherOrDie(&SimplePathologicalEventMatcher{
    name: "StorageOperatorsFlipsProgressingTooOften",
    locatorKeyRegexes: map[monitorapi.LocatorKey]*regexp.Regexp{
        monitorapi.LocatorNamespaceKey: regexp.MustCompile(`^openshift-cluster-storage-operator$`),
    },
    messageReasonRegex: regexp.MustCompile(`^OperatorStatusChanged$`),
    messageHumanRegex: regexp.MustCompile(`Progressing changed.*`),
    jira: "https://issues.redhat.com/browse/OCPBUGS-24061",
})
Can you drop this commit and run the payload job again now that we have a fix merged?