STOR-1838: add test for vsphere driver snapshot configuration #28717

Open
wants to merge 3 commits into
base: master

Conversation

RomanBednar
Contributor

No description provided.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 17, 2024
@RomanBednar
Contributor Author

/test ?

Contributor

openshift-ci bot commented Apr 17, 2024

@RomanBednar: The following commands are available to trigger required jobs:

  • /test e2e-aws-jenkins
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-image-registry
  • /test e2e-aws-ovn-serial
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-builds
  • /test e2e-gcp-ovn-image-ecosystem
  • /test e2e-gcp-ovn-upgrade
  • /test e2e-metal-ipi-ovn-ipv6
  • /test images
  • /test lint
  • /test unit
  • /test verify
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
  • /test e2e-agnostic-ovn-cmd
  • /test e2e-aws
  • /test e2e-aws-csi
  • /test e2e-aws-disruptive
  • /test e2e-aws-etcd-recovery
  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-cgroupsv2
  • /test e2e-aws-ovn-etcd-scaling
  • /test e2e-aws-ovn-kubevirt
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-single-node-serial
  • /test e2e-aws-ovn-single-node-upgrade
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-upi
  • /test e2e-aws-proxy
  • /test e2e-azure
  • /test e2e-azure-ovn-etcd-scaling
  • /test e2e-baremetalds-kubevirt
  • /test e2e-gcp-csi
  • /test e2e-gcp-disruptive
  • /test e2e-gcp-fips-serial
  • /test e2e-gcp-ovn-etcd-scaling
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-gcp-ovn-techpreview
  • /test e2e-gcp-ovn-techpreview-serial
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-dualstack-local-gateway
  • /test e2e-metal-ipi-sdn
  • /test e2e-metal-ipi-serial
  • /test e2e-metal-ipi-serial-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test e2e-openstack-ovn
  • /test e2e-openstack-serial
  • /test e2e-vsphere
  • /test e2e-vsphere-ovn-dualstack-primaryv6
  • /test e2e-vsphere-ovn-etcd-scaling
  • /test okd-e2e-gcp

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd
  • pull-ci-openshift-origin-master-e2e-aws-csi
  • pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2
  • pull-ci-openshift-origin-master-e2e-aws-ovn-fips
  • pull-ci-openshift-origin-master-e2e-aws-ovn-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
  • pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-gcp-csi
  • pull-ci-openshift-origin-master-e2e-gcp-ovn
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-origin-master-e2e-metal-ipi-sdn
  • pull-ci-openshift-origin-master-e2e-openstack-ovn
  • pull-ci-openshift-origin-master-images
  • pull-ci-openshift-origin-master-lint
  • pull-ci-openshift-origin-master-unit
  • pull-ci-openshift-origin-master-verify
  • pull-ci-openshift-origin-master-verify-deps

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@RomanBednar
Contributor Author

/test e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 17, 2024

@RomanBednar: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws-jenkins
  • /test e2e-aws-ovn-fips
  • /test e2e-aws-ovn-image-registry
  • /test e2e-aws-ovn-serial
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-builds
  • /test e2e-gcp-ovn-image-ecosystem
  • /test e2e-gcp-ovn-upgrade
  • /test e2e-metal-ipi-ovn-ipv6
  • /test images
  • /test lint
  • /test unit
  • /test verify
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
  • /test e2e-agnostic-ovn-cmd
  • /test e2e-aws
  • /test e2e-aws-csi
  • /test e2e-aws-disruptive
  • /test e2e-aws-etcd-recovery
  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-cgroupsv2
  • /test e2e-aws-ovn-etcd-scaling
  • /test e2e-aws-ovn-kubevirt
  • /test e2e-aws-ovn-single-node
  • /test e2e-aws-ovn-single-node-serial
  • /test e2e-aws-ovn-single-node-upgrade
  • /test e2e-aws-ovn-upgrade
  • /test e2e-aws-ovn-upi
  • /test e2e-aws-proxy
  • /test e2e-azure
  • /test e2e-azure-ovn-etcd-scaling
  • /test e2e-baremetalds-kubevirt
  • /test e2e-gcp-csi
  • /test e2e-gcp-disruptive
  • /test e2e-gcp-fips-serial
  • /test e2e-gcp-ovn-etcd-scaling
  • /test e2e-gcp-ovn-rt-upgrade
  • /test e2e-gcp-ovn-techpreview
  • /test e2e-gcp-ovn-techpreview-serial
  • /test e2e-metal-ipi-ovn-dualstack
  • /test e2e-metal-ipi-ovn-dualstack-local-gateway
  • /test e2e-metal-ipi-sdn
  • /test e2e-metal-ipi-serial
  • /test e2e-metal-ipi-serial-ovn-ipv6
  • /test e2e-metal-ipi-virtualmedia
  • /test e2e-openstack-ovn
  • /test e2e-openstack-serial
  • /test e2e-vsphere
  • /test e2e-vsphere-ovn-dualstack-primaryv6
  • /test e2e-vsphere-ovn-etcd-scaling
  • /test okd-e2e-gcp

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd
  • pull-ci-openshift-origin-master-e2e-aws-csi
  • pull-ci-openshift-origin-master-e2e-aws-ovn-cgroupsv2
  • pull-ci-openshift-origin-master-e2e-aws-ovn-fips
  • pull-ci-openshift-origin-master-e2e-aws-ovn-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-serial
  • pull-ci-openshift-origin-master-e2e-aws-ovn-single-node-upgrade
  • pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-gcp-csi
  • pull-ci-openshift-origin-master-e2e-gcp-ovn
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-builds
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-rt-upgrade
  • pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade
  • pull-ci-openshift-origin-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-origin-master-e2e-metal-ipi-sdn
  • pull-ci-openshift-origin-master-e2e-openstack-ovn
  • pull-ci-openshift-origin-master-images
  • pull-ci-openshift-origin-master-lint
  • pull-ci-openshift-origin-master-unit
  • pull-ci-openshift-origin-master-verify
  • pull-ci-openshift-origin-master-verify-deps

In response to this:

/test e2e-vsphere-ovn-techpreview-serial


@RomanBednar RomanBednar changed the title WIP: add test for vsphere driver snapshot configuration STOR-1838: add test for vsphere driver snapshot configuration Apr 17, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 17, 2024
@openshift-ci-robot

openshift-ci-robot commented Apr 17, 2024

@RomanBednar: This pull request references STOR-1838 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RomanBednar
Contributor Author

RomanBednar commented Apr 17, 2024

/hold

I think we need to add a vSphere techpreview job first so that we can trigger it here before merging: openshift/release#51039

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Apr 17, 2024
@openshift-trt-bot

Job Failure Risk Analysis for sha: 6b248e2

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (20) are below the historical average (687): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd IncompleteTests
Tests for this run (19) are below the historical average (601): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@jsafrane
Contributor

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 17, 2024

@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/c632ad20-fcc9-11ee-81ee-fbc318190246-0

})

func setClusterCSIDriverSnapshotOptions(oc *exutil.CLI, snapshotOptions string, value int) {
patch := []byte(fmt.Sprintf("{\"spec\":{\"driverConfig\":{\"vSphere\":{\"%s\": %d}}}}", snapshotOptions, value))
Contributor

If you want to build the patch as a string, at least use a backtick-quoted raw string literal so you don't have to escape every \".
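
A minimal sketch of the difference, using the patch string from the diff above. The function names `escapedPatch` and `rawPatch` are made up for illustration; both produce the same JSON, but the raw (backtick) literal needs no escaping:

```go
package main

import "fmt"

// escapedPatch mirrors the style in the diff: an interpreted string literal,
// so every double quote inside the JSON has to be escaped.
func escapedPatch(option string, value int) string {
	return fmt.Sprintf("{\"spec\":{\"driverConfig\":{\"vSphere\":{\"%s\": %d}}}}", option, value)
}

// rawPatch builds the same JSON with a raw string literal (backticks),
// where double quotes can be written as-is.
func rawPatch(option string, value int) string {
	return fmt.Sprintf(`{"spec":{"driverConfig":{"vSphere":{"%s": %d}}}}`, option, value)
}

func main() {
	option := "granularMaxSnapshotsPerBlockVolumeInVSAN"
	fmt.Println(rawPatch(option, 4))
	// Both helpers return identical strings.
	fmt.Println(escapedPatch(option, 4) == rawPatch(option, 4))
}
```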

Contributor Author

Removed, using Update() instead.

Comment on lines 109 to 114
clusterCSIDriverOptions: map[string]int{
snapshotOptions["vsan"]["clusterCSIDriver"]: 4,
},
cloudConfigOptions: map[string]int{
snapshotOptions["vsan"]["cloudConf"]: 4,
},
Contributor

Why so complicated?

clusterCSIDriverOptions: opv1.VSphereCSIDriverConfigSpec{
    GranularMaxSnapshotsPerBlockVolumeInVSAN: ptr.To(uint32(4)),
},

cloudConfigOptions: map[string]string{
    "granular-max-snapshots-per-block-volume-vsan": "4",
},

(note change of the field types)

It will be slightly harder to compute the ClusterCSIDriver patch, but you can use Update() instead or encode VSphereCSIDriverConfigSpec to json.
On the positive side, it's crystal clear what field is set and what ini file is expected without reading a separate map.
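
As a rough sketch of the "encode to json" option: the struct below is a local stand-in for `opv1.VSphereCSIDriverConfigSpec` (the real type lives in openshift/api; the single field and its JSON tag here are assumptions based on the snippet above), and `buildPatch` is a hypothetical helper name. It wraps the typed config in the same `spec.driverConfig.vSphere` path used by the original patch string:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// vSphereConfig is a hypothetical stand-in for opv1.VSphereCSIDriverConfigSpec.
// Only one field is modeled; the JSON tag is an assumption.
type vSphereConfig struct {
	GranularMaxSnapshotsPerBlockVolumeInVSAN *uint32 `json:"granularMaxSnapshotsPerBlockVolumeInVSAN,omitempty"`
}

// buildPatch encodes the typed config into a merge-patch-shaped JSON document.
func buildPatch(cfg vSphereConfig) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"driverConfig": map[string]any{
				"vSphere": cfg,
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	limit := uint32(4)
	patch, err := buildPatch(vSphereConfig{GranularMaxSnapshotsPerBlockVolumeInVSAN: &limit})
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Println(string(patch))
}
```

With `omitempty` on pointer fields, options that are left unset simply do not appear in the patch.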

Contributor Author

Changed, and I'm bumping openshift/api to pull in the new ClusterCSIDriver fields.

Comment on lines 157 to 156
pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
o.Expect(err).NotTo(o.HaveOccurred())

_, err = createTestPod(oc, pvc.Name, oc.Namespace())
o.Expect(err).NotTo(o.HaveOccurred())

// Wait for pvc to be bound.
o.Eventually(func() error {
pvc, err := oc.AdminKubeClient().CoreV1().PersistentVolumeClaims(oc.Namespace()).Get(context.Background(), "test-pvc", metav1.GetOptions{})
if err != nil {
return err
}
if pvc.Status.Phase != v1.ClaimBound {
return fmt.Errorf("PVC not bound")
}
return nil
})

for i := 0; i < t.successfulSnapshotsCreated; i++ {
err := createSnapshot(oc, oc.Namespace(), fmt.Sprintf("test-snapshot-%d", i), "test-pvc")
o.Expect(err).NotTo(o.HaveOccurred())
}

// Next snapshot creation should be over the set limit and fail.
err = createSnapshot(oc, oc.Namespace(), "test-snapshot-failed", "test-pvc")
o.Expect(err).NotTo(o.HaveOccurred())

readyToUse, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.readyToUse}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

errMsg, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("VolumeSnapshot error message: %s readyToUse %s", errMsg, readyToUse)
if !strings.Contains(errMsg, "failed to take snapshot of the volume") && readyToUse != "false" {
e2e.Failf("VolumeSnapshot \"test-snapshot-failed\" should have failed and should not be ready to use")
}
}
Contributor

This should go to its own function, just like loadAndCheckCloudConf.

Contributor Author

Moved to separate function.

Comment on lines 150 to 152
for option, value := range t.cloudConfigOptions {
o.Eventually(func() error {
return loadAndCheckCloudConf(oc, "Snapshot", option, value)
}, time.Minute, time.Second).Should(o.Succeed())
}
Contributor

To check all three ini file keys you Get the same object from the API server three times? loadAndCheckCloudConf can get a map[string]string and check all of them with a single Get.
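
One way to read this suggestion, sketched with plain maps so it runs without a cluster: fetch and parse the config once, then compare every expected key in a single pass. `checkCloudConfOptions` is a hypothetical name, and `actual` stands in for the keys parsed from one section of cloud.conf:

```go
package main

import (
	"fmt"
	"sort"
)

// checkCloudConfOptions verifies that every expected key is present in the
// already-parsed section and carries the expected value.
func checkCloudConfOptions(actual, expected map[string]string) error {
	keys := make([]string, 0, len(expected))
	for k := range expected {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order, so the first reported error is stable

	for _, k := range keys {
		got, ok := actual[k]
		if !ok {
			return fmt.Errorf("key %s not found", k)
		}
		if got != expected[k] {
			return fmt.Errorf("key %s: expected %q, got %q", k, expected[k], got)
		}
	}
	return nil
}

func main() {
	// In the real test this map would come from a single Get of the ConfigMap.
	actual := map[string]string{"granular-max-snapshots-per-block-volume-vsan": "4"}
	expected := map[string]string{"granular-max-snapshots-per-block-volume-vsan": "4"}
	fmt.Println(checkCloudConfOptions(actual, expected))
}
```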

Contributor Author

Yeah, I missed that, changed.

Comment on lines 227 to 237
section, err := cfg.GetSection(sectionName)
if err != nil {
return fmt.Errorf("section %s not found in cloud.conf: %v", sectionName, err)
}

key, err := section.GetKey(keyName)
if err != nil {
return fmt.Errorf("key %s not found in section %s: %v", keyName, sectionName, err)
}

o.Expect(key.String()).To(o.Equal(fmt.Sprintf("%d", expectedValue)))
Contributor

You should check that the other keys are not set.
I wonder what section.KeysHash() returns. Is it the same map as t.cloudConfigOptions, if you used map[string]string instead of int as I suggest? Will DeepEqual work then?
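
A small sketch of why an exact map comparison also covers the "other keys are not set" case. It assumes the section's keys are available as a plain `map[string]string` (as the question about `section.KeysHash()` supposes); `sectionMatches` is a hypothetical helper and `some-other-key` is a placeholder:

```go
package main

import (
	"fmt"
	"reflect"
)

// sectionMatches reports whether the parsed section contains exactly the
// expected keys and values: no missing keys, no extra keys.
func sectionMatches(actual, expected map[string]string) bool {
	return reflect.DeepEqual(actual, expected)
}

func main() {
	expected := map[string]string{"granular-max-snapshots-per-block-volume-vsan": "4"}

	exact := map[string]string{"granular-max-snapshots-per-block-volume-vsan": "4"}
	withExtra := map[string]string{
		"granular-max-snapshots-per-block-volume-vsan": "4",
		"some-other-key": "1", // an extra key makes the comparison fail
	}

	fmt.Println(sectionMatches(exact, expected))
	fmt.Println(sectionMatches(withExtra, expected))
}
```

One caveat with `reflect.DeepEqual`: a nil map and an empty non-nil map are not considered equal, so a test case that expects "no keys" should take care which of the two it compares against.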

Contributor Author

DeepEqual() seems to work well in this case, thanks for the suggestion.

}

if t.successfulSnapshotsCreated > 0 {
pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
Contributor

Where is the PVC deleted?

Contributor Author

Added removal. The cleanup slows the test down a bit. How bad would it be to rely on the project teardown instead (in oc from util.CLI)?

pvc, err := createTestPVC(oc, oc.Namespace(), "test-pvc", "1Gi")
o.Expect(err).NotTo(o.HaveOccurred())

_, err = createTestPod(oc, pvc.Name, oc.Namespace())
Contributor

Where is the Pod deleted?

Contributor Author

Same as PVC.

})

for i := 0; i < t.successfulSnapshotsCreated; i++ {
err := createSnapshot(oc, oc.Namespace(), fmt.Sprintf("test-snapshot-%d", i), "test-pvc")
Contributor

Where are the snapshots deleted?

Contributor Author

Same as PVC - snapshot removal is taking the longest.

}

// Next snapshot creation should be over the set limit and fail.
err = createSnapshot(oc, oc.Namespace(), "test-snapshot-failed", "test-pvc")
Contributor

It's not guaranteed that the last snapshot is the one that fails. The test creates several VolumeSnapshots in quick succession, and the controller + CSI driver may process them in any order. You should wait until the "good snapshots" are ready to use.
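
The ordering concern can be sketched as a plain polling loop, independent of Gomega. Everything here is illustrative: `waitForReadySnapshots` is a hypothetical helper, and `countReady` stands in for whatever call lists the VolumeSnapshots and counts those with `readyToUse` set. The over-limit snapshot would only be created after this returns nil:

```go
package main

import (
	"fmt"
	"time"
)

// waitForReadySnapshots polls countReady until at least `want` snapshots are
// reported ready, or until the timeout expires.
func waitForReadySnapshots(countReady func() (int, error), want int, timeout, interval time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		n, err := countReady()
		if err == nil && n >= want {
			return nil
		}
		if time.Now().After(deadline) {
			if err != nil {
				return fmt.Errorf("timed out waiting for %d ready snapshots: %w", want, err)
			}
			return fmt.Errorf("timed out: %d of %d snapshots ready", n, want)
		}
		time.Sleep(interval)
	}
}

func main() {
	// Fake counter: one more snapshot becomes ready on every poll.
	ready := 0
	fake := func() (int, error) {
		if ready < 3 {
			ready++
		}
		return ready, nil
	}
	err := waitForReadySnapshots(fake, 3, time.Second, 10*time.Millisecond)
	fmt.Println("wait result:", err)
}
```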

Contributor Author

You're right, adding a simple check with o.Eventually() counting ready snapshots.

Comment on lines 184 to 193
readyToUse, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.readyToUse}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

errMsg, err := oc.Run("get").Args("volumesnapshot/test-snapshot-failed", "-o", "jsonpath={.status.error.message}").Output()
o.Expect(err).NotTo(o.HaveOccurred())

e2e.Logf("VolumeSnapshot error message: %s readyToUse %s", errMsg, readyToUse)
if !strings.Contains(errMsg, "failed to take snapshot of the volume") && readyToUse != "false" {
e2e.Failf("VolumeSnapshot \"test-snapshot-failed\" should have failed and should not be ready to use")
}
Contributor

I'm missing a loop that waits for the snapshots to fail.
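
A related observation on the condition in the diff, offered as a sketch rather than the author's intent: if the expected state is what the failure message describes (the snapshot reports the error and is not ready to use), then the check fails only when both parts are wrong, because it joins the two negated conditions with `&&`. Expressing the expected state as a predicate makes it easy to poll on and to negate correctly. `snapshotFailedAsExpected` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// snapshotFailedAsExpected reports whether the over-limit snapshot is in the
// state the test describes: it carries the snapshot error AND is not ready.
// Its negation is (!contains || readyToUse != "false"), not the && form.
func snapshotFailedAsExpected(errMsg, readyToUse string) bool {
	return strings.Contains(errMsg, "failed to take snapshot of the volume") &&
		readyToUse == "false"
}

func main() {
	// Error reported and not ready: the expected failure state.
	fmt.Println(snapshotFailedAsExpected("failed to take snapshot of the volume: limit reached", "false"))
	// Status not populated yet: keep waiting rather than pass or fail.
	fmt.Println(snapshotFailedAsExpected("", ""))
}
```

A waiting loop would call this predicate repeatedly until it returns true or a timeout is reached.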

Contributor Author

Adding.

@openshift-ci openshift-ci bot added the vendor-update Touching vendor dir or related files label Apr 20, 2024
@RomanBednar
Contributor Author

/unhold

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 20, 2024
@RomanBednar
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 20, 2024

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3f876ab0-ff06-11ee-9af2-ff5ed051b6b0-0

@openshift-trt-bot

Job Failure Risk Analysis for sha: 1dfd605

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade High
[sig-apps] job-upgrade
This test has passed 100.00% of 27 runs on jobs ['periodic-ci-openshift-release-master-ci-4.16-e2e-aws-ovn-upgrade'] in the last 14 days.
pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade IncompleteTests
Tests for this run (26) are below the historical average (758): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-csi IncompleteTests
Tests for this run (25) are below the historical average (681): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@RomanBednar RomanBednar changed the title STOR-1838: add test for vsphere driver snapshot configuration WIP: STOR-1838: add test for vsphere driver snapshot configuration Apr 20, 2024
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 20, 2024
@RomanBednar
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 20, 2024

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d157a940-ff25-11ee-8c2f-58a38c674904-0

@openshift-trt-bot

Job Failure Risk Analysis for sha: e98e2da

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-metal-ipi-sdn IncompleteTests
Tests for this run (2) are below the historical average (1005): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-gcp-csi IncompleteTests
Tests for this run (2) are below the historical average (542): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-agnostic-ovn-cmd IncompleteTests
Tests for this run (2) are below the historical average (516): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@RomanBednar
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 25, 2024

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/47ac7580-02dc-11ef-82c7-51bd1731bc57-0

@RomanBednar
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 25, 2024

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2e806bf0-0306-11ef-92be-1329447f7f86-0

@RomanBednar
Contributor Author

Rebased to include: #28733

@RomanBednar
Contributor Author

/assign @deads2k for approval

Contributor

openshift-ci bot commented Apr 25, 2024

@RomanBednar: GitHub didn't allow me to assign the following users: for, approval.

Note that only openshift members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @deads2k for approval


@openshift-trt-bot

Job Failure Risk Analysis for sha: 97ac991

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-openstack-ovn IncompleteTests
Tests for this run (97) are below the historical average (1407): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-master-e2e-metal-ipi-sdn IncompleteTests
Tests for this run (98) are below the historical average (925): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)

@jsafrane
Contributor

periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial failed and the failure looks related:

event happened 25 times, something is wrong: namespace/openshift-cluster-storage-operator deployment/cluster-storage-operator hmsg/79ca95c7bd - reason/OperatorStatusChanged Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods") (15:37:07Z) result=reject }

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 26, 2024

@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/76a96420-03a3-11ef-8ce6-881a7444b5f4-0

@RomanBednar
Contributor Author

RomanBednar commented Apr 26, 2024

I ran the snapshot test in isolation again and observed the operator status changes. The DaemonSet redeploys pods when the cloud conf ConfigMap is changed, but that is expected, as is the status changing once:

onSet to deploy node pods\nVSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods" to "VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods"
33m         Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to act on changes")
33m         Normal    OperatorStatusChanged                            deployment/cluster-storage-operator                       Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")

EDIT: CSO generating too many condition events is a known bug, currently being investigated - https://issues.redhat.com/browse/OCPBUGS-24061

@RomanBednar
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented Apr 26, 2024

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/2893b4d0-03d9-11ef-8f9e-a06f21de842c-0

@RomanBednar
Contributor Author

cc @dgoodwin @soltysh for approval

@RomanBednar
Contributor Author

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented May 7, 2024

@RomanBednar: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/cdad4690-0c65-11ef-8463-344af23f030f-0

Contributor

openshift-ci bot commented May 7, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jsafrane, RomanBednar
Once this PR has been reviewed and has the lgtm label, please ask for approval from deads2k. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

openshift-ci bot commented May 7, 2024

@RomanBednar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-upgrade 15db814 link false /test e2e-aws-ovn-upgrade
ci/prow/e2e-aws-ovn-single-node-upgrade 15db814 link false /test e2e-aws-ovn-single-node-upgrade

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-trt-bot

Job Failure Risk Analysis for sha: 15db814

Job Name Failure Risk
pull-ci-openshift-origin-master-e2e-aws-ovn-upgrade High
[bz-Etcd] clusteroperator/etcd should not change condition/Available
This test has passed 100.00% of 88 runs on jobs ['periodic-ci-openshift-release-master-ci-4.17-e2e-aws-ovn-upgrade'] in the last 14 days.

@jsafrane
Contributor

openshift/vmware-vsphere-csi-driver-operator#230 has merged

/payload-job periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

Contributor

openshift-ci bot commented May 10, 2024

@jsafrane: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.16-e2e-vsphere-ovn-techpreview-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/32e1d770-0ed1-11ef-97f2-5abf97a9ff70-0

Comment on lines +448 to +457
registry.AddPathologicalEventMatcherOrDie(&SimplePathologicalEventMatcher{
name: "StorageOperatorsFlipsProgressingTooOften",
locatorKeyRegexes: map[monitorapi.LocatorKey]*regexp.Regexp{
monitorapi.LocatorNamespaceKey: regexp.MustCompile(`^openshift-cluster-storage-operator$`),
},
messageReasonRegex: regexp.MustCompile(`^OperatorStatusChanged$`),
messageHumanRegex: regexp.MustCompile(`Progressing changed.*`),
jira: "https://issues.redhat.com/browse/OCPBUGS-24061",
})

Member

Can you drop this commit and run the payload job again now that we have a fix merged?
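
For context on what that commit would have suppressed, here is a minimal sketch of the three regular expressions from the matcher applied to the event quoted earlier in the thread. It assumes an event matches only when namespace, reason, and message all match; how origin's `SimplePathologicalEventMatcher` actually combines them is not shown in this thread, and `matchesStorageProgressingEvent` is a hypothetical helper:

```go
package main

import (
	"fmt"
	"regexp"
)

// The three patterns are copied from the matcher in the diff.
var (
	namespaceRe = regexp.MustCompile(`^openshift-cluster-storage-operator$`)
	reasonRe    = regexp.MustCompile(`^OperatorStatusChanged$`)
	messageRe   = regexp.MustCompile(`Progressing changed.*`)
)

// matchesStorageProgressingEvent returns true when all three patterns match
// (assumed semantics, see the note above).
func matchesStorageProgressingEvent(namespace, reason, message string) bool {
	return namespaceRe.MatchString(namespace) &&
		reasonRe.MatchString(reason) &&
		messageRe.MatchString(message)
}

func main() {
	msg := `Status for clusteroperator/storage changed: Progressing changed from False to True ("VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods")`
	fmt.Println(matchesStorageProgressingEvent("openshift-cluster-storage-operator", "OperatorStatusChanged", msg))
}
```

Note that the message pattern is unanchored, so under these assumptions any Progressing flip reported from that namespace would be covered, not only the vSphere one.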

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. vendor-update Touching vendor dir or related files

7 participants