[ovn-controller] Change startup mechanism of ovs pods #423

Draft: averdagu wants to merge 1 commit into main from the ovs-restart branch
Conversation

@averdagu (Contributor) commented Mar 26, 2025

This commit modifies the startup scripts of the ovn-controller-ovs daemonset.

This is done to allow changing the RollingUpdate strategy so that no pod may be unavailable during an update: instead of first deleting the old pod and then creating the new one, the new pod is created while the old one is still running. The way the startup scripts currently work, this is not possible.
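
For illustration, the strategy described corresponds to a DaemonSet RollingUpdate with no unavailable pods and one surge pod. A minimal sketch of what that setting looks like when applied by hand (the "openstack" namespace is an assumption, not taken from this PR):

```bash
# Sketch only: make updates create the replacement pod before the old
# one is removed, instead of deleting the old pod first.
# The "openstack" namespace is assumed, not from this PR.
kubectl -n openstack patch daemonset ovn-controller-ovs --type merge -p '
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
'
```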

The goal is to reduce the downtime observed during an update when the environment uses centralized floating IPs.

With this commit, the ovn-controller-ovs pods share the PID namespace with the host in order to allow signaling between the old and new pods.
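
Since the host PID namespace is shared, the new pod can see and signal processes started by the old pod by name. A minimal sketch of such signaling; the signal choice is an assumption, not necessarily what the PR implements:

```bash
# Sketch: with hostPID enabled, the old pod's ovs-vswitchd is visible
# to the new pod's containers and can be signaled directly.
old_pid=$(pgrep -ox ovs-vswitchd)   # oldest matching process on the node
if [ -n "$old_pid" ]; then
    kill -TERM "$old_pid"           # ask the old daemon to shut down
fi
```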

Another change is the addition of a STATE that the containers (ovsdb-server, ovs-vswitchd and ovsdb-server-init) handle internally.

The different states are (a sketch of how this state could be stored follows the list):

  • NULL (no file): the first time the DS is created on the oc worker.
  • INIT: the first time ovsdb-server-init is executed on the oc worker.
  • OVSDB_SERVER: once the ovsdb-server pod has run the startup script.
  • RUNNING: once ovsdb-server is up and ovs-vswitchd has run the startup script.
  • UPDATE: once a new pod is created and ovsdb-server-init has run.
  • RESTART_VSWITCHD: after ovsdb-server-init has finished and the new ovsdb-server pod has stopped the old ovs-vswitchd process.
  • RESTART_DBSERVER: after the old ovs-vswitchd has been restarted, the old ovsdb-server is also stopped.
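
A minimal sketch of how such a state could be kept, assuming it lives in a file on a volume shared by the three containers (the path and helper names are illustrative, not from this PR):

```bash
# Sketch: persist the state in a file shared by ovsdb-server,
# ovs-vswitchd and ovsdb-server-init.
STATE_FILE=${STATE_FILE:-/var/lib/openvswitch/ovs-startup.state}

get_state() {
    # NULL is represented by the file not existing yet.
    if [ -f "$STATE_FILE" ]; then
        cat "$STATE_FILE"
    else
        echo "NULL"
    fi
}

set_state() {
    echo "$1" > "$STATE_FILE"
}
```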

The normal flow of states is the following:

NULL -> INIT -> OVSDB_SERVER -> RUNNING
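
As a sketch of how this ordering could be enforced, the ovs-vswitchd startup script would block until ovsdb-server has advanced the state, reusing the illustrative get_state/set_state helpers above:

```bash
# Sketch: wait for ovsdb-server before starting ovs-vswitchd,
# then mark the node as RUNNING.
while [ "$(get_state)" != "OVSDB_SERVER" ]; do
    sleep 1
done
set_state RUNNING
# ... then start ovs-vswitchd itself
```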

Scale down: if the oc worker is deleted, the DS and all the pods and mount points are deleted as well; if the node comes up again, it should start from NULL.

Update: RUNNING -> (Change on CR) -> UPDATE -> RESTART_VSWITCHD ->
RESTART_DBSERVER -> OVSDB_SERVER -> RUNNING
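
A sketch of how ovsdb-server-init might pick between these two flows based on the state left behind by a previous pod, again using the illustrative helpers:

```bash
# Sketch: decide between first start and update on entry.
case "$(get_state)" in
    NULL)
        set_state INIT      # first start on this worker
        ;;
    RUNNING)
        set_state UPDATE    # an old pod is serving traffic: update flow
        ;;
    *)
        echo "unexpected state '$(get_state)'" >&2
        exit 1
        ;;
esac
```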

Related: OSPRH-11636
Jira: OSPRH-10821
Depends-on: lib-common#611


Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/ovn-operator for 423,6713e1371b06f42e53b3d588d33c7662d13a1a0c

openshift-ci bot commented Mar 26, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: averdagu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


Merge Failed.

This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset.
Warning:
Error merging github.com/openstack-k8s-operators/ovn-operator for 423,3cfcabe999bc5378d959d357c052079452d58bfc


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9d1e6fb949f14b0f902e9c4913239d6e

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 24m 47s
ovn-operator-tempest-multinode FAILURE in 1h 03m 29s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0a45c0f113754dc8af4e86b200206bc7

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 26m 52s
ovn-operator-tempest-multinode FAILURE in 1h 05m 04s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/32fe97f422124ae882519d53895dc2c8

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 21m 05s
ovn-operator-tempest-multinode FAILURE in 1h 01m 49s

@averdagu force-pushed the ovs-restart branch 2 times, most recently from b94efec to f304f7c on April 2, 2025 at 12:17
TLSOptions="--certificate=/etc/pki/tls/certs/ovndb.crt --private-key=/etc/pki/tls/private/ovndb.key --ca-cert=/etc/pki/tls/certs/ovndbca.crt"
DBOptions="--db ssl:ovsdbserver-nb.openstack.svc.cluster.local:6641"
@averdagu (Contributor, Author) commented on the diff:

Only SSL is supported; a mechanism needs to be added to also support non-TLS connections.
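
One possible shape for that mechanism, sketched against the lines above (using the mounted certificate as the toggle is an assumption):

```bash
# Sketch: only pass TLS options when the certificate is mounted,
# otherwise fall back to a plain TCP connection.
if [ -f /etc/pki/tls/certs/ovndb.crt ]; then
    TLSOptions="--certificate=/etc/pki/tls/certs/ovndb.crt --private-key=/etc/pki/tls/private/ovndb.key --ca-cert=/etc/pki/tls/certs/ovndbca.crt"
    DBOptions="--db ssl:ovsdbserver-nb.openstack.svc.cluster.local:6641"
else
    TLSOptions=""
    DBOptions="--db tcp:ovsdbserver-nb.openstack.svc.cluster.local:6641"
fi
```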


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/83bd56facf314f1f9e13f3c122caadd9

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 22m 12s
ovn-operator-tempest-multinode FAILURE in 1h 02m 39s

@averdagu (Contributor, Author) commented Apr 4, 2025

/recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/3ff505fab9994c8ab22909f47cb474a4

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 20m 16s
ovn-operator-tempest-multinode FAILURE in 59m 38s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/107eac5bca244667b5651800bb406375

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 15m 47s
ovn-operator-tempest-multinode FAILURE in 58m 32s

@averdagu force-pushed the ovs-restart branch 2 times, most recently from 1a1a179 to 30fec14 on April 9, 2025 at 08:19

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/9c9fa9ecc3e44aebb4d808d86e48cdf0

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 20m 03s
ovn-operator-tempest-multinode FAILURE in 59m 55s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/fada0b90762848388e290079e5af88e4

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 21m 43s
ovn-operator-tempest-multinode FAILURE in 1h 01m 01s

openshift-ci bot commented Apr 9, 2025

@averdagu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/ovn-operator-build-deploy-kuttl · Commit: 2cda39c · Required: true · Rerun command: /test ovn-operator-build-deploy-kuttl

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
