Fix calico vxlan tunnel resilience on ansible run #11097

MatthieuFin
Contributor

What type of PR is this?
/kind bug

What this PR does / why we need it:
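When I run kubespray on an existing cluster with the calico CNI, the bird backend, and vxlan tunnels, the vxlan tunnels are dropped because calicoctl apply overwrites the existing node resource, removing fields populated by the calico-node pods.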

Which issue(s) this PR fixes:

Fixes #11096

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 18, 2024
@k8s-ci-robot
Contributor

Hi @MatthieuFin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 18, 2024
@yankay
Member

yankay commented Apr 19, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 19, 2024
If the node.projectcalico.org resource already exists, patch the node to set asNumber
instead of applying the resource, to avoid removing existing fields populated by
calico-node pods.

✅ Closes: 11096
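
To illustrate the difference the commit message describes, here is a minimal Python sketch that contrasts replace semantics with merge semantics on a plain dictionary. It does not call calicoctl or the Calico API; the node name, addresses, AS number, and the helpers `apply_resource` and `merge_patch` are all hypothetical, and the field layout is only an approximation of a Calico node resource.

```python
import copy

# Hypothetical state of a node resource after calico-node has filled in
# its runtime fields. Names and values are illustrative assumptions.
existing_node = {
    "metadata": {"name": "node-1"},
    "spec": {
        "bgp": {"ipv4Address": "10.0.0.11/24"},
        "ipv4VXLANTunnelAddr": "10.233.64.0",
    },
}

# What a playbook that only cares about the AS number might submit.
desired = {
    "metadata": {"name": "node-1"},
    "spec": {"bgp": {"asNumber": 64512}},
}


def apply_resource(current, manifest):
    """Replace semantics: the stored object becomes exactly the manifest."""
    return copy.deepcopy(manifest)


def merge_patch(current, patch):
    """Merge semantics: only keys present in the patch are changed."""
    result = copy.deepcopy(current)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


applied = apply_resource(existing_node, desired)
patched = merge_patch(existing_node, {"spec": {"bgp": {"asNumber": 64512}}})

print("after apply:", applied["spec"])
print("after patch:", patched["spec"])
```

In this sketch the replaced object loses the tunnel address, while the patched object keeps it and only gains the AS number, which is the behaviour the PR aims for on existing nodes.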
Contributor

@cyclinder cyclinder left a comment

Thanks @MatthieuFin for the report and fix! Looking at #11096, it seems that when you re-run the calico tasks, the IP address of vxlan_calico in the node object is discarded, which breaks your vxlan network, right?

Why did you need to run these tasks? They are only meant for a fresh cluster (please let me know if I'm wrong). If you want to update the calico config, it would be better to run calicoctl or kubectl directly rather than running the kubespray tasks.

Of course, if we can re-run these tasks without destroying the existing cluster, that's pretty nice too!

@MatthieuFin
Contributor Author

Hello, I am in the habit of managing the calico deployment with kubespray.
I suffered two outages due to this issue: the first when changing the requested resources of the calico deployment, and the second when upgrading calico.

Running kubespray to upgrade calico along with the kubespray version makes it possible to manage the RBAC deployment, for example the RBAC split that introduced the clusterrole "calico-cni-plugin" with calico version 3.26 in that case.

I want to prevent the case where someone runs kubespray with the network tag and breaks my vxlan network.

The goal of this PR is to ensure that the task is idempotent.
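
As a rough illustration of what idempotent means here, the following Python sketch models a "create if absent, otherwise set only the AS number" step against an in-memory store. The function `ensure_as_number`, the store layout, and all values are assumptions made for the example; this is not the actual kubespray task.

```python
import copy


def ensure_as_number(store, name, as_number):
    """Create the node entry if absent, otherwise touch only spec.bgp.asNumber.

    Returns True when the store was changed, False otherwise.
    """
    node = store.get(name)
    if node is None:
        store[name] = {"spec": {"bgp": {"asNumber": as_number}}}
        return True
    bgp = node.setdefault("spec", {}).setdefault("bgp", {})
    if bgp.get("asNumber") == as_number:
        return False
    bgp["asNumber"] = as_number
    return True


# Hypothetical existing node whose tunnel address was set at runtime.
store = {"node-1": {"spec": {"ipv4VXLANTunnelAddr": "10.233.64.0", "bgp": {}}}}

first = ensure_as_number(store, "node-1", 64512)
snapshot = copy.deepcopy(store)
second = ensure_as_number(store, "node-1", 64512)

print("first run changed:", first)
print("second run changed:", second)
print("state unchanged by second run:", store == snapshot)
```

The second call reports no change and leaves the runtime field untouched, which is the property a repeated playbook run should have.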

@cyclinder
Contributor

@MatthieuFin thanks for the details, the changes look good to me. Have you tested your changes?

@MatthieuFin
Contributor Author

Hi, yes, I tested the changes; this is the workaround I use on my production clusters.
I have also tested them when creating a fresh cluster, and they seem fully backward compatible.

Contributor

@cyclinder cyclinder left a comment

Thanks for your work! Now LGTM.

/lgtm
/cc @yankay

@k8s-ci-robot k8s-ci-robot requested a review from yankay May 1, 2024 14:26
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 1, 2024
@yankay
Member

yankay commented May 6, 2024

Thanks @MatthieuFin
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cyclinder, MatthieuFin, yankay

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit a01d0c0 into kubernetes-sigs:master May 6, 2024
60 checks passed
@MatthieuFin MatthieuFin deleted the calico-node-resources-upadte branch May 6, 2024 08:58
dabeck pushed a commit to fino-digital/kubespray that referenced this pull request May 7, 2024
…11097)

pedro-peter pushed a commit to pedro-peter/kubespray that referenced this pull request May 8, 2024
…11097)

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

VXLAN tunnel dropped with bird bgp calico network backend
4 participants