What happened?
Hello,
I use the Calico CNI with the BIRD backend and a "peer with router" architecture.
Unfortunately, my provider's hardware network bandwidth is pretty limited, so to avoid being throttled by the network QoS I need to use VXLAN full-mesh tunnels to route Kubernetes internal traffic over the L2 network without bouncing through my gateway.
In the end, I have calico-node peering over BGP with the router (for load balancing with the help of MetalLB, and for cross-project/region routing) and VXLAN full-mesh tunnels for internal communication.
It works pretty well. The issue is that when I re-run the kubespray `network` role, the Calico task "Configure node asNumber for per node peering" executes a `calicoctl apply` on the `node.projectcalico.org/v3` resource, and `calicoctl apply` replaces the resource if it already exists.
However, the `node.projectcalico.org/v3` resource contains information fed in by calico-node when an IPPool specifies VXLAN (a pod workload IPPool and a VXLAN tunnel IPPool).
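The original report showed the two IPPool manifests here, but they did not survive; a minimal sketch of what such a pair of pools could look like follows. The pool names and the pod CIDR are assumptions; only the `10.3.240.0/20` tunnel range comes from the report.

```yaml
# Hypothetical reconstruction, not the reporter's actual manifests.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pod-workload-pool          # assumed name
spec:
  cidr: 10.3.0.0/18                # assumed pod CIDR
  vxlanMode: Always
  natOutgoing: true
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: vxlan-tunnel-pool          # assumed name
spec:
  cidr: 10.3.240.0/20              # tunnel address range from the report
  vxlanMode: Always
```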
With this configuration, Calico adds a `vxlan.calico` interface with an IP in the `10.3.240.0/20` range to create the VXLAN tunnels, for example on master-0. This IP is added to the `node.projectcalico.org/v3` resource for master-0, which allows calico-node on the other nodes to create the proper VXLAN routes for the podCIDRs allocated to this node.
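The node manifest shown in the original report is gone; as a hedged sketch, a Calico Node resource carrying that tunnel address could look roughly like this (the addresses and AS number are invented for illustration):

```yaml
# Hypothetical sketch of the master-0 Node resource; all values are
# assumptions except the field names and the 10.3.240.0/20 range.
apiVersion: projectcalico.org/v3
kind: Node
metadata:
  name: master-0
spec:
  bgp:
    ipv4Address: 192.168.0.10/24    # assumed node address
    asNumber: 64512                 # assumed AS number
  ipv4VXLANTunnelAddr: 10.3.240.1   # assumed address within 10.3.240.0/20
```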
Now you can see that when I re-run the kubespray task, the `ipv4VXLANTunnelAddr` field of every `node.projectcalico.org/v3` resource is removed, which triggers calico-node to remove the VXLAN tunnels. All my network traffic is then rerouted through the BGP gateway, which is under QoS, and I run into network bandwidth issues. To fix this I currently have to restart the calico-node pods, but during the time it takes for all the VXLAN tunnels to come back up my cluster is under pressure, and I lose a lot of packets due to the OpenStack QoS.
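The mechanics of the field loss can be illustrated without any Calico code: a full replace makes the stored spec exactly what was submitted, so any field the submitted manifest does not carry disappears, while a merge-style patch only overwrites the submitted keys. This is just a toy model of those two semantics, not the actual `calicoctl` implementation:

```python
# Toy illustration (not Calico code) of replace vs. merge-patch semantics.

def replace(existing: dict, submitted: dict) -> dict:
    """'calicoctl apply'-style behavior: the stored spec becomes
    exactly what was submitted; unmentioned fields are lost."""
    return dict(submitted)

def merge_patch(existing: dict, submitted: dict) -> dict:
    """'calicoctl patch'-style behavior: only submitted keys are
    overwritten, everything else is kept."""
    merged = dict(existing)
    merged.update(submitted)
    return merged

stored = {"asNumber": 64512, "ipv4VXLANTunnelAddr": "10.3.240.1"}
from_kubespray = {"asNumber": 64512}  # kubespray only manages asNumber

print(replace(stored, from_kubespray))      # tunnel address is dropped
print(merge_patch(stored, from_kubespray))  # tunnel address survives
```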
What did you expect to happen?
I expect kubespray not to use `calicoctl apply` on already existing resources, but `calicoctl patch` instead, to avoid erasing fields not managed by kubespray. That way the VXLAN tunnels shouldn't be removed.
NB: if I remove the task I have no trouble, but obviously I need this task to set up new nodes.
How can we reproduce it (as minimally and precisely as possible)?
Create a k8s cluster with the Calico CNI, VXLAN full mesh, and BGP peering with a router. Re-run kubespray with the `network` tags and observe your VXLAN tunnels drop.
OS
Version of Ansible
Version of Python
Python 3.11.8
Version of Kubespray (commit)
1b870a1
Network plugin used
calico
Full inventory with variables
Dynamic inventory; there are too many inventory variables and they are not relevant in this case.
Command used to invoke ansible
ansible-playbook -i openstack.yaml kubespray/cluster.yml -b --tags network
Output of ansible run
No errors.
Anything else we need to know
My idea to fix this behavior is to add a task that gets the node, then do a `calicoctl patch` if the node exists and a `calicoctl apply` if it doesn't. It seems to work; I am finishing setting up these tasks and will offer a PR for this case.
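As a rough sketch of what that logic could look like in Ansible, assuming `calicoctl` is on the PATH; the task names, file path, and the `local_as` variable are invented for illustration and are not the actual kubespray tasks:

```yaml
# Hypothetical sketch of the proposed get-then-patch-or-apply flow.
- name: Check whether the Calico node resource already exists
  command: "calicoctl get node {{ inventory_hostname }}"
  register: calico_node_exists
  failed_when: false
  changed_when: false

- name: Patch asNumber on an existing node (preserves ipv4VXLANTunnelAddr)
  command: >-
    calicoctl patch node {{ inventory_hostname }}
    --patch '{"spec": {"bgp": {"asNumber": {{ local_as }}}}}'
  when: calico_node_exists.rc == 0

- name: Apply the full node resource only when it does not exist yet
  command: "calicoctl apply -f /etc/calico/node.yml"   # assumed manifest path
  when: calico_node_exists.rc != 0
```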