Traffic initiated from the gateway (routers/switches) may be led to the wrong node in layer2 mode #359
Comments
Which annotation are you using?
As the code shows, openelb-manager tries to select one of the ready nodes and annotates the service with an annotation called OpenELBLayer2Annotation, which is reused later unless the user changes the value. See openelb/pkg/controllers/lb/controller.go, line 218 in 0672f10.
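The selection-and-reuse behaviour described above can be sketched roughly as follows. This is only an illustration, not the code in controller.go: pickLayer2Node is a hypothetical helper, and the rule that the annotated node is reused only while it is still ready is an assumption made for the sketch.

```go
package main

import "fmt"

// layer2Annotation is the annotation key quoted in this issue.
const layer2Annotation = "layer2.openelb.kubesphere.io/v1alpha1"

// pickLayer2Node is a hypothetical helper, not OpenELB's actual function.
// It keeps the node already recorded in the annotation if that node is
// still in the ready list (an assumption of this sketch); otherwise it
// falls back to the first ready node.
func pickLayer2Node(annotations map[string]string, readyNodes []string) (string, bool) {
	if current, ok := annotations[layer2Annotation]; ok {
		for _, n := range readyNodes {
			if n == current {
				return current, true
			}
		}
	}
	if len(readyNodes) == 0 {
		return "", false
	}
	return readyNodes[0], true
}

func main() {
	svc := map[string]string{}

	// First reconcile: no annotation yet, so a ready node is chosen and recorded.
	node, _ := pickLayer2Node(svc, []string{"master1", "worker1"})
	svc[layer2Annotation] = node
	fmt.Println("first reconcile:", node)

	// Later reconcile: the annotated node is reused even if the list order changes.
	node, _ = pickLayer2Node(svc, []string{"worker1", "master1"})
	fmt.Println("later reconcile:", node)
}
```

The point of the sketch is that the annotated node and the node running the openelb-manager leader are chosen independently, so they can differ without anyone editing the annotation by hand.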
In my case, I never changed the annotation manually. The node is selected by openelb-manager, and the annotation is added by openelb-manager, too.
After talking with some colleagues who work on switches, it seems reasonable that the switch uses the port on which the ARP reply arrived as the target port in its ARP table, even if the MAC address inside the ARP reply does not belong to that port. But then why does layer2 mode work fine with many other switches? This is confusing.
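A toy model of the learning behaviour described in this comment, assuming the switch stores the MAC from the ARP payload together with the port the frame arrived on. It is not any vendor's implementation; the type names and the port name are made up for illustration.

```go
package main

import "fmt"

// arpReply models only the fields relevant here. All names are hypothetical.
type arpReply struct {
	SenderIP    string // IP address claimed in the ARP payload
	SenderMAC   string // MAC address claimed in the ARP payload
	IngressPort string // switch port the frame physically arrived on
}

// arpEntry is one row of the modelled switch ARP table.
type arpEntry struct {
	MAC  string
	Port string
}

// learn records the payload MAC together with the ingress port.
// It never checks whether that MAC is actually reachable through that port.
func learn(table map[string]arpEntry, r arpReply) {
	table[r.SenderIP] = arpEntry{MAC: r.SenderMAC, Port: r.IngressPort}
}

func main() {
	table := map[string]arpEntry{}

	// The reply is sent from the node running openelb-manager, but it
	// carries the MAC address of the node annotated in the service.
	learn(table, arpReply{
		SenderIP:    "172.31.11.2",
		SenderMAC:   "xxxx-1b3d-832c",
		IngressPort: "port-of-manager-node", // hypothetical port name
	})

	e := table["172.31.11.2"]
	fmt.Println("mac:", e.MAC, "port:", e.Port)
}
```

In this model the entry ends up with the annotated node's MAC but the manager node's port, which matches the symptom reported in the issue.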
Given how openelb-manager, ARP, and switches work, the key to this problem is accessing lb services from the gateway after the entry learned from the gratuitous ARP has expired. I have worked out the procedure to reproduce this problem and updated the issue with it.
Also, my colleague has offered a possible solution to this problem: #360
Describe the bug
After creating an lb service and letting the service's reconcile function run, we observe that the ARP table in the switch is correct and pinging the lb IP works fine.
But after 20 minutes without any traffic, the entry in the switch's ARP table is outdated.
If we then ping the lb IP again, we find that the openelb-manager pod sends an ARP reply with the right MAC address, but the ARP entry in the switch shows that:
it forwards packets to the port of the node where the openelb-manager pod is running, rather than to the node annotated in the service.
For example, the ARP entry is as follows. The MAC address xxxx-1b3d-832c is the correct, configured one, but the switch port BAGG21 is connected to another node, the one running openelb-manager, and is not the switch port that matches the MAC address.
172.31.11.2 xxxx-1b3d-832c 2002 BAGG21 362 D
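A mismatch like this could be checked mechanically. The sketch below assumes the first four columns of the row are IP address, MAC address, VLAN, and interface; that column layout is a guess based on the example, and the cabling map (including the port name BAGG22) is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// arpRow holds the first four columns of a switch ARP table row.
// The column meaning is an assumption based on the example row.
type arpRow struct {
	IP, MAC, VLAN, Port string
}

// parseARPRow splits a whitespace-separated row into its leading columns.
func parseARPRow(line string) (arpRow, error) {
	f := strings.Fields(line)
	if len(f) < 4 {
		return arpRow{}, fmt.Errorf("expected at least 4 columns, got %d", len(f))
	}
	return arpRow{IP: f[0], MAC: f[1], VLAN: f[2], Port: f[3]}, nil
}

// portMismatch reports whether the row points at a different port than
// the one the MAC address is known to be cabled to. Unknown MACs are
// not flagged.
func portMismatch(row arpRow, macToPort map[string]string) bool {
	want, ok := macToPort[row.MAC]
	return ok && want != row.Port
}

func main() {
	row, err := parseARPRow("172.31.11.2 xxxx-1b3d-832c 2002 BAGG21 362 D")
	if err != nil {
		panic(err)
	}

	// Hypothetical cabling: the annotated node's MAC sits behind BAGG22.
	cabling := map[string]string{"xxxx-1b3d-832c": "BAGG22"}
	fmt.Println("mismatch:", portMismatch(row, cabling))
}
```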
To Reproduce
After some digging into this problem, and after learning more about the ARP protocol, I finally understand why this problem occurred.
To reproduce this bug:
In the annotation layer2.openelb.kubesphere.io/v1alpha1, assign a node other than the node where the openelb-manager leader is running. For example, if the openelb-manager leader is running on node master1, edit the lb service and make sure the value of the annotation layer2.openelb.kubesphere.io/v1alpha1 is not master1.
ping 172.31.11.2
Expected behaviour
Output
Version Info