Cilium v1.15.4: ping to MetalLB LoadBalancer IP fails, but curl to the same IP succeeds #32411

Open
2 of 3 tasks
luo964973791 opened this issue May 8, 2024 · 6 comments
Labels
kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. need-more-info More information is required to further debug or fix the issue. needs/triage This issue requires triaging to establish severity and next steps.

Comments

@luo964973791

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

[root@node1 metallb]# curl -I 172.27.0.7
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 08 May 2024 02:19:02 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Powered-By: PHP/8.2.7

[root@node1 metallb]# ping 172.27.0.7
PING 172.27.0.7 (172.27.0.7) 56(84) bytes of data.
From 172.27.0.7 icmp_seq=1 Destination Port Unreachable
From 172.27.0.7 icmp_seq=2 Destination Port Unreachable
From 172.27.0.7 icmp_seq=3 Destination Port Unreachable
From 172.27.0.7 icmp_seq=4 Destination Port Unreachable
From 172.27.0.7 icmp_seq=5 Destination Port Unreachable
From 172.27.0.7 icmp_seq=6 Destination Port Unreachable
From 172.27.0.7 icmp_seq=7 Destination Port Unreachable
From 172.27.0.7 icmp_seq=8 Destination Port Unreachable
From 172.27.0.7 icmp_seq=9 Destination Port Unreachable
^C
--- 172.27.0.7 ping statistics ---
9 packets transmitted, 0 received, +9 errors, 100% packet loss, time 8225ms

[root@node1 metallb]# kubectl get svc -n nginx
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx-service   LoadBalancer   10.233.12.235   172.27.0.7    80:32022/TCP   16h
[root@node1 metallb]#

Cilium Version

v1.15.4

Kernel Version

[root@localhost ~]# cat /etc/redhat-release
Rocky Linux release 9.2 (Blue Onyx)
[root@localhost ~]# uname -r
5.14.0-284.11.1.el9_2.x86_64

Kubernetes Version

kubectl version:v1.29.4

Regression

No response

Sysdump

No response

Relevant log output

[root@node1 metallb]# kubectl get ipaddresspools.metallb.io -A
NAMESPACE        NAME      AUTO ASSIGN   AVOID BUGGY IPS   ADDRESSES
metallb-system   primary   true          false             ["172.27.0.7-172.27.0.9"]
[root@node1 metallb]# kubectl get l2advertisements.metallb.io -A
NAMESPACE        NAME      IPADDRESSPOOLS   IPADDRESSPOOL SELECTORS   INTERFACES
metallb-system   primary   ["primary"]                                
[root@node1 metallb]# kubectl get svc -n nginx
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
nginx-service   LoadBalancer   10.233.12.235   172.27.0.7    80:32022/TCP   16h
[root@node1 metallb]# curl -I 172.27.0.7
HTTP/1.1 200 OK
Server: nginx
Date: Wed, 08 May 2024 02:10:38 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
X-Powered-By: PHP/8.2.7

[root@node1 metallb]# ping 172.27.0.7
PING 172.27.0.7 (172.27.0.7) 56(84) bytes of data.
From 172.27.0.7 icmp_seq=1 Destination Port Unreachable
From 172.27.0.7 icmp_seq=2 Destination Port Unreachable
From 172.27.0.7 icmp_seq=3 Destination Port Unreachable
From 172.27.0.7 icmp_seq=4 Destination Port Unreachable
^C
--- 172.27.0.7 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3099ms

[root@node1 metallb]# crictl images | grep cil
quay.io/cilium/cilium                                 v1.15.4             aebfd554d3483       209MB
quay.io/cilium/operator                               v1.15.4             cf4b9cdd4ba07       36.1MB
[root@node1 metallb]# kubectl exec -it -n kube-system          cilium-mbjx2 /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "cilium-agent" out of: cilium-agent, mount-cgroup (init), apply-sysctl-overwrites (init), clean-cilium-state (init), install-cni-binaries (init)
root@node1:/home/cilium# cilium status
KVStore:                 Ok   etcd: 1/1 connected, leases=1, lock leases=1, has-quorum=true: https://172.27.0.3:2379 - 3.5.12 (Leader)
Kubernetes:              Ok   1.29 (v1.29.4) [linux/amd64]
Kubernetes APIs:         ["EndpointSliceOrEndpoint", "cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumNetworkPolicy", "cilium/v2alpha1::CiliumCIDRGroup", "core/v1::Namespace", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Partial   [eth0   172.27.0.3 fe80::20c:29ff:fe7b:3822]
Host firewall:           Disabled
SRv6:                    Disabled
CNI Chaining:            none
Cilium:                  Ok   1.15.4 (v1.15.4-9b3f9a8c)
NodeMonitor:             Disabled
Cilium health daemon:    Ok   
IPAM:                    IPv4: 6/254 allocated from 10.233.65.0/24, 
IPv4 BIG TCP:            Disabled
IPv6 BIG TCP:            Disabled
BandwidthManager:        Disabled
Host Routing:            Legacy
Masquerading:            IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status:       45/45 healthy
Proxy Status:            OK, ip 10.233.65.198, 0 redirects active on ports 10000-20000, Envoy: embedded
Global Identity Range:   min 256, max 65535
Hubble:                  Disabled
Encryption:              Disabled        
Cluster health:          2/2 reachable   (2024-05-08T02:11:35Z)
Modules Health:          Stopped(0) Degraded(0) OK(11) Unknown(3)
root@node1:/home/cilium# 
root@node1:/home/cilium# exit;
exit
[root@node1 metallb]# kubectl version
Client Version: v1.29.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4
[root@node1 metallb]# 
[root@node1 metallb]# ping 172.27.0.7
PING 172.27.0.7 (172.27.0.7) 56(84) bytes of data.
From 172.27.0.7 icmp_seq=1 Destination Port Unreachable
From 172.27.0.7 icmp_seq=2 Destination Port Unreachable
From 172.27.0.7 icmp_seq=3 Destination Port Unreachable
From 172.27.0.7 icmp_seq=4 Destination Port Unreachable
From 172.27.0.7 icmp_seq=5 Destination Port Unreachable
From 172.27.0.7 icmp_seq=6 Destination Port Unreachable
From 172.27.0.7 icmp_seq=7 Destination Port Unreachable
^C
--- 172.27.0.7 ping statistics ---
7 packets transmitted, 0 received, +7 errors, 100% packet loss, time 6186ms

# MetalLB deployment
metallb_enabled: true
metallb_speaker_enabled: "{{ metallb_enabled }}"
metallb_namespace: metallb-system
metallb_protocol: "layer2"
metallb_config:
  address_pools:
    primary:
      ip_range:
        - 172.27.0.7-172.27.0.9
      auto_assign: true
  layer2:
    - primary
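
As a quick sanity check on the configuration above, the following minimal sketch confirms that the Service's EXTERNAL-IP lies inside the configured `ip_range`. The helper name `in_pool` is hypothetical and only handles the inclusive `start-end` IPv4 form used in this issue; it is not part of MetalLB.

```python
import ipaddress


def in_pool(ip: str, ip_range: str) -> bool:
    """Check whether ip lies inside an inclusive 'start-end' IPv4 range."""
    # Split "172.27.0.7-172.27.0.9" into its two endpoint addresses.
    start, end = (ipaddress.ip_address(p.strip()) for p in ip_range.split("-"))
    return start <= ipaddress.ip_address(ip) <= end


if __name__ == "__main__":
    # Range and address are taken from the outputs posted in this issue.
    print(in_pool("172.27.0.7", "172.27.0.7-172.27.0.9"))
```

If this holds, the address assignment itself looks consistent, and the ping failure is more likely about how ICMP is handled for the service IP than about the pool definition.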

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
luo964973791 added the kind/bug, kind/community-report, and needs/triage labels on May 8, 2024
@luo964973791
Author

[root@node1 metallb]# tcpdump -i any host 172.27.0.7 -s0 -A
tcpdump: data link type LINUX_SLL2
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:26:33.131273 lo In IP node1 > node1: ICMP echo request, id 6, seq 1, length 64
E..Te:@.@.}*..................:f....}....................... !"#$%&'()*+,-./01234567
10:26:33.131305 lo In IP node1 > node1: ICMP node1 protocol 1 port 42460 unreachable, length 92
E..pe;[email protected]:@.@.}*..................:f....}....................... !"#$%&'()*+,-./01234567
10:26:34.165170 lo In IP node1 > node1: ICMP echo request, id 6, seq 2, length 64
E..Th&@[email protected]>...........W......:f............................ !"#$%&'()*+,-./01234567
10:26:34.165194 lo In IP node1 > node1: ICMP node1 protocol 1 port 3927 unreachable, length 92
E..ph'[email protected]&@[email protected]>...........W......:f............................ !"#$%&'()*+,-./01234567
10:26:35.194465 lo In IP node1 > node1: ICMP echo request, id 6, seq 3, length 64
E..Tl&@[email protected]>..................:f....X....................... !"#$%&'()*+,-./01234567
10:26:35.194505 lo In IP node1 > node1: ICMP node1 protocol 1 port 51171 unreachable, length 92
E..pl'[email protected]&@[email protected]>..................:f....X....................... !"#$%&'()*+,-./01234567
10:26:36.218668 lo In IP node1 > node1: ICMP echo request, id 6, seq 4, length 64
E..TnS@[email protected].......:f.....U...................... !"#$%&'()*+,-./01234567
10:26:36.218798 lo In IP node1 > node1: ICMP node1 protocol 1 port 12676 unreachable, length 92
[email protected]@[email protected].......:f.....U...................... !"#$%&'()*+,-./01234567
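
For anyone decoding the capture: as far as I can tell, tcpdump's "protocol 1 port N unreachable" is an ICMP Destination Unreachable (type 3) message with code 3, which ping prints as "Destination Port Unreachable". "protocol 1" is the IP protocol number of ICMP, so the rejected packet is the echo request itself, and the reply is generated locally on `lo`. The small lookup below is only an illustration; the type/code values come from RFC 792, the strings follow what ping prints in this thread, and the name `describe_icmp` is made up.

```python
# Names for the ICMP (type, code) pairs that show up in this thread (RFC 792).
ICMP_NAMES = {
    (0, 0): "Echo Reply",
    (8, 0): "Echo Request",
    (3, 0): "Destination Net Unreachable",
    (3, 1): "Destination Host Unreachable",
    (3, 2): "Destination Protocol Unreachable",
    (3, 3): "Destination Port Unreachable",
    (5, 1): "Redirect Host",
}


def describe_icmp(icmp_type: int, code: int) -> str:
    """Return a readable name for an ICMP type/code pair, or a numeric fallback."""
    return ICMP_NAMES.get((icmp_type, code), f"type {icmp_type} code {code}")


if __name__ == "__main__":
    # The reply seen in the capture above: type 3, code 3.
    print(describe_icmp(3, 3))
```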

@ArsenyBelorukov

Is this the same as #14118?

@rauanmayemir
Contributor

rauanmayemir commented May 9, 2024

Interesting. I tried this with a regular ClusterIP SVC and ping fails the same way while curl works:

PING 172.32.20.33 (172.32.20.33) 56(84) bytes of data.
^C
--- 172.32.20.33 ping statistics ---
58 packets transmitted, 0 received, 100% packet loss, time 58374ms

UPD: Never mind. TIL Kubernetes services can’t be pinged.

@zhangguanzhang

> Interesting. I tried this with a regular ClusterIP SVC and ping fails the same way while curl works:
>
> PING 172.32.20.33 (172.32.20.33) 56(84) bytes of data.
> ^C
> --- 172.32.20.33 ping statistics ---
> 58 packets transmitted, 0 received, 100% packet loss, time 58374ms
>
> UPD: Never mind. TIL Kubernetes services can’t be pinged.

Pinging a Service IP does not work when kube-proxy runs in IPVS mode. See:
kubernetes/kubernetes#72236
kubernetes/kubernetes#108460
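
A simplified mental model of why curl and ping behave differently, written as a toy lookup. This is not kube-proxy's or Cilium's actual code. A Service is load-balanced on the frontend tuple it declares (virtual IP, protocol, port), and the Service in this issue declares only TCP port 80. An ICMP echo request has no port and matches no entry. The backend address and the names `SERVICES` and `lookup_backend` are hypothetical.

```python
from typing import Optional

# Toy service table keyed the way the Service in this issue is declared:
# (virtual IP, protocol, port) -> backend. The backend address is made up.
SERVICES = {
    ("172.27.0.7", "TCP", 80): "10.233.65.10:80",  # hypothetical pod endpoint
}


def lookup_backend(vip: str, proto: str, port: Optional[int]) -> Optional[str]:
    """Return the backend for a frontend tuple, or None when nothing matches."""
    return SERVICES.get((vip, proto, port))


if __name__ == "__main__":
    # curl: matches the declared TCP port, so a backend is selected.
    print(lookup_backend("172.27.0.7", "TCP", 80))
    # ping: ICMP carries no port, so there is no entry to match.
    print(lookup_backend("172.27.0.7", "ICMP", None))
```

What happens to the unmatched packet (silent drop, an ICMP error, or a reply from whichever host owns the address) depends on the dataplane, which would explain why the symptoms differ between setups in this thread.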

@luo964973791
Author

After changing the mode from ipvs to iptables, ping still fails.

[root@node1 ~]# ping 172.27.0.7
PING 172.27.0.7 (172.27.0.7) 56(84) bytes of data.
From 172.27.0.3 icmp_seq=2 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=3 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=4 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=5 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=6 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=8 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=11 Redirect Host(New nexthop: 172.27.0.7)
From 172.27.0.3 icmp_seq=9 Destination Host Unreachable
From 172.27.0.3 icmp_seq=13 Destination Host Unreachable
From 172.27.0.3 icmp_seq=14 Destination Host Unreachable
From 172.27.0.3 icmp_seq=15 Destination Host Unreachable

@squeed
Contributor

squeed commented May 15, 2024

Is this a self-managed MetalLB installation, or the one installed as part of the Cilium BGP control plane?

Either way, I would not expect pinging service IPs to work, ever.
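
Since ICMP is not a reliable signal for service IPs, a health check probably ought to probe the declared TCP port instead, which is effectively what the successful `curl -I` does. A minimal sketch using only the Python standard library; the function name `tcp_reachable` and the 2-second default timeout are arbitrary choices for illustration.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the full TCP handshake; closing the
        # socket right away is enough for a reachability check.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the Service in this issue, the call would be `tcp_reachable("172.27.0.7", 80)`, using the EXTERNAL-IP and port from the `kubectl get svc` output.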

squeed added the need-more-info label on May 15, 2024