Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed DHCP Offer packet #532

Open
Alphadelta14 opened this issue May 8, 2024 · 2 comments
Open

Missed DHCP Offer packet #532

Alphadelta14 opened this issue May 8, 2024 · 2 comments

Comments

@Alphadelta14
Copy link

Alphadelta14 commented May 8, 2024

I am trying to track down an issue I experienced with https://github.com/kube-vip/kube-vip where DHCP failed to respond in a timely manner (Specifically: Invoking with --controlplane --ddns and an --address that refers to an fqdn will request a VIP from DHCP using that as a client ID). kube-vip runs in a container with host networking (I have verified that isc-dhclient works fine within that container). The relevant DHCP client usage is in https://github.com/kube-vip/kube-vip/blob/v0.8.0/pkg/vip/dhcp.go. The downstream ticket is kube-vip/kube-vip#844, but I am fairly certain this is a protocol issue.

The not-very-helpful error in kube-vip:

time="2024-05-07T22:58:09Z" level=info msg="waiting for ip from dhcp"
time="2024-05-07T22:58:45Z" level=error msg="request failed, error: got an error while processing the request: no matching response packet received (waiting 10s)"
time="2024-05-07T22:59:30Z" level=error msg="request failed, error: unable to receive an offer: got an error while the discovery request: no matching response packet received (waiting 12.288644945s)"
time="2024-05-07T23:00:18Z" level=error msg="failed to get an IP address after 3 attempts, error unable to receive an offer: got an error while the discovery request: no matching response packet received, giving up"

In this setup, I have been using eth0 as my interface (it had another static IP address, 192.168.96.3, on it too. MAC is 02:42:c0:a8:60:03). My FQDN is kube-api.cluster.internal, so kube-api gets sent as the DHCP client. node02.cluster.internal (192.168.96.4, 02:42:c0:a8:60:04) has a simple isc-dhcp-server on debian bookworm running with this conf:

option domain-name "cluster.internal";
subnet 192.168.96.0 netmask 255.255.255.0 {
 range 192.168.96.150 192.168.96.200;
 option domain-name-servers 8.8.8.8;
}

I have been running both tcpdump and wireshark to see what the packet flows are, and using strace to see what is actually consumed by this library.

Firstly, I have the strace -e network for the client communications:

[pid 1592946] socket(AF_PACKET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, htons(0 /* ETH_P_??? */)) = 8
[pid 1592946] getsockopt(8, SOL_SOCKET, SO_TYPE, [2], [4]) = 0
[pid 1592946] bind(8, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
[pid 1592946] getsockname(8, {sa_family=AF_PACKET, sa_data="\10\0\367\1\0\0\1\0\0\6\2B\300\250`\3"}, [112 => 18]) = 0
[pid 1592946] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1, si_uid=0} ---
[pid 1592946] recvfrom(8,  <unfinished ...>
[pid 1592953] sendto(8, "E\0\1H\0\0\0\0@\21y\246\0\0\0\0\377\377\377\377\0D\0C\00147\200\1\1\6\0"..., 328, 0, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=6, sll_addr=[0xff, 0xff, 0xff, 0xff, 0xff, 0xff]}, 20 <unfinished ...>
[pid 1592946] <... recvfrom resumed>0xc000892000, 1568, 0, 0xc00064c0e0, [112]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 1592953] <... sendto resumed>)     = 328
[pid 1592946] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1, si_uid=0} ---
[pid 1592950] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1, si_uid=0} ---
[pid 1592950] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1, si_uid=0} ---
[pid 1592955] recvfrom(8, "E\0\0H\377p@\0@\21\351\342\300\250`\1\300\250o\377\341\25\341\25\0004Q\227Spot"..., 1568, 0, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_ETHER, sll_pkttype=PACKET_BROADCAST, sll_halen=6, sll_addr=[0x2, 0x42, 0x99, 0xc2, 0x46, 0x7d]}, [112 => 20]) = 72
[pid 1592955] recvfrom(8, 0xc0005a8300, 1568, 0, 0xc0005a23f0, [112]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 1592950] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1, si_uid=0} ---
[pid 1592955] recvfrom(8, "E\0\0001\377\233@\0@\21\351\316\300\250`\1\300\250o\377\277\305~\234\0\35Q\200M-SE"..., 1568, 0, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_ETHER, sll_pkttype=PACKET_BROADCAST, sll_halen=6, sll_addr=[0x2, 0x42, 0x99, 0xc2, 0x46, 0x7d]}, [112 => 20]) = 49
[pid 1592955] recvfrom(8, "E\0\0001\377\234@\0@\21\351\315\300\250`\1\300\250o\377\300h~\236\0\35Q\200M-SE"..., 1568, 0, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_ETHER, sll_pkttype=PACKET_BROADCAST, sll_halen=6, sll_addr=[0x2, 0x42, 0x99, 0xc2, 0x46, 0x7d]}, [112 => 20]) = 49
[pid 1592955] recvfrom(8, 0xc0005a9100, 1568, 0, 0xc0005a2540, [112]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 1592955] --- SIGURG {si_signo=SIGURG, si_code=SI_TKILL, si_pid=1, si_uid=0} ---
[pid 1592953] sendto(8, "E\0\1H\0\0\0\0@\21y\246\0\0\0\0\377\377\377\377\0D\0C\00147\200\1\1\6\0"..., 328, 0, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_IP), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=6, sll_addr=[0xff, 0xff, 0xff, 0xff, 0xff, 0xff]}, 20) = 328

I was a bit concerned with the recvfrom, but eventually found out that is just some MDNS discovery thing that runs every now and then from 192.168.96.1 (the actual gateway). When i added timestamps, that is actually 4 seconds later.

Here is a tcpdump -i eth0:

13:27:53.902280 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:42:c0:a8:60:03 (oui Unknown), length 300
13:27:53.902652 IP node02.cluster.internal.bootps > 192.168.96.150.bootpc: BOOTP/DHCP, Reply, length 300
13:27:56.787072 IP 192.168.96.1.17500 > 192.168.111.255.17500: UDP, length 150
13:27:57.299822 IP 192.168.96.1.mdns > 224.0.0.251.mdns: 0 SRV (QM)? google-nest-hub-xxxxxx._googlecast._tcp.local. (89)
13:27:57.299954 IP 192.168.96.1.mdns > 224.0.0.251.mdns: 0 SRV (QM)? google-nest-hub-xxxxxx._googlecast._tcp.local. (89)
13:27:57.751805 IP 192.168.96.1.52612 > 192.168.111.255.32414: UDP, length 21
13:27:57.751872 IP 192.168.96.1.57455 > 192.168.111.255.32412: UDP, length 21
13:27:58.008321 IP 192.168.96.1.51461 > 239.255.255.250.1900: UDP, length 101
13:27:58.903234 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:42:c0:a8:60:03 (oui Unknown), length 300
13:27:58.903629 IP node02.cluster.internal.bootps > 192.168.96.150.bootpc: BOOTP/DHCP, Reply, length 300
13:28:02.753256 IP 192.168.96.1.52612 > 192.168.111.255.32414: UDP, length 21
13:28:02.753257 IP 192.168.96.1.57455 > 192.168.111.255.32412: UDP, length 21
13:28:07.753908 IP 192.168.96.1.57455 > 192.168.111.255.32412: UDP, length 21
13:28:07.753908 IP 192.168.96.1.52612 > 192.168.111.255.32414: UDP, length 21
13:28:08.009455 IP 192.168.96.1.51461 > 239.255.255.250.1900: UDP, length 101
13:28:08.903432 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 02:42:c0:a8:60:03 (oui Unknown), length 300
13:28:08.903758 IP node02.cluster.internal.bootps > 192.168.96.150.bootpc: BOOTP/DHCP, Reply, length 300

Here is an strace from isc-dhclient which didn't get the info on the UDP socket (6), but on the raw socket (5). Note the different clientid (so slightly different IP returned, 192.168.96.155).

socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) = 5
ioctl(5, SIOCGIFINDEX, {ifr_name="eth0", ifr_ifindex=503}) = 0
bind(5, {sa_family=AF_PACKET, sll_protocol=htons(0 /* ETH_P_??? */), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 6
ioctl(6, SIOCGIFHWADDR, {ifr_name="eth0", ifr_hwaddr={sa_family=ARPHRD_ETHER, sa_data=02:42:c0:a8:60:03}}) = 0
close(6)                                = 0
setsockopt(5, SOL_PACKET, PACKET_AUXDATA, [1], 4) = 0
setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x55e25d81b980}, 16) = 0
getpid()                                = 1809589
sendto(3, "<30>May  8 01:55:54 dhclient[180"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
write(2, "Listening on LPF/eth0/02:42:c0:a"..., 39Listening on LPF/eth0/02:42:c0:a8:60:03) = 39
write(2, "\n", 1
)                       = 1
getpid()                                = 1809589
sendto(3, "<30>May  8 01:55:54 dhclient[180"..., 78, MSG_NOSIGNAL, NULL, 0) = 78
write(2, "Sending on   LPF/eth0/02:42:c0:a"..., 39Sending on   LPF/eth0/02:42:c0:a8:60:03) = 39
write(2, "\n", 1
)                       = 1
fcntl(5, F_SETFD, FD_CLOEXEC)           = 0
socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP) = 6
setsockopt(6, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(6, {sa_family=AF_INET, sin_port=htons(68), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
getpid()                                = 1809589
sendto(3, "<30>May  8 01:55:54 dhclient[180"..., 67, MSG_NOSIGNAL, NULL, 0) = 67
write(2, "Sending on   Socket/fallback", 28Sending on   Socket/fallback) = 28
write(2, "\n", 1
)                       = 1
fcntl(6, F_SETFD, FD_CLOEXEC)           = 0
openat(AT_FDCWD, "/dev/random", O_RDONLY) = 7
newfstatat(7, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x8), ...}, AT_EMPTY_PATH) = 0
ioctl(7, TCGETS, 0x7ffebeefbb20)        = -1 EINVAL (Invalid argument)
read(7, "\227\274\346\244gH\203\267\263E\317\220P\364T\262\220q\4\255\202.\217[\351Q\367\221\350\311\0073"..., 4096) = 4096
close(7)                                = 0
uname({sysname="Linux", nodename="node01", ...}) = 0
getpid()                                = 1809589
sendto(3, "<30>May  8 01:55:54 dhclient[180"..., 97, MSG_NOSIGNAL, NULL, 0) = 97
write(2, "DHCPDISCOVER on eth0 to 255.255."..., 58DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4) = 58
write(2, "\n", 1
)                       = 1
write(5, "\377\377\377\377\377\377\2B\300\250`\3\10\0E\20\1H\0\0\0\0\200\219\226\0\0\0\0\377\377"..., 342) = 342
openat(AT_FDCWD, "/var/run/dhclient.pid", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 7
fcntl(7, F_GETFL)                       = 0x8001 (flags O_WRONLY|O_LARGEFILE)
getpid()                                = 1809589
newfstatat(7, "", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_EMPTY_PATH) = 0
write(7, "1809589\n", 8)                = 8
close(7)                                = 0
pselect6(7, [5 6], [], NULL, {tv_sec=0, tv_nsec=0}, NULL) = 0 (Timeout)
pselect6(7, [5 6], [], NULL, {tv_sec=3, tv_nsec=676837000}, NULL) = 1 (in [5], left {tv_sec=2, tv_nsec=675561433})
recvmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2B\300\250`\3\2B\300\250`\4\10\0E\20\1H\0\0\0\0\200\21\367\244\300\250`\4\300\250"..., iov_len=1536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_PACKET, cmsg_type=0x8}], msg_controllen=36, msg_flags=0}, 0) = 342
getpid()                                = 1809589
sendto(3, "<30>May  8 01:55:55 dhclient[180"..., 84, MSG_NOSIGNAL, NULL, 0) = 84
write(2, "DHCPOFFER of 192.168.96.155 from"..., 45DHCPOFFER of 192.168.96.155 from 192.168.96.4) = 45
write(2, "\n", 1
)                       = 1
uname({sysname="Linux", nodename="node01", ...}) = 0
getpid()                                = 1809589
sendto(3, "<30>May  8 01:55:55 dhclient[180"..., 104, MSG_NOSIGNAL, NULL, 0) = 104
write(2, "DHCPREQUEST for 192.168.96.155 o"..., 65DHCPREQUEST for 192.168.96.155 on eth0 to 255.255.255.255 port 67) = 65
write(2, "\n", 1
)                       = 1
write(5, "\377\377\377\377\377\377\2B\300\250`\3\10\0E\20\1H\0\0\0\0\200\219\226\0\0\0\0\377\377"..., 342) = 342
pselect6(7, [5 6], [], NULL, {tv_sec=2, tv_nsec=674078000}, NULL) = 1 (in [5], left {tv_sec=2, tv_nsec=671035909})
recvmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\2B\300\250`\3\2B\300\250`\4\10\0E\20\1H\0\0\0\0\200\21\367\244\300\250`\4\300\250"..., iov_len=1536}], msg_iovlen=1, msg_control=[{cmsg_len=36, cmsg_level=SOL_PACKET, cmsg_type=0x8}], msg_controllen=36, msg_flags=0}, 0) = 342

Between my many runs i made sure to clear the IPs on the machine and stop the other dhclient processes.

@pmazzini
Copy link
Collaborator

pmazzini commented May 8, 2024

I am trying to track down an issue I experienced with https://github.com/kube-vip/kube-vip

Have you reported your issue to that project?

@Alphadelta14
Copy link
Author

I am trying to track down an issue I experienced with https://github.com/kube-vip/kube-vip

Have you reported your issue to that project?

I am fairly certain this is a socket/protocol/binding/filtering issue which is why i reported it here first. i have filed downstream to kube-vip/kube-vip#844 as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants