This repository has been archived by the owner on Aug 22, 2022. It is now read-only.

[pod-gateway] helm chart doesn't work with cilium #1633

Open
legalgig opened this issue Jun 25, 2022 · 12 comments

@legalgig

Helm chart name

pod-gateway

Helm chart version

5.4.2

Container name

ghcr.io/k8s-at-home/pod-gateway

Container tag

v1.2.6

Description

Hi,

I've been using pod-gateway for quite some time, and it worked perfectly fine with Calico. Recently I decided to migrate the whole cluster from Calico to Cilium, and after the migration pod-gateway stopped working. It seems that no traffic gets through over the vxlan interface. I've already tried a different VXLAN ID, as well as removing NOT_ROUTED_TO_GATEWAY_CIDRS and/or VPN_LOCAL_CIDRS.

Kubernetes:

  • k3s v1.23.6+k3s1
  • 3 node cluster
  • disabled flannel
  • cilium 1.11.6
  • ubuntu 20.04.4 LTS

Here are some logs

pod-gateway container

+ cat /default_config/settings.sh
#!/bin/sh
# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="${gateway}"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="${K8S_DNS_ips}"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
+ . /default_config/settings.sh
+ GATEWAY_NAME=
+ K8S_DNS_IPS=
+ NOT_ROUTED_TO_GATEWAY_CIDRS=
+ VXLAN_ID=42
+ VXLAN_IP_NETWORK=172.16.0
+ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
+ VPN_INTERFACE=tun0
+ VPN_BLOCK_OTHER_TRAFFIC=true
+ VPN_TRAFFIC_PORT=443
+ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
+ DNS_LOCAL_CIDRS=local
+ cat /config/settings.sh
DNS_LOCAL_CIDRS="local"#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="wg0"
VPN_LOCAL_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_TRAFFIC_PORT="51820"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /config/settings.sh
+ DNS_LOCAL_CIDRS=local
+ NOT_ROUTED_TO_GATEWAY_CIDRS='192.168.0.0/16 10.0.0.0/8'
+ VPN_BLOCK_OTHER_TRAFFIC=true
+ VPN_INTERFACE=wg0
+ VPN_LOCAL_CIDRS='192.168.0.0/16 10.0.0.0/8'
+ VPN_TRAFFIC_PORT=51820
+ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
+ VXLAN_ID=42
+ VXLAN_IP_NETWORK=172.16.0
+ grep nameserver /etc/resolv.conf
+ cut '-d ' -f2
+ K8S_DNS=10.43.0.10
+ echo '

# DHCP server settings
interface=vxlan0
bind-interfaces

# Dynamic IPs assigned to PODs - we keep a range for static IPs
dhcp-range=172.16.0.20,172.16.0.255,12h

# For debugging purposes, log each DNS query as it passes through
# dnsmasq.
log-queries

# Log lots of extra information about DHCP transactions.
log-dhcp

# Log to stdout
log-facility=-
'
+ echo '
# Send local DNS queries to the K8S DNS server
server=/local/10.43.0.10
'
+ sleep 10
+ exec dnsmasq -k
dnsmasq[1]: started, version 2.85 cachesize 150
dnsmasq[1]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-cryptohash no-DNSSEC loop-detect inotify dumpfile
dnsmasq-dhcp[1]: DHCP, IP range 172.16.0.20 -- 172.16.0.255, lease time 12h
dnsmasq-dhcp[1]: DHCP, sockets bound exclusively to interface vxlan0
dnsmasq[1]: using nameserver 10.43.0.10#53 for domain local
dnsmasq[1]: reading /etc/resolv.conf
dnsmasq[1]: using nameserver 10.43.0.10#53 for domain local
dnsmasq[1]: using nameserver 193.138.218.74#53
dnsmasq[1]: read /etc/hosts - 7 addresses
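
For reference, the dhcp-range line in the log above looks like it is built from VXLAN_IP_NETWORK and VXLAN_GATEWAY_FIRST_DYNAMIC_IP. A minimal sketch of that derivation, based only on my reading of the log (dhcp_range_line is a made-up helper name, not something from the pod-gateway scripts):

```shell
#!/bin/sh
# Sketch: rebuild the dnsmasq dhcp-range line from the two settings.
# dhcp_range_line is an illustrative helper, not part of pod-gateway;
# the 12h lease time is copied from the log output above.
dhcp_range_line() {
  network="$1"        # VXLAN_IP_NETWORK, e.g. 172.16.0
  first_dynamic="$2"  # VXLAN_GATEWAY_FIRST_DYNAMIC_IP, e.g. 20
  echo "dhcp-range=${network}.${first_dynamic},${network}.255,12h"
}

# prints: dhcp-range=172.16.0.20,172.16.0.255,12h
dhcp_range_line 172.16.0 20
```

So addresses below .20 stay free for static entries in nat.conf, and clients without an entry should get a lease from .20 upwards.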

app gateway init logs

+ cat /default_config/settings.sh
#!/bin/sh
# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="${gateway}"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="${K8S_DNS_ips}"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
+ . /default_config/settings.sh
+ GATEWAY_NAME=pod-gateway.vpn-gateway.svc.cluster.local
+ K8S_DNS_IPS=10.43.0.10
+ NOT_ROUTED_TO_GATEWAY_CIDRS=
+ VXLAN_ID=42
+ VXLAN_IP_NETWORK=172.16.0
+ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
+ VPN_INTERFACE=tun0
+ VPN_BLOCK_OTHER_TRAFFIC=true
+ VPN_TRAFFIC_PORT=443
+ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
+ DNS_LOCAL_CIDRS=local
+ cat /config/settings.sh
DNS_LOCAL_CIDRS="local"#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="wg0"
VPN_LOCAL_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_TRAFFIC_PORT="51820"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /config/settings.sh
+ DNS_LOCAL_CIDRS=local
+ NOT_ROUTED_TO_GATEWAY_CIDRS='192.168.0.0/16 10.0.0.0/8'
+ VPN_BLOCK_OTHER_TRAFFIC=true
+ VPN_INTERFACE=wg0
+ VPN_LOCAL_CIDRS='192.168.0.0/16 10.0.0.0/8'
+ VPN_TRAFFIC_PORT=51820
+ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
+ VXLAN_ID=42
+ VXLAN_IP_NETWORK=172.16.0
+ ip addr
+ grep -q vxlan0
+ /sbin/ip route
+ awk '/default/ { print $3 }'
+ K8S_GW_IP=10.69.1.233
+ ip route add 192.168.0.0/16 via 10.69.1.233
+ ip route add 10.0.0.0/8 via 10.69.1.233
+ echo 'Deleting existing default GWs'
+ ip route del 0/0
Deleting existing default GWs
+ ping -c 1 -W 1000 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network unreachable
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
1716: eth0@if1717: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c2:c8:25:c7:71:4d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.188/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c0c8:25ff:fec7:714d/64 scope link tentative
       valid_lft forever preferred_lft forever
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
192.168.0.0/16 via 10.69.1.233 dev eth0
+ ip route
+ echo 10.43.0.10
+ cut -d ' ' -f 1
+ K8S_DNS_IP=10.43.0.10
+ dig +short pod-gateway.vpn-gateway.svc.cluster.local @10.43.0.10
+ GATEWAY_IP=10.69.1.62
+ hostname
+ grep test-b5f557b48-jr2k4 /config/nat.conf
+ true
+ NAT_ENTRY=
+ VXLAN_GATEWAY_IP=172.16.0.1
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
+ ip route
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
1716: eth0@if1717: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c2:c8:25:c7:71:4d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.188/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c0c8:25ff:fec7:714d/64 scope link tentative
       valid_lft forever preferred_lft forever
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
192.168.0.0/16 via 10.69.1.233 dev eth0
+ ping -c1 10.69.1.62
PING 10.69.1.62 (10.69.1.62): 56 data bytes
64 bytes from 10.69.1.62: seq=0 ttl=63 time=0.073 ms

--- 10.69.1.62 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.073/0.073/0.073 ms
+ ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
+ bridge fdb append to 00:00:00:00:00:00 dst 10.69.1.62 dev vxlan0
+ ip link set up dev vxlan0
+ echo 'backoff-cutoff 2;
Get dynamic IP
initial-interval 1;
link-timeout 10;
reboot 0;
retry 10;
select-timeout 0;
timeout 30;

interface "vxlan0"
 {
  request subnet-mask,
          broadcast-address,
          routers,
          #domain-name-servers;
  require routers,
          subnet-mask,
          #domain-name-servers;
 }
'
+ '[' -z  ]
+ echo 'Get dynamic IP'
+ dhclient -v -cf /etc/dhclient.conf vxlan0
Internet Systems Consortium DHCP Client 4.4.2
Copyright 2004-2020 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

/etc/dhclient.conf line 3: semicolon expected.
link-timeout 10;
              ^
/etc/dhclient.conf line 4: semicolon expected.
reboot
 ^
/etc/dhclient.conf line 15: no option named require in space dhcp
  require routers,
   ^
/etc/dhclient.conf line 18:  : expected option name.
 }
  ^
/etc/dhclient.conf line 18: unterminated interface declaration.

^
Listening on LPF/vxlan0/be:b3:8a:b5:43:7b
Sending on   LPF/vxlan0/be:b3:8a:b5:43:7b
Sending on   Socket/fallback
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether be:b3:8a:b5:43:7b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::bcb3:8aff:feb5:437b/64 scope link
       valid_lft forever preferred_lft forever
+ ip route
1716: eth0@if1717: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c2:c8:25:c7:71:4d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.188/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::c0c8:25ff:fec7:714d/64 scope link
       valid_lft forever preferred_lft forever
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
+ ping -c1 172.16.0.1
192.168.0.0/16 via 10.69.1.233 dev eth0
PING 172.16.0.1 (172.16.0.1): 56 data bytes
ping: sendto: Network unreachable

cilium values

cilium:
  ipam:
    operator: 
      clusterPoolIPv4PodCIDRList:
      - "10.69.0.0/16"

Expected result

Fix the integration with the Cilium CNI.

Helm values to reproduce

pod-gateway:  
  settings:
    VPN_INTERFACE: wg0
    VPN_TRAFFIC_PORT: 51820
    VPN_BLOCK_OTHER_TRAFFIC: true
    NOT_ROUTED_TO_GATEWAY_CIDRS: "192.168.0.0/16 10.0.0.0/8"
    VPN_LOCAL_CIDRS: "192.168.0.0/16 10.0.0.0/8"
    VXLAN_IP_NETWORK: "172.16.0"
  routed_namespaces:
  - vpn
  addons:
    vpn:
      enabled: true
      type: wireguard
      configFileSecret: wg-profile
      networkPolicy:
        enabled: false

Additional Information

No response

Repo link

No response

@legalgig legalgig added the bug label Jun 25, 2022
@angelnu
Contributor

angelnu commented Jun 26, 2022

The ping to 10.69.1.62, which should be the gateway pod, seems to be working. So unless there is a rule in Cilium blocking the vxlan traffic port (4789), that should work.

I would suggest debugging two things:

  • You have errors in the DHCP client config - there was a bug fixed not long ago.
  • Try with hardcoded IPs (see doc), as this skips the DHCP part and lets you manually test whether a ping to the vxlan IP works.
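
A minimal sketch of the manual static-IP test, assuming the gateway sits on .1 of VXLAN_IP_NETWORK as the logs show (VXLAN_GATEWAY_IP=172.16.0.1). The function name and the host ID argument are only illustrative, and the function just prints the commands so they can be reviewed first:

```shell
#!/bin/sh
# Sketch: print the commands for a manual static-IP test on vxlan0.
# vxlan_static_test_cmds is an illustrative helper, not part of pod-gateway.
# Assumption: the gateway uses host ID 1 on the VXLAN /24, as seen in the logs.
vxlan_static_test_cmds() {
  network="$1"   # VXLAN_IP_NETWORK, e.g. 172.16.0
  host_id="$2"   # last octet picked for this client, e.g. 20
  echo "ip addr add ${network}.${host_id}/24 dev vxlan0"
  echo "ip route add default via ${network}.1"
  echo "ping -c1 ${network}.1"
}

# Print the commands for the values used in this issue.
vxlan_static_test_cmds 172.16.0 20
```

Running the printed commands inside the client pod (for example by piping the output to sh) needs NET_ADMIN and an existing vxlan0 device. If the ping still fails with a static address, DHCP is not the culprit and the encapsulated traffic itself is probably being dropped.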

@legalgig
Author

So here are the things I've tried/tested:

  1. I've updated the images to the newest versions (pod-gateway v1.6.0 and the webhook to v3.5.0). This fixed almost all of the DHCP issues except one:
/etc/dhclient.conf line 3: semicolon expected.
link-timeout 10;
  2. I've built my own image with link-timeout 10; removed, but that didn't fix the problem.
  3. I didn't find any instructions on how to set IP addresses manually, so I've built my own container image with a sleep inside the if branch so I can debug things:
if [[ -z "$NAT_ENTRY" ]]; then
  echo "Get dynamic IP"
  dhclient -v -cf /etc/dhclient.conf vxlan0
  sleep 9000
else
  IP=$(cut -d' ' -f2 <<< "$NAT_ENTRY")
  VXLAN_IP="${VXLAN_IP_NETWORK}.${IP}"
  echo "Use fixed IP $VXLAN_IP"
  ip addr add "${VXLAN_IP}/24" dev vxlan0
  route add default gw "$VXLAN_GATEWAY_IP"
fi
  4. I've exec'd into the gateway-init container.
  5. I've added the IP address and route manually, but the ping still didn't go through:
bash-5.1# ip addr add "172.16.0.20/24" dev vxlan0
bash-5.1# route add default gw "172.16.0.1"
bash-5.1# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 1a:ad:f9:a7:92:77 brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.20/24 scope global vxlan0
       valid_lft forever preferred_lft forever
    inet6 fe80::18ad:f9ff:fea7:9277/64 scope link
       valid_lft forever preferred_lft forever
3386: eth0@if3387: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:2e:f8:66:dd:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.196/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42e:f8ff:fe66:dda4/64 scope link
       valid_lft forever preferred_lft forever
bash-5.1# ip route
default via 172.16.0.1 dev vxlan0
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
172.16.0.0/24 dev vxlan0 proto kernel scope link src 172.16.0.20
192.168.0.0/16 via 10.69.1.233 dev eth0
bash-5.1# ping 172.16.0.1
PING 172.16.0.1 (172.16.0.1): 56 data bytes
^C
--- 172.16.0.1 ping statistics ---
12 packets transmitted, 0 packets received, 100% packet loss
bash-5.1#
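
One more thing that could be checked from the same shell is whether the default forwarding entry on vxlan0 still points at the current gateway pod IP, since that IP changes whenever the gateway pod is recreated. A rough sketch, assuming the usual `bridge fdb show` line layout (MAC first, then `dst <ip>`); vxlan_default_remote is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: read `bridge fdb show dev vxlan0` output on stdin and print the
# remote IP of the all-zero (default) forwarding entry.
# vxlan_default_remote is an illustrative helper, not part of pod-gateway.
vxlan_default_remote() {
  awk '$1 == "00:00:00:00:00:00" {
    for (i = 2; i < NF; i++)
      if ($i == "dst") { print $(i + 1); exit }
  }'
}

# Usage inside the client pod (needs iproute2):
#   bridge fdb show dev vxlan0 | vxlan_default_remote
```

The printed address should match the GATEWAY_IP resolved by the init script (10.69.1.26 in the logs below). If it does, the client side of the tunnel is at least aimed at the right pod.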

Here are some additional logs from the test pod:

+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3386: eth0@if3387: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:2e:f8:66:dd:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.196/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42e:f8ff:fe66:dda4/64 scope link tentative
       valid_lft forever preferred_lft forever
+ ip route
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
192.168.0.0/16 via 10.69.1.233 dev eth0
+ ping -c1 10.69.1.26
PING 10.69.1.26 (10.69.1.26): 56 data bytes
64 bytes from 10.69.1.26: seq=0 ttl=63 time=0.097 ms

--- 10.69.1.26 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.097/0.097/0.097 ms
+ ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
+ bridge fdb append to 00:00:00:00:00:00 dst 10.69.1.26 dev vxlan0
+ ip link set up dev vxlan0
+ cat
+ [[ -z '' ]]
+ echo 'Get dynamic IP'
+ dhclient -v -cf /etc/dhclient.conf vxlan0
Get dynamic IP
Internet Systems Consortium DHCP Client 4.4.3
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/vxlan0/1a:ad:f9:a7:92:77
Sending on   LPF/vxlan0/1a:ad:f9:a7:92:77
Sending on   Socket/fallback
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
+ sleep 9000

@angelnu
Contributor

angelnu commented Jun 26, 2022

Could you please also post the logs from the gateway pod init container? This is the one setting up the ip, routes and iptables of the gateway.

What is your policy enforcement mode? https://docs.cilium.io/en/v1.9/policy/intro/ I suspect the problem is the UDP traffic not getting through...
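
Before looking at policies, it may help to confirm which UDP port the vxlan0 device actually uses. The scripts create it with `dstport 0`, and as far as I know the Linux kernel then falls back to its own default (8472) rather than 4789. Cilium's VXLAN overlay also uses 8472 by default, to my knowledge, so a clash is worth ruling out. Please treat both of those as assumptions to verify. A small sketch that pulls the port out of the `ip -d link show` output (vxlan_dstport is a made-up helper name):

```shell
#!/bin/sh
# Sketch: read `ip -d link show vxlan0` output on stdin and print the UDP
# destination port reported for the VXLAN device.
# vxlan_dstport is an illustrative helper, not part of pod-gateway.
vxlan_dstport() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "dstport") { print $(i + 1); exit }
  }'
}

# Usage inside a pod that has vxlan0 (needs iproute2):
#   ip -d link show vxlan0 | vxlan_dstport
```

With the port known, a tcpdump on eth0 in both the client and the gateway pod, filtered on that UDP port, should show whether the encapsulated DHCPDISCOVER packets leave the client and reach the gateway.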

@legalgig
Author

I use the default policy enforcement mode.

Here are all the logs from the gateway-init container of my test pod:

╰─$ k logs -n vpn test-6bc5585d5b-92psv gateway-init
+ cat /default_config/settings.sh
#!/bin/bash

# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="$gateway"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="$K8S_DNS_ips"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
DNS_LOCAL_CIDRS="local"

# dnsmasq monitors directories. /etc/resolv.conf in a container is in another
# file system so it does not work. To circumvent this a copy is made using
# inotifyd
RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ . /default_config/settings.sh
++ GATEWAY_NAME=pod-gateway.vpn-gateway.svc.cluster.local
++ K8S_DNS_IPS=10.43.0.10
++ NOT_ROUTED_TO_GATEWAY_CIDRS=
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VPN_INTERFACE=tun0
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_TRAFFIC_PORT=443
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ DNS_LOCAL_CIDRS=local
++ RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ cat /config/settings.sh
#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="wg0"
VPN_LOCAL_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_TRAFFIC_PORT="51820"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /config/settings.sh
++ DNS_LOCAL_CIDRS=local
++ NOT_ROUTED_TO_GATEWAY_CIDRS='192.168.0.0/16 10.0.0.0/8'
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_INTERFACE=wg0
++ VPN_LOCAL_CIDRS='192.168.0.0/16 10.0.0.0/8'
++ VPN_TRAFFIC_PORT=51820
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
+ ip addr
+ grep -q vxlan0
++ /sbin/ip route
++ awk '/default/ { print $3 }'
+ K8S_GW_IP=10.69.1.233
+ for local_cidr in $NOT_ROUTED_TO_GATEWAY_CIDRS
+ ip route add 192.168.0.0/16 via 10.69.1.233
+ for local_cidr in $NOT_ROUTED_TO_GATEWAY_CIDRS
+ ip route add 10.0.0.0/8 via 10.69.1.233
+ echo 'Deleting existing default GWs'
+ ip route del 0/0
Deleting existing default GWs
+ ping -c 1 -W 1000 8.8.8.8
ping: sendto: Network unreachable
+ ip addr
+ ip route
++ cut -d ' ' -f 1
+ K8S_DNS_IP=10.43.0.10
++ dig +short pod-gateway.vpn-gateway.svc.cluster.local @10.43.0.10
PING 8.8.8.8 (8.8.8.8): 56 data bytes
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3386: eth0@if3387: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:2e:f8:66:dd:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.196/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42e:f8ff:fe66:dda4/64 scope link tentative
       valid_lft forever preferred_lft forever
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
192.168.0.0/16 via 10.69.1.233 dev eth0
+ GATEWAY_IP=10.69.1.26
+++ hostname
++ grep test-6bc5585d5b-92psv /config/nat.conf
++ true
+ NAT_ENTRY=
+ VXLAN_GATEWAY_IP=172.16.0.1
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3386: eth0@if3387: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 06:2e:f8:66:dd:a4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.69.1.196/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::42e:f8ff:fe66:dda4/64 scope link tentative
       valid_lft forever preferred_lft forever
+ ip route
10.0.0.0/8 via 10.69.1.233 dev eth0
10.69.1.233 dev eth0 scope link
192.168.0.0/16 via 10.69.1.233 dev eth0
+ ping -c1 10.69.1.26
PING 10.69.1.26 (10.69.1.26): 56 data bytes
64 bytes from 10.69.1.26: seq=0 ttl=63 time=0.097 ms

--- 10.69.1.26 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.097/0.097/0.097 ms
+ ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
+ bridge fdb append to 00:00:00:00:00:00 dst 10.69.1.26 dev vxlan0
+ ip link set up dev vxlan0
+ cat
+ [[ -z '' ]]
+ echo 'Get dynamic IP'
+ dhclient -v -cf /etc/dhclient.conf vxlan0
Get dynamic IP
Internet Systems Consortium DHCP Client 4.4.3
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/vxlan0/1a:ad:f9:a7:92:77
Sending on   LPF/vxlan0/1a:ad:f9:a7:92:77
Sending on   Socket/fallback
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
+ sleep 9000

pod-gateway routes container

+ cat /default_config/settings.sh
#!/bin/bash

# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="$gateway"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="$K8S_DNS_ips"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
DNS_LOCAL_CIDRS="local"

# dnsmasq monitors directories. /etc/resolv.conf in a container is in another
# file system so it does not work. To circumvent this a copy is made using
# inotifyd
RESOLV_CONF_COPY=/etc/resolv_copy.conf
#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="wg0"
VPN_LOCAL_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_TRAFFIC_PORT="51820"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /default_config/settings.sh
++ GATEWAY_NAME=
++ K8S_DNS_IPS=
++ NOT_ROUTED_TO_GATEWAY_CIDRS=
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VPN_INTERFACE=tun0
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_TRAFFIC_PORT=443
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ DNS_LOCAL_CIDRS=local
++ RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ cat /config/settings.sh
+ . /config/settings.sh
++ DNS_LOCAL_CIDRS=local
++ NOT_ROUTED_TO_GATEWAY_CIDRS='192.168.0.0/16 10.0.0.0/8'
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_INTERFACE=wg0
++ VPN_LOCAL_CIDRS='192.168.0.0/16 10.0.0.0/8'
++ VPN_TRAFFIC_PORT=51820
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ cat /proc/sys/net/ipv4/ip_forward
+ [[ 1 -ne 1 ]]
+ VXLAN_GATEWAY_IP=172.16.0.1
+ ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
+ ip addr add 172.16.0.1/24 dev vxlan0
+ ip link set up dev vxlan0
+ iptables -t nat -A POSTROUTING -j MASQUERADE
+ [[ -n wg0 ]]
+ read -r line
+ [[ # Generated by pod-gateway =~ ^#.* ]]
+ continue
+ read -r line
+ echo 'Setting iptables for VPN with NIC wg0'
+ echo 'Accept traffic alredy ESTABLISHED'
+ iptables -A FORWARD -i wg0 -m state --state ESTABLISHED,RELATED -j ACCEPT
Setting iptables for VPN with NIC wg0
Accept traffic alredy ESTABLISHED
+ iptables -A FORWARD -i wg0 -j REJECT
+ [[ true == true ]]
+ iptables --policy FORWARD DROP
+ iptables -I FORWARD -o wg0 -j ACCEPT
+ iptables --policy OUTPUT DROP
+ iptables -A OUTPUT -p udp --dport 51820 -j ACCEPT
+ iptables -A OUTPUT -p tcp --dport 51820 -j ACCEPT
+ for local_cidr in $VPN_LOCAL_CIDRS
+ iptables -A OUTPUT -d 192.168.0.0/16 -j ACCEPT
+ for local_cidr in $VPN_LOCAL_CIDRS
+ iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT
+ iptables -A OUTPUT -o wg0 -j ACCEPT
+ iptables -A OUTPUT -o vxlan0 -j ACCEPT
++ awk '/default/ { print $3 }'
++ /sbin/ip route
+ K8S_GW_IP=10.69.1.233
+ for local_cidr in $VPN_LOCAL_CIDRS
+ ip route add 192.168.0.0/16 via 10.69.1.233
+ for local_cidr in $VPN_LOCAL_CIDRS
+ ip route add 10.0.0.0/8 via 10.69.1.233

And also the logs from the pod-gateway container of the pod-gateway pod:

+ cat /default_config/settings.sh
#!/bin/bash

# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="$gateway"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="$K8S_DNS_ips"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
DNS_LOCAL_CIDRS="local"

# dnsmasq monitors directories. /etc/resolv.conf in a container is in another
# file system so it does not work. To circumvent this a copy is made using
# inotifyd
RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ . /default_config/settings.sh
++ GATEWAY_NAME=
++ K8S_DNS_IPS=
++ NOT_ROUTED_TO_GATEWAY_CIDRS=
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VPN_INTERFACE=tun0
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_TRAFFIC_PORT=443
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ DNS_LOCAL_CIDRS=local
++ RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ cat /config/settings.sh
#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="wg0"
VPN_LOCAL_CIDRS="192.168.0.0/16 10.0.0.0/8"
VPN_TRAFFIC_PORT="51820"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /config/settings.sh
++ DNS_LOCAL_CIDRS=local
++ NOT_ROUTED_TO_GATEWAY_CIDRS='192.168.0.0/16 10.0.0.0/8'
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_INTERFACE=wg0
++ VPN_LOCAL_CIDRS='192.168.0.0/16 10.0.0.0/8'
++ VPN_TRAFFIC_PORT=51820
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ cut '-d ' -f2
++ grep nameserver /etc/resolv.conf
+ K8S_DNS=10.43.0.10
+ cat
+ for local_cidr in $DNS_LOCAL_CIDRS
+ cat
+ /bin/copy_resolv.sh
copying /etc/resolv.conf to /etc/resolv_copy.conf
+ + dnsmasq=18
dnsmasq -k
+ inotifyd /bin/copy_resolv.sh /etc/resolv.conf:ce
dnsmasq[18]: started, version 2.86 cachesize 150
dnsmasq[18]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile
dnsmasq[18]: DNSSEC validation enabled
dnsmasq[18]: configured with trust anchor for <root> keytag 20326
dnsmasq-dhcp[18]: DHCP, IP range 172.16.0.20 -- 172.16.0.255, lease time 12h
dnsmasq-dhcp[18]: DHCP, sockets bound exclusively to interface vxlan0
dnsmasq[18]: using nameserver 10.43.0.10#53 for domain local (no DNSSEC)
dnsmasq[18]: reading /etc/resolv_copy.conf
dnsmasq[18]: using nameserver 10.43.0.10#53 for domain local (no DNSSEC)
dnsmasq[18]: using nameserver 10.43.0.10#53
dnsmasq[18]: read /etc/hosts - 7 addresses
copying /etc/resolv.conf to /etc/resolv_copy.conf
dnsmasq[18]: reading /etc/resolv_copy.conf
dnsmasq[18]: using nameserver 10.43.0.10#53 for domain local (no DNSSEC)
dnsmasq[18]: using nameserver 193.138.218.74#53
dnsmasq[18]: read /etc/hosts - 7 addresses

@legalgig
Author

I've managed to fix the connection issue with Cilium. It turns out that it encapsulates all traffic between nodes using VXLAN by default. To use pod-gateway you need to disable encapsulation (switch to native routing). Instructions for doing that are here: https://docs.cilium.io/en/v1.11/concepts/networking/routing/#native-routing

Thanks for your support!

Btw, please update the Helm chart with the newest pod-gateway and webhook images.

@bjw-s
Contributor

bjw-s commented Jun 28, 2022

Glad you managed to fix the issue, and thanks for sharing the solution!

> btw. please update helm chart with newest pod-gateway and webhook image

You can always override the image tags in your values.yaml. That way you do not have to depend on us to update the values. We are a small group of maintainers and have finite time to spend on updating images. PRs are very welcome of course :)

@angelnu
Contributor

angelnu commented Jun 28, 2022

@legalgig - glad you got it working. Are there any disadvantages to using native routing? And why is VXLAN over VXLAN not working?

Ideally we will add the instructions with the trade-offs to https://docs.k8s-at-home.com/guides/pod-gateway/

@legalgig
Author

legalgig commented Jun 28, 2022

> Glad you managed to fix the issue and for sharing the solution!
>
> > btw. please update helm chart with newest pod-gateway and webhook image
>
> You can always override the image tags in your values.yaml. That way you do not have to depend on us to update the values. We are a small group of maintainers and have finite time to spend on updating images. PR's are very welcome of course :)

Sure, I'll try to create a PR this week ;)

> @legalgig - glad you got it working - is there any disadvantages on using native routing? Or why is not vxlan on vxlan working?
>
> Ideally we will add the instructions with the trade-offs to https://docs.k8s-at-home.com/guides/pod-gateway/

As far as I can tell there are no real disadvantages to using native routing; it just needs more configuration than VXLAN encapsulation. If you want to use native routing, you need to fulfill one of these conditions:

  • all workers are on the same L2 network, so Cilium can auto-generate routes between pods on different nodes
  • you have a router that knows the routes to all pods (e.g. a router provided by a cloud provider)
  • each host is made aware of how to route traffic between pods, e.g. via BGP

In most cases homelabbers have all k8s hosts in one L2 network, so it suffices to set the following Helm values:

tunnel: "disabled"  # disable  vxlan encapsulating
ipv4NativeRoutingCIDR: 10.0.0.0/16  # set to pod cidr
autoDirectNodeRoutes: true # enable auto generated routes 
ipam:
  operator: 
    clusterPoolIPv4PodCIDRList:
    - "10.0.0.0/16"  # set pod cidr
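
The values above set `ipv4NativeRoutingCIDR` to the same range as the pod CIDR. A minimal sketch of a sanity check for that relationship, assuming the two settings should line up as the comment suggests (the helper name `native_routing_covers_pods` is made up for this example and is not part of Cilium or the chart):

```python
import ipaddress


def native_routing_covers_pods(native_routing_cidr, pod_cidrs):
    """Return True if every pod CIDR lies inside the native-routing CIDR."""
    native = ipaddress.ip_network(native_routing_cidr)
    return all(
        ipaddress.ip_network(cidr).subnet_of(native) for cidr in pod_cidrs
    )


# Values taken from the Helm snippet above.
print(native_routing_covers_pods("10.0.0.0/16", ["10.0.0.0/16"]))
```

If your pod CIDR differs from 10.0.0.0/16, pass your own values; the check only compares the ranges and does not talk to the cluster.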

> why is not vxlan on vxlan working?

Good question. When you use VXLAN encapsulation, Cilium creates routes for the pod CIDR (in this case 10.0.0.0/16), and everything outside that CIDR (e.g. traffic from pods to pod-gateway on the 172.16.0.0/24 network) is routed to the default gateway (your home router), where it is lost because your router doesn't know how to route packets to your k8s pods. I'm just speculating, so don't quote me on that :)
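
The route lookup in that speculation can be pictured with a small longest-prefix-match sketch. The routing table here is hypothetical and only mirrors the description above (one route for the pod CIDR, everything else to the default gateway); it illustrates the guess and does not confirm that this is what actually breaks VXLAN over VXLAN:

```python
import ipaddress

# Hypothetical routing table mirroring the description above: the pod CIDR
# is handled by the CNI, anything else falls through to the default route.
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/16"), "pod network (CNI)"),
    (ipaddress.ip_network("0.0.0.0/0"), "default gateway (home router)"),
]


def next_hop(destination):
    """Return the label of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, label) for net, label in ROUTES if addr in net]
    # Longest-prefix match: the route with the largest prefix length wins.
    _, label = max(matches, key=lambda item: item[0].prefixlen)
    return label


print(next_hop("10.0.1.23"))   # an address inside the pod CIDR
print(next_hop("172.16.0.1"))  # the pod-gateway VXLAN address
```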

@arana198

arana198 commented Jun 30, 2022

I'm having the same issue "No DHCPOFFERS received" on flannel

Stuck on this for a few weeks

My pod-gateway config:

image:
  repository: ghcr.io/k8s-at-home/pod-gateway
  tag: latest

webhook:
  image:
    repository: ghcr.io/k8s-at-home/gateway-admision-controller
    pullPolicy: IfNotPresent
    tag: latest

routed_namespaces:
  - mediaserver


settings:
  VPN_INTERFACE: tun0
  VPN_BLOCK_OTHER_TRAFFIC: true
  VPN_TRAFFIC_PORT: 443
  NOT_ROUTED_TO_GATEWAY_CIDRS: "10.244.0.0/16 10.96.0.0/12"

addons:
  vpn:
    enabled: true
    type: openvpn
    openvpn:
      authSecret: pod-gateway-vpn-auth

    env:
      FIREWALL: "on"
      DNS: "8.8.8.8"

    securityContext:
      capabilities:
        add:
          - NET_ADMIN
          - SYS_MODULE
      runAsGroup: 0
      runAsUser: 0

    configFile: |- <VPN-CONFIG>

    livenessProbe:
      exec:
        command:
          - sh
          - -c
          - if [ $(curl -s https://ipinfo.io/country) == "FR" ]; then exit 0; else exit $?; fi

      initialDelaySeconds: 10
      periodSeconds: 60
      failureThreshold: 1

    networkPolicy:
      enabled: true
      egress:
        # Allow only VPN traffic to Internet
        - to:
            - ipBlock:
                cidr: 0.0.0.0/0
          ports:
            # VPN traffic port - change if your provider uses a different port
            - port: 1194
              protocol: UDP
            - port: 443
              protocol: UDP
        - to:
            # Allow traffic within K8S - change if your K8S cluster uses a different CIDR
            - ipBlock:
                cidr: 10.0.0.0/8

OpenVPN connects to the server and I am able to curl blocked websites

pod-gateway logs

+ cat /default_config/settings.sh
#!/bin/bash

# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="$gateway"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="$K8S_DNS_ips"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
+ . /default_config/settings.sh
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
DNS_LOCAL_CIDRS="local"

# dnsmasq monitors directories. /etc/resolv.conf in a container is in another
# file system so it does not work. To circumvent this a copy is made using
# inotifyd
RESOLV_CONF_COPY=/etc/resolv_copy.conf
#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="10.244.0.0/16 10.96.0.0/12"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="tun0"
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"
VPN_TRAFFIC_PORT="443"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
++ GATEWAY_NAME=
++ K8S_DNS_IPS=
++ NOT_ROUTED_TO_GATEWAY_CIDRS=
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VPN_INTERFACE=tun0
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_TRAFFIC_PORT=443
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ DNS_LOCAL_CIDRS=local
++ RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ cat /config/settings.sh
+ . /config/settings.sh
++ DNS_LOCAL_CIDRS=local
++ NOT_ROUTED_TO_GATEWAY_CIDRS='10.244.0.0/16 10.96.0.0/12'
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_INTERFACE=tun0
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ VPN_TRAFFIC_PORT=443
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ cut '-d ' -f2
++ grep nameserver /etc/resolv.conf
+ K8S_DNS=10.96.0.10
+ cat
+ for local_cidr in $DNS_LOCAL_CIDRS
+ cat
+ /bin/copy_resolv.sh
copying /etc/resolv.conf to /etc/resolv_copy.conf
+ dnsmasq=16
+ dnsmasq -k
+ inotifyd /bin/copy_resolv.sh /etc/resolv.conf:ce
dnsmasq[16]: started, version 2.86 cachesize 150
dnsmasq[16]: compile time options: IPv6 GNU-getopt no-DBus no-UBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth cryptohash DNSSEC loop-detect inotify dumpfile
dnsmasq[16]: DNSSEC validation enabled
dnsmasq[16]: configured with trust anchor for <root> keytag 20326
dnsmasq-dhcp[16]: DHCP, IP range 172.16.0.20 -- 172.16.0.255, lease time 12h
dnsmasq-dhcp[16]: DHCP, sockets bound exclusively to interface vxlan0
dnsmasq[16]: using nameserver 10.96.0.10#53 for domain local (no DNSSEC)
dnsmasq[16]: reading /etc/resolv_copy.conf
dnsmasq[16]: using nameserver 10.96.0.10#53 for domain local (no DNSSEC)
dnsmasq[16]: using nameserver 10.96.0.10#53
dnsmasq[16]: read /etc/hosts - 7 addresses
copying /etc/resolv.conf to /etc/resolv_copy.conf
dnsmasq[16]: reading /etc/resolv_copy.conf
dnsmasq[16]: using nameserver 10.96.0.10#53 for domain local (no DNSSEC)
dnsmasq[16]: using nameserver 103.86.96.100#53
dnsmasq[16]: using nameserver 103.86.99.100#53
dnsmasq[16]: read /etc/hosts - 7 addresses
copying /etc/resolv.conf to /etc/resolv_copy.conf
dnsmasq[16]: reading /etc/resolv_copy.conf
dnsmasq[16]: using nameserver 10.96.0.10#53 for domain local (no DNSSEC)
dnsmasq[16]: using nameserver 103.86.96.100#53
dnsmasq[16]: using nameserver 103.86.99.100#53
dnsmasq[16]: read /etc/hosts - 7 addresses
copying /etc/resolv.conf to /etc/resolv_copy.conf
dnsmasq[16]: reading /etc/resolv_copy.conf
dnsmasq[16]: using nameserver 10.96.0.10#53 for domain local (no DNSSEC)
dnsmasq[16]: using nameserver 103.86.96.100#53
dnsmasq[16]: using nameserver 103.86.99.100#53
dnsmasq[16]: read /etc/hosts - 7 addresses

pod-gateway routes logs

+ cat /default_config/settings.sh
#!/bin/bash

# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="$gateway"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="$K8S_DNS_ips"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
DNS_LOCAL_CIDRS="local"

# dnsmasq monitors directories. /etc/resolv.conf in a container is in another
# file system so it does not work. To circumvent this a copy is made using
# inotifyd
RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ . /default_config/settings.sh
++ GATEWAY_NAME=
++ K8S_DNS_IPS=
++ NOT_ROUTED_TO_GATEWAY_CIDRS=
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VPN_INTERFACE=tun0
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_TRAFFIC_PORT=443
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ DNS_LOCAL_CIDRS=local
++ RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ cat /config/settings.sh
#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="10.244.0.0/16 10.96.0.0/12"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="tun0"
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"
VPN_TRAFFIC_PORT="443"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /config/settings.sh
++ DNS_LOCAL_CIDRS=local
++ NOT_ROUTED_TO_GATEWAY_CIDRS='10.244.0.0/16 10.96.0.0/12'
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_INTERFACE=tun0
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ VPN_TRAFFIC_PORT=443
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ cat /proc/sys/net/ipv4/ip_forward
+ [[ 1 -ne 1 ]]
+ VXLAN_GATEWAY_IP=172.16.0.1
+ ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
+ ip addr add 172.16.0.1/24 dev vxlan0
+ ip link set up dev vxlan0
+ iptables -t nat -A POSTROUTING -j MASQUERADE
+ [[ -n tun0 ]]
+ read -r line
+ [[ # Generated by pod-gateway =~ ^#.* ]]
+ continue
+ read -r line
+ echo 'Setting iptables for VPN with NIC tun0'
+ echo 'Accept traffic alredy ESTABLISHED'
Setting iptables for VPN with NIC tun0
Accept traffic alredy ESTABLISHED
+ iptables -A FORWARD -i tun0 -m state --state ESTABLISHED,RELATED -j ACCEPT
+ iptables -A FORWARD -i tun0 -j REJECT
+ [[ true == true ]]
+ iptables --policy FORWARD DROP
+ iptables -I FORWARD -o tun0 -j ACCEPT
+ iptables --policy OUTPUT DROP
+ iptables -A OUTPUT -p udp --dport 443 -j ACCEPT
+ iptables -A OUTPUT -p tcp --dport 443 -j ACCEPT
+ for local_cidr in $VPN_LOCAL_CIDRS
+ iptables -A OUTPUT -d 10.0.0.0/8 -j ACCEPT
+ for local_cidr in $VPN_LOCAL_CIDRS
+ iptables -A OUTPUT -d 192.168.0.0/16 -j ACCEPT
+ iptables -A OUTPUT -o tun0 -j ACCEPT
+ iptables -A OUTPUT -o vxlan0 -j ACCEPT
++ /sbin/ip route
++ awk '/default/ { print $3 }'
+ K8S_GW_IP=10.244.3.1
+ for local_cidr in $VPN_LOCAL_CIDRS
+ ip route add 10.0.0.0/8 via 10.244.3.1
+ for local_cidr in $VPN_LOCAL_CIDRS
+ ip route add 192.168.0.0/16 via 10.244.3.1

transmission service (routed pod) logs

+ cat /default_config/settings.sh
#!/bin/bash

# hostname of the gateway - it must accept vxlan and DHCP traffic
# clients get it as env variable
GATEWAY_NAME="$gateway"
# K8S DNS IP address
# clients get it as env variable
K8S_DNS_IPS="$K8S_DNS_ips"
# Blank  sepated IPs not sent to the POD gateway but to the default K8S
# This is needed, for example, in case your CNI does
# not add a non-default rule for the K8S addresses (Flannel does)
NOT_ROUTED_TO_GATEWAY_CIDRS=""

# Vxlan ID to use
VXLAN_ID="42"
# VXLAN need an /24 IP range not conflicting with K8S and local IP ranges
VXLAN_IP_NETWORK="172.16.0"
# Keep a range of IPs for static assignment in nat.conf
VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20

# If using a VPN, interface name created by it
VPN_INTERFACE=tun0
# Prevent non VPN traffic to leave the gateway
VPN_BLOCK_OTHER_TRAFFIC=true
# If VPN_BLOCK_OTHER_TRAFFIC is true, allow VPN traffic over this port
VPN_TRAFFIC_PORT=443
# Traffic to these IPs will be send through the K8S gateway
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"

# DNS queries to these domains will be resolved by K8S DNS instead of
# the default (typcally the VPN client changes it)
DNS_LOCAL_CIDRS="local"

# dnsmasq monitors directories. /etc/resolv.conf in a container is in another
# file system so it does not work. To circumvent this a copy is made using
# inotifyd
RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ . /default_config/settings.sh
++ GATEWAY_NAME=pod-gateway.vpn.svc.cluster.local
++ K8S_DNS_IPS=10.96.0.10
++ NOT_ROUTED_TO_GATEWAY_CIDRS=
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VPN_INTERFACE=tun0
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_TRAFFIC_PORT=443
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ DNS_LOCAL_CIDRS=local
++ RESOLV_CONF_COPY=/etc/resolv_copy.conf
+ cat /config/settings.sh
#!/bin/sh
# Generated by pod-gateway
DNS_LOCAL_CIDRS="local"
NOT_ROUTED_TO_GATEWAY_CIDRS="10.244.0.0/16 10.96.0.0/12"
VPN_BLOCK_OTHER_TRAFFIC="true"
VPN_INTERFACE="tun0"
VPN_LOCAL_CIDRS="10.0.0.0/8 192.168.0.0/16"
VPN_TRAFFIC_PORT="443"
VXLAN_GATEWAY_FIRST_DYNAMIC_IP="20"
VXLAN_ID="42"
VXLAN_IP_NETWORK="172.16.0"
+ . /config/settings.sh
++ DNS_LOCAL_CIDRS=local
++ NOT_ROUTED_TO_GATEWAY_CIDRS='10.244.0.0/16 10.96.0.0/12'
++ VPN_BLOCK_OTHER_TRAFFIC=true
++ VPN_INTERFACE=tun0
++ VPN_LOCAL_CIDRS='10.0.0.0/8 192.168.0.0/16'
++ VPN_TRAFFIC_PORT=443
++ VXLAN_GATEWAY_FIRST_DYNAMIC_IP=20
++ VXLAN_ID=42
++ VXLAN_IP_NETWORK=172.16.0
+ ip addr
+ grep -q vxlan0
+ ip link del vxlan0
+ echo 'Deleting existing default GWs'
Deleting existing default GWs
+ ip route del 0/0
RTNETLINK answers: No such process
+ /bin/true
+ ping -c 1 -W 1000 8.8.8.8
ping: sendto: Network unreachable
PING 8.8.8.8 (8.8.8.8): 56 data bytes
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if315: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:ae:80:29:1b:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.111/24 brd 10.244.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ae:80ff:fe29:1b0c/64 scope link
       valid_lft forever preferred_lft forever
+ ip route
10.96.0.0/12 via 10.244.1.1 dev eth0
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.111
++ cut -d ' ' -f 1
+ K8S_DNS_IP=10.96.0.10
++ dig +short pod-gateway.vpn.svc.cluster.local @10.96.0.10
+ GATEWAY_IP=10.244.3.182
+++ hostname
++ grep transmission-669f574b66-4622w /config/nat.conf
++ true
+ NAT_ENTRY=
+ VXLAN_GATEWAY_IP=172.16.0.1
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if315: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:ae:80:29:1b:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.111/24 brd 10.244.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ae:80ff:fe29:1b0c/64 scope link
       valid_lft forever preferred_lft forever
+ ip route
10.96.0.0/12 via 10.244.1.1 dev eth0
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.111
+ ping -c1 10.244.3.182
PING 10.244.3.182 (10.244.3.182): 56 data bytes
64 bytes from 10.244.3.182: seq=0 ttl=62 time=1.165 ms

--- 10.244.3.182 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.165/1.165/1.165 ms
+ ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
+ bridge fdb append to 00:00:00:00:00:00 dst 10.244.3.182 dev vxlan0
+ ip link set up dev vxlan0
+ cat
+ [[ -z '' ]]
+ echo 'Get dynamic IP'
+ dhclient -v -cf /etc/dhclient.conf vxlan0
Get dynamic IP
Internet Systems Consortium DHCP Client 4.4.3
Copyright 2004-2022 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

/etc/dhclient.conf line 3: semicolon expected.
link-timeout 10;
              ^
/etc/dhclient.conf line 4: semicolon expected.
reboot
 ^
Listening on LPF/vxlan0/8a:bc:cf:f5:ce:bd
Sending on   LPF/vxlan0/8a:bc:cf:f5:ce:bd
Sending on   Socket/fallback
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 2
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
DHCPDISCOVER on vxlan0 to 255.255.255.255 port 67 interval 1
No DHCPOFFERS received.
No working leases in persistent database - sleeping.
+ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if315: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether 02:ae:80:29:1b:0c brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.244.1.111/24 brd 10.244.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::ae:80ff:fe29:1b0c/64 scope link
       valid_lft forever preferred_lft forever
5: vxlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 8a:bc:cf:f5:ce:bd brd ff:ff:ff:ff:ff:ff
    inet6 fe80::88bc:cfff:fef5:cebd/64 scope link
       valid_lft forever preferred_lft forever
+ ip route
10.96.0.0/12 via 10.244.1.1 dev eth0
10.244.0.0/16 via 10.244.1.1 dev eth0
10.244.1.0/24 dev eth0 proto kernel scope link src 10.244.1.111
+ ping -c1 172.16.0.1
PING 172.16.0.1 (172.16.0.1): 56 data bytes
ping: sendto: Network unreachable

Any guidance or help appreciated :)

@angelnu
Contributor

angelnu commented Jul 2, 2022

What are your network policies?

You should have:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: mediaserver
spec:
  podSelector: {}
  ingress:
  - from:
    # Only allow ingress from K8S
    - ipBlock:
        cidr: 10.0.0.0/8
  egress:
  - to:
    # Only allow egress to K8S
    - ipBlock:
        cidr: 10.0.0.0/8
  policyTypes:
    - Ingress
    - Egress

Was this config working for you before? If yes, what has changed?

Beyond the network policy, I do not see any other reason for the problem, which is that the VXLAN packets are not being transmitted. Flannel by default does not use vxlan, so I do not think this is the same problem as reported by @legalgig; opening a new issue would be better.

@angelnu
Contributor

angelnu commented Jul 4, 2022

Just another thing to consider - there are some VPNs that route ALL traffic through them so the DHCP server gets a query but the reply does not arrive.

When this is the case the DHCP server should still log that the query was received:

test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 DHCPOFFER(vxlan0) 172.16.2.42 52:14:24:76:17:9d
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 requested options: 1:netmask, 28:broadcast, 3:router
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 next server: 172.16.2.1
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  1 option: 53 message-type  2
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option: 54 server-identifier  172.16.2.1
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option: 51 lease-time  12h
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option: 58 T1  6h
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option: 59 T2  10h30m
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option:  1 netmask  255.255.255.0
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option: 28 broadcast  172.16.2.255
test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway dnsmasq-dhcp[15]: 486152276 sent size:  4 option:  3 router  172.16.2.1

@bjw-s found this to be the case with gluetun.
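
To check a captured gateway log for offers like the ones above, something along these lines may help. It is only a sketch: the regular expression is inferred from the sample lines in this comment, and the function name `find_dhcp_offers` is made up for the example:

```python
import re

# Pattern inferred from the sample log lines above. The numeric id before
# DHCPOFFER is treated as optional so that lines without it also match.
OFFER_RE = re.compile(
    r"dnsmasq-dhcp\[\d+\]: (?:\d+ )?DHCPOFFER\((?P<iface>[^)]+)\) "
    r"(?P<ip>\S+) (?P<mac>\S+)"
)


def find_dhcp_offers(log_text):
    """Return one dict (iface, ip, mac) per DHCPOFFER line in log_text."""
    return [match.groupdict() for match in OFFER_RE.finditer(log_text)]


SAMPLE = (
    "test-gateway-pod-gateway-564c459865-g2lt9 test-gateway-pod-gateway "
    "dnsmasq-dhcp[15]: 486152276 DHCPOFFER(vxlan0) 172.16.2.42 52:14:24:76:17:9d"
)
print(find_dhcp_offers(SAMPLE))
```

An empty result for a gateway log, while the client keeps sending DHCPDISCOVER, would suggest the queries never reach dnsmasq; a non-empty result points at the reply path instead.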

@5cat

5cat commented Aug 5, 2022

> What are your network policies?
>
> You should have:
>
> kind: NetworkPolicy
> apiVersion: networking.k8s.io/v1
> metadata:
>   name: mediaserver
> spec:
>   podSelector: {}
>   ingress:
>   - from:
>     # Only allow ingress from K8S
>     - ipBlock:
>         cidr: 10.0.0.0/8
>   egress:
>   - to:
>     # Only allow egress to K8S
>     - ipBlock:
>         cidr: 10.0.0.0/8
>   policyTypes:
>     - Ingress
>     - Egress
>
> Was this config working for you before? If yes, what has changed?
>
> Beyond the network policy I do not see any other reason for the problem which is that the vxlan packages are not being transmitted. Flannel by default does not use vxlanx so I do not think this is the same problem as reported by @legalgig so opening a new issue would be better.

I can confirm this solved my problem.

Although the DHCP errors are still there:

/etc/dhclient.conf line 3: semicolon expected.
link-timeout 10;
              ^
/etc/dhclient.conf line 4: semicolon expected.
reboot
 ^

I have these images:
ghcr.io/k8s-at-home/pod-gateway@sha256:dcb2d814a4f7dc175f096e5a14035b5afbd2ae5b9e07eb623847a121bd46bca4, which matches the latest 1.6.1 (https://github.com/k8s-at-home/pod-gateway/pkgs/container/pod-gateway)
ghcr.io/k8s-at-home/gateway-admision-controller@sha256:175512bb3f616af830b56148307c9f620f6e38b8afda01043bc, also the latest.

but everything else is working as expected.
