RKE2 DNS Problem #7719

ilhan-sadikoglu · 2025-02-07T19:18:58Z

Problem:
Hello, I am preparing to convert my clusters to rke2. However, when I test it after normal installation, I see that the pods cannot perform DNS resolution.

The strange thing is that I have 2 coredns pods, one on the master and one on the worker, and when I kill the one on the master, the DNS resolution of all pods is temporarily fixed. As soon as the coredns pod on the master gets itself up, dns resolution goes again.

Architecture:
1 master, 1 worker in Ubuntu 24.04 via Esxi

My Installation Doc:

Master and Worker:

sudo su
apt-get update
apt-get upgrade
sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
sudo swapoff -a
sudo sed -i '/ swap / s/^(.*)$/#\1/g' /etc/fstab

nano /etc/hosts
192.168.88.17 test-master-1
192.168.88.23 test-worker-1

sudo systemctl stop systemd-resolved
sudo systemctl disable systemd-resolved
sudo chattr -i /etc/resolv.conf
sudo rm -f /etc/resolv.conf
echo "nameserver 192.168.88.18" | sudo tee /etc/resolv.conf ---> my dnsmasq server
sudo chattr +i /etc/resolv.conf

reboot

Master:

mkdir rke2-install
cd rke2-install
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
sudo systemctl enable rke2-server --now
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
source ~/.bashrc
ln -s $(find /var/lib/rancher/rke2/data/ -name kubectl) /usr/local/bin/kubectl
kubectl get node
sudo snap install k9s
sudo ln -s /snap/k9s/current/bin/k9s /snap/bin/ #Reboot ve versiyon güncellemelerde bozulabilir tekrar çalıştır
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
cat /var/lib/rancher/rke2/server/node-token

Worker:

mkdir rke2-install
cd rke2-install
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=agent sh -
mkdir -p /etc/rancher/rke2/
echo "server: https://192.168.88.17:9345" | sudo tee /etc/rancher/rke2/config.yaml
echo "token: MYTOKEN" | sudo tee -a /etc/rancher/rke2/config.yaml
systemctl enable rke2-agent
systemctl start rke2-agent

kubectl create deployment test-nginx --image=nginx
kubectl get pods -o wide

I have installed the servers from scratch many times and have not run any commands except the ones above.

Thanks for help

brandond · 2025-02-07T19:37:15Z

Take a look at:

TCP offloading on vxlan.calico adaptor causing 63 second delays in VXLAN communications node->nodeport or node->clusterip:port. projectcalico/calico#3145 (comment)

You're running on vmware, it is highly likely that you are affected by vmware virtual ethernet vxlan checksum offload bug.

ilhan-sadikoglu · 2025-02-07T19:56:46Z

Its work :D

I've been dealing with this error for 3 days, thank you very much man.

brandond closed this as completed Feb 7, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RKE2 DNS Problem #7719

RKE2 DNS Problem #7719

ilhan-sadikoglu commented Feb 7, 2025

brandond commented Feb 7, 2025

ilhan-sadikoglu commented Feb 7, 2025

RKE2 DNS Problem #7719

RKE2 DNS Problem #7719

Comments

ilhan-sadikoglu commented Feb 7, 2025

brandond commented Feb 7, 2025

ilhan-sadikoglu commented Feb 7, 2025