Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RKE2 DNS Problem #7719

Closed
ilhan-sadikoglu opened this issue Feb 7, 2025 · 2 comments
Closed

RKE2 DNS Problem #7719

ilhan-sadikoglu opened this issue Feb 7, 2025 · 2 comments

Comments

@ilhan-sadikoglu
Copy link

Problem:
Hello, I am preparing to convert my clusters to rke2. However, when I test it after normal installation, I see that the pods cannot perform DNS resolution.

The strange thing is that I have 2 coredns pods, one on the master and one on the worker, and when I kill the one on the master, the DNS resolution of all pods is temporarily fixed. As soon as the coredns pod on the master gets itself up, dns resolution goes again.

Architecture:
1 master, 1 worker in Ubuntu 24.04 via Esxi

My Installation Doc:

Master and Worker:

sudo su
apt-get update
apt-get upgrade
sudo tee /etc/modules-load.d/containerd.conf <<EOF
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sudo sysctl --system
sudo swapoff -a
sudo sed -i '/ swap / s/^(.*)$/#\1/g' /etc/fstab

nano /etc/hosts
192.168.88.17 test-master-1
192.168.88.23 test-worker-1

sudo systemctl stop systemd-resolved
sudo systemctl disable systemd-resolved
sudo chattr -i /etc/resolv.conf
sudo rm -f /etc/resolv.conf
echo "nameserver 192.168.88.18" | sudo tee /etc/resolv.conf ---> my dnsmasq server
sudo chattr +i /etc/resolv.conf

reboot

Master:

mkdir rke2-install
cd rke2-install
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=server sh -
sudo systemctl enable rke2-server --now
echo 'export KUBECONFIG=/etc/rancher/rke2/rke2.yaml' >> ~/.bashrc
source ~/.bashrc
ln -s $(find /var/lib/rancher/rke2/data/ -name kubectl) /usr/local/bin/kubectl
kubectl get node
sudo snap install k9s
sudo ln -s /snap/k9s/current/bin/k9s /snap/bin/ #Reboot ve versiyon güncellemelerde bozulabilir tekrar çalıştır
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
cat /var/lib/rancher/rke2/server/node-token

Worker:

mkdir rke2-install
cd rke2-install
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE=agent sh -
mkdir -p /etc/rancher/rke2/
echo "server: https://192.168.88.17:9345" | sudo tee /etc/rancher/rke2/config.yaml
echo "token: MYTOKEN" | sudo tee -a /etc/rancher/rke2/config.yaml
systemctl enable rke2-agent
systemctl start rke2-agent

kubectl create deployment test-nginx --image=nginx
kubectl get pods -o wide

I have installed the servers from scratch many times and have not run any commands except the ones above.

Thanks for help

@brandond
Copy link
Member

brandond commented Feb 7, 2025

Take a look at:

You're running on vmware, it is highly likely that you are affected by vmware virtual ethernet vxlan checksum offload bug.

@brandond brandond closed this as completed Feb 7, 2025
@ilhan-sadikoglu
Copy link
Author

Its work :D

I've been dealing with this error for 3 days, thank you very much man.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants