CoreDNS timeout on vSphere cluster when resolving a service #8144

Open
ygao-armada opened this issue May 11, 2024 · 2 comments


ygao-armada commented May 11, 2024

What happened:
In an EKS Anywhere (EKSA) cluster on vSphere, we see a strange error: on a worker node, if we replace /etc/resolv.conf with the one from the pod argocd-server-xxx:

search argocd.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.192.10
options ndots:5
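
With options ndots:5, a name like argocd-redis has fewer than 5 dots, so the resolver walks the search list (argocd.svc.cluster.local, svc.cluster.local, cluster.local) and may also try the literal name. A quick way to rule search-list expansion in or out (a sketch, assuming the same resolv.conf is in place) is to query the fully qualified name with a trailing dot, which skips the search domains entirely:

nslookup argocd-redis.argocd.svc.cluster.local. 10.96.192.10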

The nslookup command resolves the IP (10.96.221.1) first, then waits about 10 seconds until it times out:

root@mgmt20-md-0-7k7hk-vcnh2:/home/ec2-user# nslookup argocd-redis
Server:     10.96.192.10
Address:    10.96.192.10#53

Name:  argocd-redis.argocd.svc.cluster.local
Address: 10.96.221.1
;; connection timed out; no servers could be reached


root@mgmt20-md-0-7k7hk-vcnh2:/home/ec2-user# exit
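
This "answer first, then timeout" pattern usually means the first query succeeds and a follow-up query (for example the AAAA lookup, or a retry against another search domain) gets no reply. A sketch for narrowing that down, assuming dig is available on the node:

# Query A and AAAA separately against CoreDNS; if A answers but AAAA hangs,
# the timeout comes from the follow-up query, not the name itself
dig @10.96.192.10 argocd-redis.argocd.svc.cluster.local A +time=2 +tries=1
dig @10.96.192.10 argocd-redis.argocd.svc.cluster.local AAAA +time=2 +tries=1
# Repeat over TCP to rule out dropped UDP responses on the node
dig @10.96.192.10 argocd-redis.argocd.svc.cluster.local A +tcp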

We can see the IP (10.96.221.1) is correct as follows:

ubuntu@ubuntuguest:~$ kubectl get svc -A -o wide | grep 10.96.221.1
argocd               argocd-redis                         ClusterIP  10.96.221.1   <none>    6379/TCP            135m  app.kubernetes.io/name=argocd-redis

And 10.96.192.10 is the coredns IP:

ubuntu@ubuntuguest:~$ kubectl get svc -A -o wide | grep 10.96.192.10
kube-system             kube-dns                           ClusterIP  10.96.192.10  <none>    53/UDP,53/TCP,9153/TCP     103d  k8s-app=kube-dns
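
Assuming kubectl access to the cluster, the CoreDNS pods behind 10.96.192.10 can also be checked directly (a sketch; the k8s-app=kube-dns label matches the service selector shown above):

kubectl -n kube-system get endpoints kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50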

Am I missing something?

What you expected to happen:
No timeout should happen for the command "nslookup argocd-redis".

How to reproduce it (as minimally and precisely as possible):
Install Argo CD on an EKSA vSphere cluster and follow the steps in the description above.

Anything else we need to know?:

Environment:

  • EKS Anywhere Release:
  • EKS Distro Release:

sp1999 commented May 12, 2024

Thanks for reporting @ygao-armada. We are looking into this issue and will get back with any information we find.

ygao-armada (Author) commented

@sp1999 Some updates: I found it is related to gpu-operator. It looks like there is no such issue if we install Argo CD before gpu-operator.
I install Argo CD with:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

And I install gpu-operator following the instructions from: https://github.com/NVIDIA/gpu-operator/blob/release-23.9/scripts/install-gpu-operator-nvaie.sh
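
One way to narrow down what gpu-operator changes, assuming the timeout only appears after it is installed, is to compare the CoreDNS configuration and the node-level connection tracking before and after the install (a diagnostic sketch, not a confirmed root cause):

# CoreDNS config and pod state after installing gpu-operator
kubectl -n kube-system get configmap coredns -o yaml
kubectl -n kube-system get pods -l k8s-app=kube-dns
# On the affected worker node: failed conntrack inserts are a common cause
# of intermittent in-cluster DNS timeouts (requires the conntrack tool)
conntrack -S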
