Request to itself takes too long. #478
Replies: 14 comments 21 replies
-
@GranderStark Thanks for the info, I will investigate. Note that the flow of traffic out does not go through the LB; the LB only handles incoming traffic if you use ingress definitions. There could, however, be something going on with DNS, I will have a look ASAP.
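A minimal way to check the DNS angle from both inside the cluster and on the node itself, assuming a hypothetical domain registry.example.com:

# Resolve from inside the cluster via a throwaway busybox pod
kubectl run dnstest --rm -it --image=busybox:1.36 --restart=Never -- nslookup registry.example.com
# Resolve directly on the node, for comparison
nslookup registry.example.com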
-
By the way, if you use your own container registry and expose it to the world with nginx rather than traefik, it will not be reachable within the cluster via DNS unless you modify the nginx service. See terraform-hcloud-kube-hetzner/README.md line 319 in 9b4e813.
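For reference, a hedged sketch of the kind of service modification meant here: the Hetzner cloud controller manager supports a load-balancer.hetzner.cloud/hostname annotation on LoadBalancer services, which can make the LB reachable by hostname from inside the cluster. The namespace, service name, and domain below are placeholders:

# Annotate the nginx ingress controller's LoadBalancer service (names are placeholders)
kubectl -n ingress-nginx annotate service ingress-nginx-controller \
  load-balancer.hetzner.cloud/hostname=registry.example.com --overwrite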
-
@mysticaltech Thank you for the quick response and for the info about the nginx + DCR configuration. I will try to test with the additional annotation. About the LB and traffic: the request scheme and the behaviour are described in the original post below.
Hope this will help. Please let me know if I need to provide any additional information, thank you!
-
@GranderStark Now I understand better, yes! Please share your kube.tf without the sensitive values. Maybe your cluster does not have enough resources. As a rule of thumb, anything from Rancher, be it Longhorn or Rancher itself, needs nodes with at least 4GB of RAM.
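If metrics-server is running in the cluster, a quick resource sanity check looks like this:

# Current CPU/memory usage per node (requires metrics-server)
kubectl top nodes
# What is already requested/limited on each node
kubectl describe nodes | grep -A 5 'Allocated resources'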
-
It's pretty standard, I've only disabled agent-small.
kube.tf
-
@kube-hetzner/core Any ideas on this?
-
I think I noticed something similar while trying to set up etcd snapshots through S3. I configured it on two clusters: on one it works, on the other it doesn't.

Cluster A: with this cluster it does not work. Request times are higher than 1 minute.

Cluster B: with this cluster it does work. Request times are normal.

What I tried

Things I ruled out

I also really appreciate any help here! I hope my description helps. I think it might be some routing / VPN issue, but I don't know enough about the networking stuff to trace it down.
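One way to narrow down where the time goes, with a placeholder URL, is curl's timing breakdown:

# Split the request into DNS lookup, TCP connect, TLS handshake, and total time
curl -o /dev/null -sS -w 'dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' https://registry.example.com/

If total is large but dns and connect are small, the delay happens after the handshake; if connect already hangs, that points at routing or the load balancer.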
-
@maaft Great explanation of your setup! By VPN you mean WireGuard, huh? But with which CNI, Cilium? And why do you need it? Since the cluster uses a private network only, the VPN is not required for security; the external interfaces are not part of Kubernetes.
-
By VPN I mean the Hetzner private network (cluster A). My other cluster (cluster B) uses WireGuard with Flannel. The reason is seamless integration with bare-metal servers, as I had issues with the CCM and vSwitch integration. vSwitch performance is also reported to be poor, so I didn't bother to debug it further.
-
@maaft Ok, good to hear. Looking forward to seeing how you finally pulled off the bare-metal integration, and then maybe I can help with this issue. We'll figure out what's causing the delay 🤞 For now, the first thing that comes to mind is DNS: please try setting dns_servers to Google's in kube.tf, maybe this helps.
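For reference, a sketch of that setting in kube.tf (the values below are Google's public resolvers):

# kube.tf: override the DNS servers used by the nodes
dns_servers = [
  "8.8.8.8",
  "8.8.4.4",
]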
-
@mysticaltech I changed
-
Weird, I really do not know what could be causing it, as your setup is pretty custom. But I am very curious, please let us know when you find out. I am moving this thread to a discussion.
-
@mysticaltech Just to be clear, this happens on vanilla kube-hetzner, not on my custom implementation! Therefore, I think moving this to discussions is wrong.
-
Let's imagine we have an external DNS address that we want to request from one of the virtual machines provided by Hetzner (a Kubernetes node).
The request chain looks like this (one of the variants): [hetzner-vm-1] -> [dns] -> [hetzner LB/klipper LB] -> [internal network (created by this repo)] -> [traefik/nginx] -> [hetzner-vm-1/2/3] -> [pod] -> [container]
Problem:
A request from inside a node to a DNS address that points at this exact cluster takes about 1 minute 4 seconds to respond. As an example, run
wget -S --spider https://{YOUR-DNS-HERE}/
from inside the VM.

Example close to actual usage: I'm hosting a Docker container registry (DCR) in my cluster, and I want my cluster to be able to download anything from this DCR using its DNS name. From any other place (my laptop, for example) everything is OK. From inside the cluster, CRI-O hits a timeout, because the request takes 1m4s and the timeout is hardcoded to 30 seconds.

I have tried to contact Hetzner support; they responded that everything is OK. The problem started after 21.11.2022.
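A test that can separate the load balancer from the rest of the chain, with a placeholder domain and IP: pin the DNS name to a node's public IP so the request bypasses the LB, and compare timings.

# Through the LB, as in the report above (placeholder domain)
time wget -S --spider https://registry.example.com/
# Bypassing the LB: force the name to resolve to a node's public IP (placeholder)
curl -sS -o /dev/null --resolve registry.example.com:443:203.0.113.10 \
  -w 'total=%{time_total}s\n' https://registry.example.com/

If the bypass is fast while the LB path is slow, the problem is hairpin traffic through the load balancer rather than DNS.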
Nodes used:
Control planes: 3x cpx11
Agent: 1x cpx21
Storage: 1x cpx21
Repo version: latest master (9b4e813)