[k8s] JupyterHub addressing notebook containers per IP breaks installs behind HTTP proxy #872
Comments
If you set:

```yaml
hub:
  config:
    KubeSpawner:
      services_enabled: true
```

The URL for pods should use the service DNS name, not the IP address. You can also define a … But I'm guessing we should probably be using a DNS name all the time.
Thank you, updating the configuration made KubeSpawner actually use the service DNS names. Some component also accesses the Kubernetes API server by IP - I tested the proposed config change by removing all cluster-internal IP addresses from the NO_PROXY list, but it still wouldn't work. I then noticed in the Squid proxy log the access to 10.43.0.1, which in this case is the cluster-internal IP of the Kubernetes API service.
Re-adding that IP to the NO_PROXY list got things working again (and thankfully it's just that single, static IP), but this might be addressed as well?
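For reference, a minimal sketch of what re-adding that single IP could look like if the proxy variables are injected through the Helm chart's `hub.extraEnv` (an assumption about this setup - the proxy endpoint and the other entries are purely illustrative):

```yaml
hub:
  extraEnv:
    HTTP_PROXY: "http://squid.internal.example:3128"    # illustrative external proxy endpoint
    HTTPS_PROXY: "http://squid.internal.example:3128"
    # Exclude cluster-internal DNS suffixes plus the API service's ClusterIP,
    # since the Kubernetes client reaches the API server by IP address.
    NO_PROXY: ".svc,.svc.cluster.local,10.43.0.1"
```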
Talking to the API server is done with the official Kubernetes API clients via standard …
Thank you for the pointer - it indeed looks like that code simply uses the host from the env var KUBERNETES_SERVICE_HOST, which Kubernetes provides and sets to the IP address rather than a DNS name. We have a number of other pods running behind HTTP proxies, so my current guess is that the aio library somehow influences this, while other pods may use other libraries that are capable of handling CIDR entries in no_proxy. From my current point of view, it would be too much of a hassle to try to fix this in code, especially since adding that single IP to no_proxy solves the issue with a very reasonable amount of "work".

Both this and using "services_enabled: true" when running in an environment with external HTTP proxies might be added to the docs, as "services_enabled" in particular would never have caught my eye had you not mentioned it. Granted, I wouldn't know where to put these bits of information - maybe in the descriptions in values.yaml (since this is purely Kubernetes-related)?

Thank you again for your assistance - I'm unsure whether you want to keep this issue open with regard to your comment above ("But I'm guessing we should probably be using a DNS name all the time.") - feel free to close, I'm a happy camper now.
Bug description
Starting the actual notebook servers fails with a time-out message when running a JupyterHub installation on Kubernetes (image quay.io/jupyterhub/k8s-hub:3.3.8, installed via the Helm chart jupyterhub:3.3.8) in a Kubernetes cluster that requires an external HTTP proxy to access resources outside the cluster.
Debugging shows that the hub tries to address the server pods via their (intra-cluster) IP addresses instead of DNS host names. NO_PROXY is set to exclude intra-cluster DNS domains from proxying, as well as CIDR ranges for intra-cluster IP addresses. But because the aio library used for the HTTP queries does not support CIDR elements in NO_PROXY, the HTTP requests for the server pod status checks are sent to the external proxy instead of going to the notebook server. The HTTP proxy responds with an error code (it has no way to reach intra-cluster IPs, so these are blocked by the proxy configuration), and because of the continuous errors while requesting the notebook status, the hub eventually reports that the server did not start successfully and kills the server instance.
If the distinct IPs of the server instances are added to NO_PROXY explicitly, starting the servers succeeds, as the hub's queries are then sent to the server instance directly. But as the allocation of server IPs is dynamic, every possible IP would have to be added to NO_PROXY individually, which is neither practical to administer nor expected to perform well.
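To make the difference concrete, a sketch of alternative NO_PROXY values (the addresses are illustrative, and the behaviour described is the plain host/suffix string matching that the aio library appears to do, as observed above):

```yaml
# CIDR ranges are treated as literal strings by the no_proxy matching,
# so a request to a pod IP such as 10.42.5.17 still goes out to the proxy:
NO_PROXY: "10.42.0.0/16,10.43.0.0/16"
---
# Explicit IPs are matched, but pod IPs are assigned dynamically, so this
# list would have to enumerate every possible address:
NO_PROXY: "10.42.5.17,10.42.5.18,10.42.5.19"
---
# Domain suffixes are matched and stable, but only help once the hub
# addresses the pods by DNS name instead of by IP:
NO_PROXY: ".svc,.svc.cluster.local"
```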
Checking the user's server details on the hub's admin page shows that there is already an entry for the DNS name of the notebook server pod.
How to reproduce
Expected behaviour
The hub should communicate via the dynamically generated, cluster-internal DNS name of the server pod. As it is already displayed in the server details on the hub's admin page, it should already be available to the hub.
If the hub used the cluster-internal DNS name, aio would match the server pod's host name against the no_proxy entry that excludes intra-cluster DNS hosts from going through the HTTP proxy.
Actual behaviour
As can be deduced from the HTTP proxy log, JupyterHub uses the IP of the server pod to request status information. The traffic is therefore redirected to the HTTP proxy, which cannot forward it to the pod.
Your personal set up
Configuration
Details of the values.yaml can be made available - no parameter obviously toggles IP vs. DNS usage when accessing the server pods...