[k8s] JupyterHub addressing notebook containers per IP breaks installs behind HTTP proxy #872

Open
jmozd opened this issue Oct 25, 2024 · 4 comments

@jmozd

jmozd commented Oct 25, 2024

Bug description

Starting the actual notebook servers fails with a time-out message when running a JupyterHub installation on Kubernetes (image quay.io/jupyterhub/k8s-hub:3.3.8, installed via Helm chart jupyterhub:3.3.8) in a Kubernetes cluster that requires an external HTTP proxy to access resources outside the cluster.

Debugging shows that the hub addresses the server pods via (intra-cluster) IP addresses instead of DNS host names. NO_PROXY is set to exclude intra-cluster DNS domains from proxying, as well as the CIDR ranges of the intra-cluster IP addresses. But because the aio library used for the HTTP queries does not support CIDR entries in NO_PROXY, the HTTP requests that query the server pod status are sent to the external proxy instead of going to the notebook server. The HTTP proxy responds with an error code (it has no way to reach intra-cluster IPs, hence these are blocked by the proxy configuration), and because of the continuous errors while requesting the notebook status, the hub eventually reports that the server did not start successfully and kills the server instance.
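
To illustrate why the CIDR entries are ignored, here is a minimal sketch, assuming the proxy bypass check follows CPython's urllib (which the aio HTTP stack builds on): NO_PROXY entries are compared as host-name suffixes, so a literal pod IP never matches a CIDR range. The addresses below are example values only.

import os
from urllib.request import proxy_bypass_environment

# Example NO_PROXY with a DNS suffix and a CIDR range (as used in this setup)
os.environ["NO_PROXY"] = ".svc.cluster.local,10.42.0.0/16"

# Suffix match succeeds -> request goes directly to the target
print(proxy_bypass_environment("notebook.jhub.svc.cluster.local"))  # True

# A literal pod IP is compared as a string and never matches "10.42.0.0/16"
# -> the request is sent to the external HTTP proxy
print(proxy_bypass_environment("10.42.3.17"))  # False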

If the distinct IPs of server instances are added to NO_PROXY explicitly, then starting the servers succeeds, as the hub's queries are sent to the server instance directly. But as the allocation of server IPs is dynamic, all possible IPs would have to be added to NO_PROXY individually, which is neither practical to administer nor expected to be performant.

Checking the hub's admin page and looking at the user's server details, there is already an entry for the DNS name of the notebook server pod.

How to reproduce

  1. Define the extra variables "http_proxy" and "https_proxy" pointing to an HTTP proxy (e.g. a Squid instance)
  2. Define the extra variable "no_proxy" (or "NO_PROXY") to exclude cluster DNS names and the CIDRs of the cluster IP ranges
  3. Create the hub pod
  4. Try to start a new server instance and check Kubernetes for its IP
  5. Check the HTTP proxy access log for IP-based HTTP requests and correlate them with the IP of the server instance
  6. Observe the time-out error for the notebook server start reported on the hub web UI

Expected behaviour

The hub should communicate via the dynamically generated, cluster-internal DNS name of the server pod. Since that name is already displayed in the server details on the hub's admin page, it should already be available to the hub.

If the hub used the cluster-internal DNS name, the aio library would match the server pod's host name against the no_proxy entry that excludes intra-cluster DNS hosts from being sent to the HTTP proxy.

Actual behaviour

As can be deduced from the HTTP proxy log, JupyterHub uses the IP of the server pod to request status information. That traffic is therefore directed to the HTTP proxy, which cannot forward it.

Your personal set up

  • OS: Kubernetes cluster (K3s v1.26.15, managed via Rancher)
  • Version(s): JupyterHub 3.3.8 installed via Helm chart jupyterhub:3.3.8
  • Configuration: details of the values.yaml can be made available - no parameter obviously toggles IP vs. DNS usage for accessing the server pods...
@jmozd jmozd added the bug label Oct 25, 2024
@minrk
Member

minrk commented Nov 12, 2024

If you set:

hub:
  config:
    KubeSpawner:
      services_enabled: true

The URL for pods should then use the service DNS name, not the IP address.

You can also define a KubeSpawner.get_pod_url hook (requires writing Python code) to return a URL, given the Pod resource.
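
For illustration, a rough sketch of such a hook (the names here are hypothetical; it assumes services_enabled: true so that a Service carrying the pod's name exists, and the default cluster.local domain):

def pod_url_from_dns(spawner, pod):
    # spawner.pod_name, spawner.namespace and spawner.port are standard
    # KubeSpawner/Spawner attributes; adjust the domain if yours differs.
    return "http://{name}.{namespace}.svc.cluster.local:{port}".format(
        name=spawner.pod_name,
        namespace=spawner.namespace,
        port=spawner.port,
    )

c.KubeSpawner.get_pod_url = pod_url_from_dns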

But I'm guessing we should probably be using a DNS name all the time.

@minrk minrk transferred this issue from jupyterhub/jupyterhub Nov 12, 2024
@jmozd
Author

jmozd commented Nov 14, 2024

Thank you, updating the configuration made the KubeSpawner actually use the service DNS names.

Some component also accesses the Kubernetes API server by IP - I tested the proposed config change by removing all cluster-internal IP addresses from the NO_PROXY list, but it still wouldn't work. I then noticed in the Squid proxy log an access to 10.43.0.1, which in this case is:

> kubectl describe service/kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.43.0.1
IPs:               10.43.0.1
Port:              https  443/TCP

Re-adding that IP to the NO_PROXY list got things working again (and thankfully it's just that single, static IP), but this might be addressed as well?

@minrk
Member

minrk commented Nov 14, 2024

Talking to the API server is done with the official Kubernetes API clients via the standard load_incluster_config(). I don't think we pick the URL to connect to; Kubernetes does. But maybe there's an arg that would work? If you can share Python code that connects a Kubernetes client the way you would expect from inside the containers, we can give it a try.

@jmozd
Author

jmozd commented Nov 15, 2024

Thank you for the pointer - it indeed looks like that code simply uses the host from the env var KUBERNETES_SERVICE_HOST, which Kubernetes provides and populates with the IP address instead of a DNS name.
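
For reference, a minimal sketch of where that host comes from (assuming only the standard env vars Kubernetes injects into every pod; the in-cluster config loader uses them verbatim):

import os

# Kubernetes injects these into every pod; KUBERNETES_SERVICE_HOST carries the
# ClusterIP (here 10.43.0.1), not a DNS name, so NO_PROXY has to list that IP.
host = os.environ["KUBERNETES_SERVICE_HOST"]
port = os.environ["KUBERNETES_SERVICE_PORT"]
api_server_url = f"https://{host}:{port}"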

We have a number of other pods running behind HTTP proxies, so my current guess is that using the aio library somehow influences this, while other pods may use other libraries that are capable of handling CIDR entries in no_proxy.

From my current point of view, it'd be too much of a hassle to try to fix this in code, especially since adding that single IP to no_proxy solves the issue with a very reasonable amount of "work". Both this and using "services_enabled: true" when running in an environment with external HTTP proxies might be added to the docs, as "services_enabled" in particular would never have caught my eye had you not mentioned it. Granted, I wouldn't know where to put these bits of information - maybe in the descriptions in values.yaml (since this is purely Kubernetes-related)?

Thank you again for your assistance - I'm unsure if you want to keep this issue open with regard to your comment above ("But I'm guessing we should probably be using a DNS name all the time.") - feel free to close, I'm a happy camper now.
