Support random port assignment when c.KubeSpawner.port = 0 #299
Comments
@frouzbeh Not yet. The solution for DockerSpawner is not suitable for KubeSpawner: DockerSpawner runs on the local machine, so it can find a usable port before it starts the container. With KubeSpawner, you don't know which host the container will be assigned to, so you can't know which port is usable beforehand.
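For context, a minimal sketch of the local-port trick being referred to (JupyterHub ships a similar `random_port` helper in `jupyterhub.utils`): binding to port 0 lets the OS assign a free port, but the result is only valid on the machine where the check runs, which is exactly what breaks down for KubeSpawner.

```python
import socket

def random_free_port():
    """Ask the OS for a free port by binding to port 0, then release it.

    This only guarantees the port is free on *this* machine at *this*
    moment -- fine for DockerSpawner, useless for KubeSpawner, where the
    pod may be scheduled onto any node in the cluster.
    """
    sock = socket.socket()
    sock.bind(("", 0))
    port = sock.getsockname()[1]
    sock.close()
    return port
```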
@qzchenwl Well, that's an issue, and I hope somebody will take care of it.
@minrk, @yuvipanda Hi, do you have any comments or a solution for this issue?
Traditionally, when JupyterHub tries to find a 'random port', it finds a random port that is unused on the machine the JupyterHub process is running on. That doesn't work here, since you'd need to find a random port that isn't already in use on whichever machine the pod lands on. I'm not entirely sure how to do that in a clean way. Are there ways to run Spark that don't require hostNetwork? Can the pod network range be made directly reachable from Spark?
@yuvipanda thanks, |
@yuvipanda - Spark does not require running on hostNetwork, but the images in JupyterHub's Docker Stacks do.
This is the error Spark throws when it tries to run a job with hostNetwork disabled. Could the jupyter-doe-xxxxx pod be made a StatefulSet? We've generally seen these types of issues solved that way. Not sure it can be solved here, but it's worth a try.
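For anyone reproducing the hostNetwork setup discussed in this thread, a minimal `jupyterhub_config.py` sketch, assuming your KubeSpawner version exposes the `extra_pod_config` trait:

```python
# jupyterhub_config.py (sketch)
c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'

# Run user pods in the node's network namespace. With hostNetwork the pod's
# port must be free on the node itself, which is why a fixed 8888 produces
# the "didn't have free ports" scheduling failures described in this issue.
c.KubeSpawner.extra_pod_config = {
    'hostNetwork': True,
    'dnsPolicy': 'ClusterFirstWithHostNet',
}
```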
By the way, a similar patch recently got accepted by batchspawner. But if that approach is used here too, it may be time to add native support for this in JupyterHub. I think that would solve some of the subtle issues we keep seeing, but I'm not able to do it myself. Note to @cmd-ntrf, who wrote it originally.
As @rkdarst mentioned, we encountered a similar issue with batchspawner. The solution we opted for was to write an API handler that is installed on the Hub side. The handler waits to receive the port number from the single-user server and modifies the spawner's port value. The spawner is identified based on the user auth, but I have recently submitted a patch to use the API token instead, to support named servers.

To send the port, I wrote a small wrapper script that selects a port, configures the single-user server to use it, sends it over HTTP to the Hub at the API handler's address, then starts the notebook just as singleuser would.

There is a problem, though. JupyterHub does not provide a mechanism for third parties to automatically register API handlers. Currently, the API handler is registered when the batchspawner module is imported, but in some cases, such as when using wrapspawner, the module is imported after JupyterHub is initialized, and the batchspawner API handler is not registered properly. As a workaround, we currently instruct users to import batchspawner in jupyterhub_config.py, which is not ideal, but it works.

Ideally, the API handler I wrote for batchspawner would be integrated directly into JupyterHub to configure the port number. Another option would be to implement a mechanism similar to the one in Jupyter that allows the installation and activation of server-side plugins / handlers. I am willing to help with either solution or anything related.
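A minimal sketch of the wrapper approach described above, not batchspawner's actual code. The `/batchspawner` endpoint path and the final exec target are assumptions for illustration; `JUPYTERHUB_API_URL` and `JUPYTERHUB_API_TOKEN` are the standard environment variables the Hub sets for single-user servers:

```python
#!/usr/bin/env python
"""Sketch: pick a free port on the node actually running the server,
report it to a Hub-side API handler, then start the single-user server."""
import json
import os
import socket
import sys
import urllib.request


def pick_free_port():
    # Bind to port 0 so the OS assigns an unused port, then release it.
    sock = socket.socket()
    sock.bind(("", 0))
    port = sock.getsockname()[1]
    sock.close()
    return port


port = pick_free_port()

# Report the chosen port to the Hub, authenticating with the API token.
# The '/batchspawner' path is a hypothetical handler address.
request = urllib.request.Request(
    os.environ["JUPYTERHUB_API_URL"] + "/batchspawner",
    data=json.dumps({"port": port}).encode(),
    headers={"Authorization": "token " + os.environ["JUPYTERHUB_API_TOKEN"]},
    method="POST",
)
urllib.request.urlopen(request)

# Hand off to the real single-user entrypoint on the chosen port.
os.execvp(
    "jupyterhub-singleuser",
    ["jupyterhub-singleuser", f"--port={port}"] + sys.argv[1:],
)
```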
@ramkrishnan8994 I haven't been able to connect to our Spark YARN cluster yet, and I'm getting the following exception in both cases (hostNetwork enabled and disabled):
In both cases, I can see that my application has been accepted in the YARN manager, and then after a few seconds it is stopped.
@cmd-ntrf Apparently none of the developers are interested in resolving this issue. Would you please give me some guidance about your solution? Can you share it with me? Thanks.
@ramkrishnan8994 Well, that's crazy: I thought I had to use the host network, but without it my Spark works fine, and now we don't have the port problem.
@frouzbeh How do you make Spark work without the host network?
Are you connecting to a local Spark or a remote Spark? We connect to a standalone Spark cluster, and that requires hostNetwork to be enabled.
@ramkrishnan8994 My Kubernetes and Hadoop clusters are physically on the same computer cluster. I thought that to connect to Spark from the client side I needed hostNetwork, but I don't. I just needed to set spark.driver.host in SparkConf to the IP address of the container.
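A minimal sketch of the client-side configuration described above; the master URL and the way the container IP is resolved are assumptions to adapt to your cluster:

```python
import socket

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Resolve this container's IP so executors can reach the driver directly,
# without the pod needing hostNetwork.
driver_ip = socket.gethostbyname(socket.gethostname())

conf = (
    SparkConf()
    .setMaster("yarn")  # assumed; use your standalone master URL if needed
    .set("spark.driver.host", driver_ip)
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()
```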
According to the documentation, KubeSpawner will use a randomly allocated port. I deployed zero-to-jupyterhub-k8s (with hostNetwork: true set for Spark) and got an error when some users logged in:

[Warning] 0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient memory, 2 node(s) didn't have free ports for the requested pod ports.

That's because KubeSpawner always uses port 8888 instead of a random port. https://github.com/jupyterhub/kubespawner/blob/master/kubespawner/spawner.py#L145
https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html
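A minimal sketch of the behavior this issue requests, not current KubeSpawner code: treat c.KubeSpawner.port = 0 as "choose a random port per pod", so that hostNetwork pods don't all compete for 8888 on the same node. The subclass name and port range are assumptions for illustration, and this assumes a KubeSpawner version where start is a coroutine:

```python
import random

from kubespawner import KubeSpawner


class RandomPortSpawner(KubeSpawner):
    """Hypothetical spawner: interpret port == 0 as 'randomize per pod'.

    This does not guarantee the port is free on the node the pod lands on;
    it only makes collisions unlikely, which is the same trade-off the
    discussion above circles around.
    """

    async def start(self):
        if self.port == 0:
            # Ephemeral port range; an arbitrary choice for this sketch.
            self.port = random.randint(49152, 65535)
        return await super().start()
```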