Support random port assignment when c.KubeSpawner.port = 0 #299

Open
qzchenwl opened this issue Feb 21, 2019 · 16 comments · May be fixed by #448

Comments

@qzchenwl

qzchenwl commented Feb 21, 2019

According to the documentation, KubeSpawner uses a randomly allocated port.
I deployed zero-to-jupyterhub-k8s (with hostNetwork: true set for Spark) and got this error when some users logged in:
[Warning] 0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 Insufficient memory, 2 node(s) didn't have free ports for the requested pod ports.

That's because KubeSpawner always uses port 8888 instead of a random port. https://github.com/jupyterhub/kubespawner/blob/master/kubespawner/spawner.py#L145

https://jupyterhub-kubespawner.readthedocs.io/en/latest/spawner.html

config c.KubeSpawner.port = Int(0)
The port for single-user servers to listen on.

Defaults to 0, which uses a randomly allocated port number each time.

If set to a non-zero value, all Spawners will use the same port, which only makes sense if each server is on a different address, e.g. in containers.

New in version 0.7.

@frouzbeh

@qzchenwl Hi, have you resolved this issue yet? Here they suggested using '--port=%i' % port in the Docker command. I was thinking we could pass it through the hub's extra config so that it chooses the port randomly.

@qzchenwl
Author

@frouzbeh Not yet. The DockerSpawner solution is not suitable for KubeSpawner. DockerSpawner runs on the local machine, so it can find a usable port before the container starts. With KubeSpawner, you don't know which host the container will be assigned to, so you can't know beforehand which port is usable.
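
To illustrate the point above, here is a minimal sketch of the usual "bind to port 0" trick for finding a free port. The helper name `find_free_local_port` is hypothetical and is not a DockerSpawner or KubeSpawner API. The result only describes the machine that runs the code, which is why it cannot be trusted for a pod that Kubernetes schedules onto some other node later.

```python
import socket


def find_free_local_port() -> int:
    """Ask the OS for a TCP port that is currently unused on THIS machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel pick any free port.
        sock.bind(("", 0))
        return sock.getsockname()[1]


# The port is free here, on the machine running this code. It says nothing
# about whether the same port is free on the node where a pod ends up.
port = find_free_local_port()
print(port)
```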

@frouzbeh

@qzchenwl Well that's an issue and I hope somebody will take care of it.

@frouzbeh

frouzbeh commented Apr 9, 2019

@minrk, @yuvipanda Hi, do you have any comments or a solution for this issue?

@yuvipanda
Collaborator

Traditionally, when JupyterHub tries to find a 'random port', it finds a random port that is unused on the machine where the JupyterHub process is running. That doesn't work here, since you'd need to find a random available port that isn't used on any of the machines in the cluster. I'm not entirely sure how to do that in a clean way.

Are there ways to run spark that don't require hostNetwork? Can the pod network range be directly reachable from spark?

@frouzbeh

frouzbeh commented Apr 9, 2019

@yuvipanda thanks,
I'm not an expert on Spark, but I can ask our administrator whether that's possible. Still, couldn't we provide a range of ports so that KubeSpawner selects one randomly from that range?
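
A rough sketch of the port-range idea suggested above, for illustration only. `pick_port` and its `in_use` argument are hypothetical and are not KubeSpawner options. The sketch assumes the hub tracks the ports it has already handed out; it still cannot see ports taken by other processes on a node, so collisions remain possible with hostNetwork.

```python
import random
from typing import Iterable


def pick_port(low: int, high: int, in_use: Iterable[int] = ()) -> int:
    """Pick a random port in [low, high] that the caller has not handed out yet."""
    taken = set(in_use)
    candidates = [p for p in range(low, high + 1) if p not in taken]
    if not candidates:
        raise RuntimeError("no free port left in the configured range")
    return random.choice(candidates)


# Example: ports 30000 and 30001 were already given to other user pods.
chosen = pick_port(30000, 30010, in_use={30000, 30001})
print(chosen)
```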

@ramkrishnan8994

ramkrishnan8994 commented Apr 10, 2019

Traditionally, when JupyterHub tries to find a 'random port', it finds a random port that is unused on the machine where the JupyterHub process is running. That doesn't work here, since you'd need to find a random available port that isn't used on any of the machines in the cluster. I'm not entirely sure how to do that in a clean way.

Are there ways to run spark that don't require hostNetwork? Can the pod network range be directly reachable from spark?

@yuvipanda - Spark itself does not require hostNetwork, but the Docker Stacks images used with JupyterHub do.

Caused by: java.io.IOException: Failed to connect to jupyter-doe-xxxxx:39003 Caused by: java.net.UnknownHostException: jupyter-doe-xxxxx

This is the error that Spark throws when it tries to run a job with hostNetwork disabled.
jupyter-doe-xxxxx is the pod that is generated for the user. Since our Spark cluster also runs on K8s and the JupyterHub pod is not on hostNetwork, Spark is not able to resolve the pod's hostname.

Can the jupyter-doe-xxxxx pod be made a StatefulSet? We've generally seen this type of issue solved that way. I'm not sure it will work here, but it's worth a try.

@rkdarst

rkdarst commented Apr 10, 2019

By the way, a similar patch was recently accepted by batchspawner:
jupyterhub/batchspawner#58
It has created some problems when interacting with other features (see the batchspawner issues), but a similar approach could be used here.

But... if that is used here, it may be time to add native support for this in JupyterHub. I think that would solve some of the subtle issues we keep seeing... but I'm not able to do it myself.

Note to @cmd-ntrf who wrote it originally.

@cmd-ntrf

As @rkdarst mentioned, we encountered a similar issue with batchspawner.

The solution we opted for was to write an API handler that is installed on the Hub side. The handler waits to receive the port number from the singleuser server and modifies the spawner's port value. The spawner is identified based on the user auth, but I have recently submitted a patch to use the API token instead, in order to support named servers.

To send the port, I have written a small wrapper script that selects a port, configures the singleuser server to use it, sends it over HTTP to the Hub at the API handler address, then starts the notebook just like singleuser would.

There is a problem though. JupyterHub does not provide a mechanism to automatically register API handlers from third parties. Currently, the API handler is registered when the batchspawner module is imported, but in some cases, such as when using wrapspawner, the module is imported after JupyterHub is initialized and the batchspawner API handler is not registered properly. As a workaround, we currently instruct users to import batchspawner in jupyterhub_config.py, which is not ideal, but it works.

Ideally, the API handler I have written for batchspawner would be integrated directly into JupyterHub to configure the port number. Another option would be to implement a mechanism similar to the one in Jupyter that allows the installation and activation of server-side plugins / handlers. I am willing to help with either solution or anything related.
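
The wrapper flow described above could be sketched roughly as follows. This is not the batchspawner code: the function names and the `/hypothetical-port-endpoint` path are placeholders invented for illustration, and the sketch only builds the HTTP request rather than sending it.

```python
import json
import socket
import urllib.request


def choose_port() -> int:
    """Pick a free port on the host where the single-user server will run."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))
        return sock.getsockname()[1]


def build_port_report(hub_api_url: str, api_token: str, port: int) -> urllib.request.Request:
    """Build the request that tells the Hub which port was chosen.

    The endpoint path is a placeholder (assumption); a real deployment would
    use whatever path its Hub-side handler is registered under.
    """
    return urllib.request.Request(
        url=hub_api_url.rstrip("/") + "/hypothetical-port-endpoint",
        data=json.dumps({"port": port}).encode("utf-8"),
        headers={
            "Authorization": f"token {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Step 1: select a port. Step 2: report it to the Hub. Step 3 (not shown):
# start the single-user server on that port.
request = build_port_report("http://hub:8081/hub/api/", "example-token", choose_port())
print(request.get_method(), request.full_url)
```

In a real wrapper, the Hub URL and token would presumably come from the `JUPYTERHUB_API_URL` and `JUPYTERHUB_API_TOKEN` environment variables of the single-user server, the request would be sent with `urllib.request.urlopen(request)`, and the script would then launch the notebook server with the chosen port.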

@frouzbeh

@ramkrishnan8994 I haven't been able to connect to our Spark YARN cluster yet, and I'm getting the following exception in both cases (with hostNetwork enabled and with it disabled):

org.apache.spark.SparkException: Exception thrown in awaitResult:
        at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:514)
        at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:307)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply$mcV$sp(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$1.apply(ApplicationMaster.scala:245)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:773)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:772)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:244)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:797)
        at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:827)
        at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException: Failed to connect to localhost/127.0.0.1:46086
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:46086
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:323)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
        ... 1 more
Caused by: java.net.ConnectException: Connection refused
        ... 11 more

In both cases, I can see that my application has been accepted by the YARN manager, and then after a few seconds it is stopped.

@frouzbeh

@cmd-ntrf Apparently none of the developers are interested in resolving this issue. Would you please give me some guidance on your solution? Can you share it with me? Thanks

@ramkrishnan8994

@frouzbeh - Were you able to come up with a solution for this? We have dropped JupyterHub because of this.

@frouzbeh

@ramkrishnan8994 Well, that's crazy, because I thought I had to use host networking, but without it my Spark works fine and we no longer have the port problem.

@qzchenwl
Author

@frouzbeh How did you make Spark work without host networking?

@ramkrishnan8994

@ramkrishnan8994 Well, that's crazy, because I thought I had to use host networking, but without it my Spark works fine and we no longer have the port problem.

Are you connecting to a local Spark or a remote Spark? We connect to a standalone Spark cluster, and that requires hostNetwork to be enabled.

@frouzbeh

@ramkrishnan8994 My Kubernetes and Hadoop clusters are physically on the same computer cluster. I thought I needed hostNetwork to connect to Spark from the client side, but I don't. I just needed to set spark.driver.host in SparkConf to the IP address of the container.
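
A minimal sketch of the setting described above, assuming the container's IP is either injected through an environment variable (the name `POD_IP` is a hypothetical choice, for example populated via the Kubernetes Downward API) or resolvable from the container's own hostname. The helper names are illustrative only.

```python
import os
import socket


def detect_container_ip() -> str:
    """Best-effort lookup of this container's IP address.

    POD_IP is a hypothetical variable name (assumption); if it is not set,
    fall back to resolving the container's own hostname.
    """
    return os.environ.get("POD_IP") or socket.gethostbyname(socket.gethostname())


def spark_driver_settings(driver_ip: str) -> list:
    """Return the Spark setting that advertises the driver at the given address."""
    return [("spark.driver.host", driver_ip)]


# Example with a fixed address; in a pod you would pass detect_container_ip().
settings = spark_driver_settings("10.0.0.5")
print(settings)
```

With PySpark installed, these pairs can be applied with `SparkConf().setAll(settings)` before creating the Spark session.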

@consideRatio changed the title from "KubeSpawner should choose a random port when c.KubeSpawner.port = 0" to "Support random port assignment when c.KubeSpawner.port = 0" on Oct 25, 2020