jupyterhub cannot talk to session started in Slurm #231
Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗
Firewall perhaps? Have you looked at the state of configurable-http-proxy to make sure the proper route is there? Also, it's been a while, but maybe get rid of the
Hello - is the provided configuration sample the literal content of your config.py or similar? If so, it appears that you are overriding the
Another possible area of concern: in your logs, I see the server call itself
@jbaksta the firewall is not a problem, I can reach the remote session from the jupyterhub server using lynx. I've removed my sbatch script but srun still comes from batchspawner.py.
@mbmilligan the server names were omitted (except for inhccne0101, which you found). Apart from that, the config is what I have, yes.
@pelacables you are right, we would like to have this better documented. Referencing what we have, the key parts are:

Putting those together, the result is that your configuration needs to give the spawned server an address at which the Hub is actually reachable from the compute nodes.
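A minimal sketch of what that usually amounts to (here `hub.domain.com` is a placeholder for the Hub host's name as seen from the compute nodes; adjust to your site):

```python
# jupyterhub_config.py (sketch, not the poster's literal config)
c.JupyterHub.hub_ip = '0.0.0.0'                 # Hub listens on all interfaces, not just localhost
c.JupyterHub.hub_connect_ip = 'hub.domain.com'  # address spawned servers use to reach the Hub
```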
thanks @mbmilligan. My conf now looks like:

But I still have the connection error. In jupyterhub I see:
(computenode is the real node name and domain.com is the company's domain.) jupyterhub log:

and the job log can be found here. Firewall is not an issue: I can talk to the compute node, and from the compute node to the server, on each respective port. Is there anything I can do to increase the verbosity? I already have `c.Spawner.args = ['--debug']` in the config file. And how can I check the routes? Maybe the route is not being added, as @jbaksta mentioned.
I would absolutely double check your proxy routes to make sure they're correct. Your job logs and server logs show the same port, so I'd certainly be curious whether the route in the proxy is correct.
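If it helps, one way to look is to query configurable-http-proxy's REST API directly (this assumes the proxy API is on its default port 8001 and that its auth token is in `CONFIGPROXY_AUTH_TOKEN`):

```sh
# List every route the proxy currently knows about
curl -sS -H "Authorization: token $CONFIGPROXY_AUTH_TOKEN" \
     http://127.0.0.1:8001/api/routes
```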
@jbaksta you are right, the new route is not added:

What can be the reason? How can I debug this?
Maybe just start with this in your hub configuration file. You can increase the logging verbosity of the Hub like so:
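A sketch of the relevant setting (the standard JupyterHub trait; `c.Spawner.debug = True` additionally passes `--debug` to the single-user server, which you already have via `args`):

```python
# jupyterhub_config.py: debug-level logging for the Hub itself
c.JupyterHub.log_level = 'DEBUG'
```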
I do not see JupyterHub adding any route at all. It just asks for the routes all the time:

If I create a local session I do see a new route in configurable-http-proxy. Who/what is telling JupyterHub to add the route? Is it the job once it starts?
Sorry @jbaksta, how can I ask JupyterHub about the routes? The link you gave me shows errors and I do not find anything regarding routes in here.
The routes are added by the hub. In batchspawner, when a job starts, there is a call from the started server back to the Hub's API reporting the port it is listening on; once the server is confirmed up, the Hub tells the proxy to add the route.
All of this being said, I'm still running JupyterHub 1.5.0 and haven't migrated to the JupyterHub 2.0 series. Maybe batchspawner has a bug with JupyterHub 2.0+. All in all, the basic idea for starting is:
As far as debugging this further, again, inspect the routes inside the configurable-http-proxy itself. Read this. Also, query the routes from JupyterHub. You'll need a token with admin permissions IIRC; I'm not exactly sure what that means for JupyterHub 2.0 scopes yet. If you don't believe it's getting the routes, you might have to use the Python debugger to inspect some state and make sure you see a post to the CHP, or add more logging in places, but the messages are usually sufficient.
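A sketch of that Hub-side query (assuming the Hub API at its default `127.0.0.1:8081` and an admin-capable token in `JHUB_TOKEN`, both placeholders):

```sh
# Ask the Hub for its view of the proxy routing table
curl -sS -H "Authorization: token $JHUB_TOKEN" \
     http://127.0.0.1:8081/hub/api/proxy
```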
FWIW I don't think the problem is with the proxy setup. The Hub won't add the proxy route until the notebook/lab server is confirmed running, and that confirmation is what is failing here.

It does appear that the server is successfully making connections back to the Hub, but it's worthwhile to verify that connections in the other direction work (e.g. in some clusters only outbound connections from compute nodes are allowed). I would suggest testing this: from the Hub node, once the message "Cluster job running... waiting to connect" appears, try fetching the server URL directly.
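For example, with the host and port taken from the job log (the values below are placeholders modeled on the hostnames and port seen in this thread):

```sh
# From the Hub node: can we reach the single-user server directly?
curl -v http://nodeXYZ:52303/user/username/local/
```

Using curl rather than lynx avoids any browser-side proxy or hostname-completion behavior masking the result.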
If I use the FQDN it works. There's a proxy that is blocking the connection! (I tried with lynx in the past and, after some errors, it managed to access it; I guess the proxy adds the domain at some point. That's why I mentioned that I could reach the jupyter server from the hub server. My fault for not using curl.) Is there a way to tell jupyterhub to use the FQDN? Because the proxy is set in many places and will be hard to unset everywhere (and I'm not the admin for the compute nodes). I already tried passing c.JupyterHub.subdomain_host = 'domain.com' but it keeps using the non-FQDN hostname.
Oh, good, we have a straightforward solution for that. The Hub blindly uses the hostname as reported by the spawner, and BatchSpawner uses the scheduler query mechanism (`squeue`, in the Slurm case) to obtain it, which typically yields the short hostname. BatchSpawner provides a configurable rewrite of that hostname before it is handed to the Hub.
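A sketch of that rewrite (this assumes the spawner's default `state_exechost_re` captures the short hostname as group 1, and `domain.com` is a placeholder for your real domain):

```python
# jupyterhub_config.py: expand the short hostname Slurm reports into an FQDN
c.SlurmSpawner.state_exechost_exp = r'\1.domain.com'
```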
Great, now it's working! Thanks a lot @mbmilligan. I'll close this issue.
Bug description
BatchSpawner is able to submit the job and the job starts on the compute node, but the Hub never connects to the session on the compute node.
Expected behaviour
The Hub connects to the session running on the compute node.
Actual behaviour
The Hub gets stuck at "Cluster job running... waiting to connect"
How to reproduce
Spawn failed: Server at http://nodeXYZ:52303/user/[email protected]/local/ didn't respond in 300 seconds
Your personal set up
Using an environment module based installation.
Batch system: SLURM
wrapspawner.ProfilesSpawner to be able to choose between local and cluster sessions
Auth used: OAuth (OpenID)
In sudoers I've commented out secure_path
OS: SL 7.9
Packages:
Python 3.9.5
Jupyterhub 2.1.1
Batchspawner 1.1.0
Jupyterlab 3.2.8
Jupyter-server 1.13.4
Full environment
Configuration
Logs
Server logs:
Client logs: