Bug description
batchspawner-singleuser starts correctly on the compute node and communicates back to the hub to indicate this. JupyterHub shows:
Server ready at [/user/username/]
and redirects to /user/username/lab?
but this then shows a
503 : Service Unavailable
Your server appears to be down. Try restarting it [from the hub]
to the user.
There are some complexities with the networking (described below) that are likely related to the problem I'm seeing, and it could also be a configuration issue. However, given that I'm seeing an exception, I thought it best to report it as a bug.
Expected behaviour
The user is presented the JupyterHub session from the compute node.
Actual behaviour
The user is presented with a 503 error. The singleuser process on the compute node generates an uncaught exception traceback (see logs) but continues to run until the job is cancelled.
How to reproduce
1. Log in to JupyterHub.
2. Select a job profile and click start.
3. The job waits in the queue; JupyterHub shows the job status to the user.
4. The job starts; JupyterHub shows "waiting to connect".
5. JupyterHub shows "Server ready...".
6. The error is shown.
Your personal set up
JupyterHub has been manually installed on the server using Miniconda, with packages installed from conda-forge and PyPI.
Authentication to JupyterHub uses the SAML authenticator from https://github.com/ImperialCollegeLondon/jupyter_saml2authenticator.
The cluster is running PBSPro.
Networking is dual stack (ipv4/ipv6) with some complexities:
User access to the Hub is over ipv4 and ipv6.
Hub connectivity to PBSPro is over ipv4.
Communication between the hub and the compute nodes needs to be over ipv6, as that is the only route between the JupyterHub server and the compute nodes. However, DNS has both ipv4 and ipv6 entries for the compute nodes, and when JupyterHub resolved the compute node's address from the exec_host regular expression match, it got the ipv4 address. To work around this I wrote a wrapper script around qstat that replaces the hostname in the qstat output with the ipv6 address; there is probably a better way of doing this, but it works up until the problem being reported.
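For illustration, the hostname-to-ipv6 substitution described above could be sketched roughly as follows. This is a hypothetical reconstruction, not the actual wrapper script: the function and regex names are made up, and the resolver is injectable so the substitution logic can be shown without touching DNS (a real wrapper would resolve via socket.getaddrinfo with AF_INET6 and pass the result through to batchspawner's state_exechost_re match).

```python
import re
import socket


def resolve_ipv6(host):
    # Look up the first AAAA record for host (network-dependent).
    infos = socket.getaddrinfo(host, None, socket.AF_INET6)
    return infos[0][4][0]


def rewrite_exec_host(qstat_output, resolve=resolve_ipv6):
    """Replace the hostname in PBS 'exec_host = node/0*4' lines
    with its ipv6 address, leaving the rest of the output intact."""
    def repl(match):
        host = match.group(1)
        suffix = match.group(2) or ""
        return "exec_host = %s%s" % (resolve(host), suffix)
    return re.sub(r"exec_host = ([\w.-]+)(/\S*)?", repl, qstat_output)


# Example with a stub resolver (no DNS needed):
# rewrite_exec_host("exec_host = node01/0*4", resolve=lambda h: "fd00::1")
# would yield "exec_host = fd00::1/0*4"
```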
OS: JupyterHub is running on CentOS 7.9; the compute nodes are running RHEL 8.5.
Configuration
c = get_config()
c.Application.log_level = 'DEBUG'
c.JupyterHub.cookie_secret_file = '/var/jupyterhub/jupyterhub_cookie_secret'
c.JupyterHub.data_files_path = '/opt/jupyterhub/miniconda/2022-04-10/share/jupyterhub'
c.JupyterHub.db_url = 'sqlite:////var/jupyterhub/jupyterhub.sqlite'
c.ConfigurableHTTPProxy.debug = True
c.JupyterHub.hub_connect_ip = 'hub:ipv6:address'
c.JupyterHub.hub_ip = 'hub:ipv6:address'
c.JupyterHub.ip = ''
c.JupyterHub.log_level = 'DEBUG'
c.ConfigurableHTTPProxy.pid_file = '/var/jupyterhub/jupyterhub-proxy.pid'
c.Spawner.debug = True
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.Spawner.http_timeout = 6000
c.Spawner.start_timeout = 6000

import batchspawner
import batchspawner.api

c.PBSSpawner.batch_script = '''#!/bin/sh
#PBS -l walltime={runtime}
#PBS -l select=1:ncpus={nprocs}:mem={memory}:ngpus={ngpus}:mpiprocs=1:ompthreads={nprocs}
#PBS -N jupyterhub
#PBS -v {keepvars}
#PBS -q {queue}
export RELEASE=2022-04-10
export PATH=/apps/jupyterhub/$RELEASE/bin:/bin:/usr/bin:/sbin:/usr/sbin:/opt/pbs/default/bin:/usr/local/bin
printenv
set -x
echo "{cmd}" > $HOME/.jupyterhub-lab.stdout 2>&1
cd $HOME
{cmd} --ip="::"
echo $?
'''
c.BatchSpawnerBase.req_nprocs = '4'
c.BatchSpawnerBase.req_runtime = '25:0:0'
c.BatchSpawnerBase.req_memory = '4gb'
c.BatchSpawnerBase.req_ngpus = '0'
c.BatchSpawnerBase.req_queue = 'interactive'
c.BatchSpawnerBase.batch_submit_cmd = '/opt/jupyterhub/pbs/bin/__qsub'
c.BatchSpawnerBase.batch_cancel_cmd = '/opt/jupyterhub/pbs/bin/__qdel {job_id}'
c.BatchSpawnerBase.batch_query_cmd = '/opt/jupyterhub/pbs/bin/__qstat {job_id}'
c.BatchSpawnerBase.state_exechost_re = 'exec_host = (.+)'
c.BatchSpawnerBase.state_pending_re = 'job_state = [QH]'
c.BatchSpawnerBase.state_running_re = 'job_state = R'
c.ProfilesSpawner.profiles = [
    ("1 core, 8GB, 8 hours", 'c1-1c-8g-8h', 'batchspawner.PBSSpawner',
     {"req_nprocs": "1", "req_queue": "interactive", "req_runtime": "08:00:00", "req_memory": "8gb", "req_ngpus": "0"}),
    ("4 cores, 32GB, 8 hours", "c1-4c-16g-8h", "batchspawner.PBSSpawner",
     {"req_nprocs": "4", "req_queue": "interactive", "req_runtime": "08:00:00", "req_memory": "32gb", "req_ngpus": "0"}),
    ("8 cores, 64GB, 8 hours", "c1-8c-64g-8h", "batchspawner.PBSSpawner",
     {"req_nprocs": "8", "req_queue": "interactive", "req_runtime": "08:00:00", "req_memory": "64gb", "req_ngpus": "0"}),
    ("2 cores, 16GB, 8 hours, 1 GPU", "c1-2c-16g-8h-1gpu", "batchspawner.PBSSpawner",
     {"req_nprocs": "2", "req_queue": "gpu", "req_runtime": "08:00:00", "req_memory": "16gb", "req_ngpus": "1"}),
]
from jupyter_saml2authenticator import Saml2Authenticator
from traitlets import Bool

class CheckingSaml2Authenticator(Saml2Authenticator):
    def validate_username(self, username):
        if not super().validate_username(username):
            return False
        # Check if the user exists on the system
        import pwd
        try:
            pwd.getpwnam(username)
        except KeyError:
            return False
        else:
            return True

    # For some reason this is not configurable in the parent class
    delete_invalid_users = Bool(
        default_value=True,
        config=True,
        help="Whether to delete users (from JupyterHub DB) who no longer validate",
    )

c.JupyterHub.authenticator_class = CheckingSaml2Authenticator
c.CheckingSaml2Authenticator.saml2_metadata_url = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_entity_id = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_attribute_username = 'name'
c.CheckingSaml2Authenticator.delete_invalid_users = True
c.CheckingSaml2Authenticator.login_service = 'ID'
Logs
From the hub logs (I've redacted some items, hopefully not so much that it's impossible to work out what is going on):
[I 2022-04-10 14:01:45.365 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-username&redirect_uri=%2Fuser%2Fusername%2Foauth_callback&response_type=code&state=[secret] -> /user/username/oauth_callback?code=[secret]&state=[secret] (username@::ffff:user.ipv4.address) 54.29ms
14:01:45.938 [ConfigProxy] debug: PROXY WEB /user/username/oauth_callback?code=redacted to http://[compute:ipv6:address]:40713
[D 2022-04-10 14:01:45.961 JupyterHub scopes:301] Authenticated with token <APIToken('REDACTED...', user='username', client_id='jupyterhub')>
[D 2022-04-10 14:01:45.962 oauthlib.oauth2.rfc6749.endpoints.token token:112] Dispatching grant_type authorization_code request to <oauthlib.oauth2.rfc6749.grant_types.authorization_code.AuthorizationCodeGrant object at REDACTED>.
[D 2022-04-10 14:01:45.962 JupyterHub provider:53] authenticate_client <oauthlib.Request SANITIZED>
[D 2022-04-10 14:01:45.974 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:534] Using provided redirect_uri /user/username/oauth_callback
[D 2022-04-10 14:01:45.975 JupyterHub provider:112] confirm_redirect_uri: client_id=jupyterhub-user-username, redirect_uri=/user/username/oauth_callback
[D 2022-04-10 14:01:45.975 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:302] Token request validation ok for <oauthlib.Request SANITIZED>.
[D 2022-04-10 14:01:45.975 JupyterHub provider:335] Saving bearer token {'access_token': 'REDACTED', 'expires_in': 1209600, 'token_type': 'Bearer', 'scope': '', 'refresh_token': 'REDACTED'}
[D 2022-04-10 14:01:46.000 JupyterHub provider:195] Deleting oauth code 4Ef... for jupyterhub-user-username
[I 2022-04-10 14:01:46.020 JupyterHub log:189] 200 POST /hub/api/oauth2/token (username@compute:ipv6:address) 64.17ms
[D 2022-04-10 14:01:46.036 JupyterHub base:281] Recording first activity for <APIToken('OSFd...', user='username', client_id='jupyterhub-user-username')>
[D 2022-04-10 14:01:46.055 JupyterHub scopes:301] Authenticated with token <APIToken('OSFd...', user='username', client_id='jupyterhub-user-username')>
[I 2022-04-10 14:01:46.066 JupyterHub log:189] 200 GET /hub/api/user (username@compute:ipv6:address) 32.94ms
14:01:46.637 [ConfigProxy] debug: PROXY WEB /user/username/lab? to http://[compute:ipv6:address]:40713
14:01:46.850 [ConfigProxy] error: 503 GET /user/username/lab? socket hang up
[D 2022-04-10 14:01:46.857 JupyterHub pages:586] No template for 503
[I 2022-04-10 14:01:46.872 JupyterHub log:189] 200 GET /hub/error/503?url=%2Fuser%2Fusername%2Flab%3F (@hub:ipv6:address) 16.33ms
and from the compute node:
[D 2022-04-10 14:01:46.062 SingleUserLabApp auth:395] Received request from Hub user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']}
[I 2022-04-10 14:01:46.062 SingleUserLabApp auth:1132] Logged-in user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']}
[D 2022-04-10 14:01:46.062 SingleUserLabApp auth:841] Setting oauth cookie for ::ffff:user.ipv4.address: jupyterhub-user-username, {'path': '/user/username/', 'httponly': True, 'secure': True}
[I 2022-04-10 14:01:46.063 SingleUserLabApp log:189] 302 GET /user/username/oauth_callback?code=[secret]&state=[secret] -> /user/username/lab? (@::ffff:user.ipv4.address) 119.27ms
[E 2022-04-10 14:01:46.643 SingleUserLabApp http1connection:67] Uncaught exception
Traceback (most recent call last):
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/http1connection.py", line 273, in _read_message
delegate.finish()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/httpserver.py", line 387, in finish
self.delegate.finish()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/routing.py", line 268, in finish
self.delegate.finish()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2290, in finish
self.execute()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2309, in execute
self.handler = self.handler_class(
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 227, in __init__
self.clear()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 328, in clear
self.set_default_headers()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 314, in set_default_headers
elif self.token_authenticated and "Access-Control-Allow-Origin" not in self.settings.get(
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 159, in token_authenticated
return self.login_handler.is_token_authenticated(self)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/singleuser/mixins.py", line 105, in is_token_authenticated
handler.get_current_user()
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 1051, in get_current_user
self._hub_auth_user_cache = self.check_hub_user(user_model)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 978, in check_hub_user
if self.allow_all:
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 924, in allow_all
self.hub_scopes is None
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 916, in hub_scopes
return self.hub_auth.oauth_scopes or None
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 577, in __get__
return self.get(obj, cls)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 540, in get
default = obj.trait_defaults(self.name)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 1580, in trait_defaults
return self._get_trait_default_generator(names[0])(self)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 977, in __call__
return self.func(*args, **kwargs)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 357, in _default_scopes
return set(json.loads(env_scopes))
File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
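One plausible reading of the final frames (this is an inference, not confirmed from the logs): the traceback ends in json.loads() on the scopes environment variable (likely JUPYTERHUB_OAUTH_SCOPES in this JupyterHub version). "Expecting value: line 1 column 2 (char 1)" means the first character parsed as the start of a JSON array, but the element at char 1 did not start a valid JSON value, which is exactly what happens if the variable holds a single-quoted, Python-repr-style list rather than JSON:

```python
import json

try:
    # Single quotes are not valid JSON string delimiters, so parsing
    # fails on the second character, matching the traceback's position.
    json.loads("['access:servers!server=username/']")
except json.JSONDecodeError as err:
    print(err)  # Expecting value: line 1 column 2 (char 1)
```

If that guess is right, the question becomes where the scopes variable gets mangled between the hub and the batch job (for example, by shell quoting in the #PBS -v {keepvars} pass-through).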