Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uncaught exception when JupyterHub tries to communicate with SingleUserLabApp on compute node #242

Open
dithwick opened this issue Apr 10, 2022 · 1 comment
Labels

Comments

@dithwick
Copy link

Bug description

batchspawner-singleuser starts correctly on the compute node and communicates back to the hub to indicate this. JupyterHub shows:

Server ready at [/user/username/]

and redirects to /user/username/lab?

but this then shows a

503 : Service Unavailable
Your server appears to be down. Try restarting it [from the hub]

to the user.

There are some complexities with the networking (described below) that are likely related to the problem I'm seeing and it could also be a configuration issue. However given I'm seeing an exception I thought it best to report it as a bug.

Expected behaviour

The user is presented the JupyterHub session from the compute node.

Actual behaviour

The user is presented with a 503 error. The singleuser process on the compute node generates an uncaught exception traceback (see logs) but continues to run until the job is cancelled.

How to reproduce

  1. Login to JupyterHub.
  2. Select a job profile and click start.
  3. Job waits in queue, JupyterHub shows job status to user.
  4. Job starts, JupyterHub shows "waiting to connect"
  5. JupyterHub shows "Server ready...."
  6. Error shown

Your personal set up

  • JupyterHub has been manually installed on the server using miniconda with packages installed from condaforge and pypi.

  • Authentication to JupyterHub uses SAML Authenticator from https://github.com/ImperialCollegeLondon/jupyter_saml2authenticator.

  • Cluster is running PBSPro.

  • Networking is dual stack (ipv4/ipv6) with some complexities:

    • User access to the Hub is over ipv4 and ipv6.
    • Hub connectivity to PBSPro is over ipv4.
    • Communication between the hub and the compute nodes needs to be over ipv6 as only this is routable between the JupyterHub server and compute nodes. However DNS has both ipv4 and ipv6 entries for the compute nodes and when JupyterHub was resolving the IP address of the compute nodes from exechosts regular expression match, it was resolving the ipv4 address. To work around this I wrote a wrapper script around qstat to replace the hostname (in the output from qstat) with the ipv6 address; there is probably a better way of doing this but it seems to work up until I reach the problem being reported.
  • OS: JupyterHub is running on CentOS 7.9, the compute nodes are running RHEL 8.5

  • Versions:

    • Python: python 3.9.7
    • JupyterHub: jupyterhub 2.2.2
    • BatchSpawner: batchspawner 1.1.0
    • wrapspawner 1.0.1
    • jupyter-saml2authenticator 0.1.0.dev0
Full environment
# packages in environment at /opt/jupyterhub/miniconda/2022-04-10:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
_openmp_mutex             4.5                       1_gnu
alembic                   1.7.7              pyhd8ed1ab_0    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     21.4.0             pyhd8ed1ab_0    conda-forge
batchspawner              1.1.0                    pypi_0    pypi
blinker                   1.4                        py_1    conda-forge
brotlipy                  0.7.0           py39h27cfd23_1003
c-ares                    1.18.1               h7f8727e_0
ca-certificates           2021.10.8            ha878542_0    conda-forge
certifi                   2021.10.8        py39hf3d152e_2    conda-forge
certipy                   0.1.3                      py_0    conda-forge
cffi                      1.15.0           py39hd667e15_1
charset-normalizer        2.0.4              pyhd3eb1b0_0
conda                     4.12.0           py39hf3d152e_0    conda-forge
conda-content-trust       0.1.1              pyhd3eb1b0_0
conda-package-handling    1.7.3            py39h27cfd23_1
configurable-http-proxy   4.0.1                  node14_0
cryptography              36.0.0           py39h9ce1e76_0
defusedxml                0.7.1                    pypi_0    pypi
elementpath               2.5.0                    pypi_0    pypi
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
icu                       68.1                 h2531618_0
idna                      3.3                pyhd3eb1b0_0
importlib-metadata        4.11.3           py39hf3d152e_1    conda-forge
importlib_resources       5.6.0              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.1              pyhd8ed1ab_0    conda-forge
jsonschema                4.4.0              pyhd8ed1ab_0    conda-forge
jupyter-saml2authenticator 0.1.0.dev0               pypi_0    pypi
jupyter_telemetry         0.1.0              pyhd8ed1ab_1    conda-forge
jupyterhub                2.2.2              pyhd8ed1ab_1    conda-forge
jupyterhub-base           2.2.2              pyhd8ed1ab_1    conda-forge
krb5                      1.19.2               hac12032_0
ld_impl_linux-64          2.35.1               h7274673_9
libcurl                   7.80.0               h0b77cf5_0
libedit                   3.1.20210910         h7f8727e_0
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.3                  he6710b0_2
libgcc-ng                 9.3.0               h5101ec6_17
libgomp                   9.3.0               h5101ec6_17
libnghttp2                1.46.0               hce63b2e_0
libssh2                   1.9.0                h1ba5d50_1
libstdcxx-ng              9.3.0               hd4cf53a_17
libuv                     1.40.0               h7b6447c_0
mako                      1.2.0              pyhd8ed1ab_1    conda-forge
markupsafe                2.0.1            py39h3811e60_0    conda-forge
ncurses                   6.3                  h7f8727e_2
nodejs                    14.8.0               hda19d22_0
oauthlib                  3.2.0              pyhd8ed1ab_0    conda-forge
openssl                   1.1.1n               h7f8727e_0
packaging                 21.3               pyhd8ed1ab_0    conda-forge
pamela                    1.0.0                      py_0    conda-forge
pip                       21.2.4           py39h06a4308_0
prometheus_client         0.14.1             pyhd8ed1ab_0    conda-forge
psutil                    5.8.0            py39h27cfd23_1
pycosat                   0.6.3            py39h27cfd23_0
pycparser                 2.21               pyhd3eb1b0_0
pycurl                    7.44.1           py39h8f2d780_1
pyjwt                     2.3.0              pyhd8ed1ab_1    conda-forge
pyopenssl                 21.0.0             pyhd3eb1b0_1
pyparsing                 3.0.7              pyhd8ed1ab_0    conda-forge
pyrsistent                0.18.0           py39heee7806_0
pysaml2                   7.1.2                    pypi_0    pypi
pysocks                   1.7.1            py39h06a4308_0
python                    3.9.7                h12debd9_1
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python-json-logger        2.0.1              pyh9f0ad1d_0    conda-forge
python_abi                3.9                      2_cp39    conda-forge
pytz                      2022.1                   pypi_0    pypi
readline                  8.1.2                h7f8727e_1
requests                  2.27.1             pyhd3eb1b0_0
ruamel.yaml               0.16.12          py39h27cfd23_1
ruamel.yaml.clib          0.2.6            py39h7f8727e_0
ruamel_yaml               0.15.100         py39h27cfd23_0
setuptools                58.0.4           py39h06a4308_0
six                       1.16.0             pyhd3eb1b0_0
sqlalchemy                1.3.23           py39h3811e60_0    conda-forge
sqlite                    3.37.0               hc218d9a_0
tk                        8.6.11               h1ccaba5_0
tornado                   6.1              py39h3811e60_1    conda-forge
tqdm                      4.62.3             pyhd3eb1b0_1
traitlets                 5.1.1              pyhd8ed1ab_0    conda-forge
tzdata                    2021e                hda174b7_0
urllib3                   1.26.7             pyhd3eb1b0_0
wheel                     0.37.1             pyhd3eb1b0_0
wrapspawner               1.0.1                    pypi_0    pypi
xmlschema                 1.10.0                   pypi_0    pypi
xz                        5.2.5                h7b6447c_0
yaml                      0.2.5                h7b6447c_0
zipp                      3.8.0              pyhd8ed1ab_0    conda-forge
zlib                      1.2.11               h7f8727e_4
Configuration
c.Application.log_level = 'DEBUG'
c.JupyterHub.cookie_secret_file = '/var/jupyterhub/jupyterhub_cookie_secret'
c.JupyterHub.data_files_path = '/opt/jupyterhub/miniconda/2022-04-10/share/jupyterhub'
c.JupyterHub.db_url = 'sqlite:////var/jupyterhub/jupyterhub.sqlite'
c.ConfigurableHTTPProxy.debug = True
c.JupyterHub.hub_connect_ip = 'hub:ipv6:address'
c.JupyterHub.hub_ip = 'hub:ipv6:address'
c.JupyterHub.ip = ''
c.JupyterHub.log_level = 'DEBUG'
c.ConfigurableHTTPProxy.pid_file = '/var/jupyterhub/jupyterhub-proxy.pid'
c.Spawner.debug = True
c = get_config()
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.Spawner.http_timeout = 6000
c.Spawner.start_timeout = 6000
import batchspawner
import batchspawner.api
c.PBSSpawner.batch_script = ''' #!/bin/sh
#PBS -l walltime={runtime}
#PBS -l select=1:ncpus={nprocs}:mem={memory}:ngpus={ngpus}:mpiprocs=1:ompthreads={nprocs}
#PBS -N jupyterhub
#PBS -v {keepvars}
#PBS -q {queue}
export RELEASE=2022-04-10
export PATH=/apps/jupyterhub/$RELEASE/bin:/bin:/usr/bin:/sbin:/usr/sbin:/opt/pbs/default/bin:/usr/local/bin
printenv
set -x
echo "{cmd}" > $HOME/.jupyterhub-lab.stdout 2>&1
cd $HOME
{cmd} --ip="::"
echo $?
'''
c.BatchSpawnerBase.req_nprocs = '4'
c.BatchSpawnerBase.req_runtime = '25:0:0'
c.BatchSpawnerBase.req_memory  = '4gb'
c.BatchSpawnerBase.req_ngpus   = '0'
c.BatchSpawnerBase.req_queue   = 'interactive'
c.BatchSpawnerBase.batch_submit_cmd = '/opt/jupyterhub/pbs/bin/__qsub'
c.BatchSpawnerBase.batch_cancel_cmd = '/opt/jupyterhub/pbs/bin/__qdel {job_id}'
c.BatchSpawnerBase.batch_query_cmd  = '/opt/jupyterhub/pbs/bin/__qstat {job_id}'
c.BatchSpawnerBase.state_exechost_re = 'exec_host = (.+)'
c.BatchSpawnerBase.state_pending_re = 'job_state = [QH]'
c.BatchSpawnerBase.state_running_re = 'job_state = R'
c.ProfilesSpawner.profiles = [
        ( "1 cores 8GB 8 hours", 'c1-1c-8g-8h', 'batchspawner.PBSSpawner',
                { "req_nprocs" : "1", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"8gb", "req_ngpus":"0" } ),
        ( "4 cores, 32GB, 8 hours", "c1-4c-16g-8h", "batchspawner.PBSSpawner",
                { "req_nprocs" : "4", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"32gb", "req_ngpus":"0" } ),
        ( "8 cores, 64GB, 8 hours", "c1-8c-64g-8h", "batchspawner.PBSSpawner",
                { "req_nprocs" : "8", "req_queue":"interactive", "req_runtime":"08:00:00", "req_memory":"64gb", "req_ngpus":"0" } ),
        ( "2 cores, 16GB, 8 hours, 1 GPU", "c1-2c-16g-8h-1gpu", "batchspawner.PBSSpawner",
                { "req_nprocs" : "2", "req_queue":"gpu", "req_runtime":"08:00:00", "req_memory":"16gb", "req_ngpus":"1" } ),
        ]
from jupyter_saml2authenticator import Saml2Authenticator
from traitlets import Bool
class CheckingSaml2Authenticator(Saml2Authenticator):
    def validate_username(self, username):
        if not super().validate_username(username):
            return False
        """Check if the user exists on the system"""
        import pwd
        try:
            pwd.getpwnam(username)
        except KeyError:
            return False
        else:
            return True
    # For some reason this is not configurable in the parent class
    delete_invalid_users = Bool(
        default_value = True,
        config = True,
        help="Whether to delete users (from JupyterHub DB) who no longer validate",
    )
c.JupyterHub.authenticator_class = CheckingSaml2Authenticator
c.CheckingSaml2Authenticator.saml2_metadata_url = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_entity_id = 'REDACTED'
c.CheckingSaml2Authenticator.saml2_attribute_username = 'name'
c.CheckingSaml2Authenticator.delete_invalid_users = True
c.CheckingSaml2Authenticator.login_service = 'ID'
Logs

From the hub logs (I've redacted some items - hopefully not so much it's impossible to work out what is going on)

[I 2022-04-10 14:01:45.365 JupyterHub log:189] 302 GET /hub/api/oauth2/authorize?client_id=jupyterhub-user-username&redirect_uri=%2Fuser%2Fusername%2Foauth_callback&response_type=code&state=[secret] -> /user/username/oauth_callback?code=[secret]&state=[secret] (username@::ffff:user.ipv4.address) 54.29ms
14:01:45.938 [ConfigProxy] debug: PROXY WEB /user/username/oauth_callback?code=redacted to http://[compute:ipv6:address]:40713
[D 2022-04-10 14:01:45.961 JupyterHub scopes:301] Authenticated with token <APIToken('REDACTED...', user='username', client_id='jupyterhub')>
[D 2022-04-10 14:01:45.962 oauthlib.oauth2.rfc6749.endpoints.token token:112] Dispatching grant_type authorization_code request to <oauthlib.oauth2.rfc6749.grant_types.authorization_code.AuthorizationCodeGrant object at REDACTED>.
[D 2022-04-10 14:01:45.962 JupyterHub provider:53] authenticate_client <oauthlib.Request SANITIZED>
[D 2022-04-10 14:01:45.974 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:534] Using provided redirect_uri /user/username/oauth_callback
[D 2022-04-10 14:01:45.975 JupyterHub provider:112] confirm_redirect_uri: client_id=jupyterhub-user-username, redirect_uri=/user/username/oauth_callback
[D 2022-04-10 14:01:45.975 oauthlib.oauth2.rfc6749.grant_types.authorization_code authorization_code:302] Token request validation ok for <oauthlib.Request SANITIZED>.
[D 2022-04-10 14:01:45.975 JupyterHub provider:335] Saving bearer token {'access_token': 'REDACTED', 'expires_in': 1209600, 'token_type': 'Bearer', 'scope': '', 'refresh_token': 'REDACTED'}
[D 2022-04-10 14:01:46.000 JupyterHub provider:195] Deleting oauth code 4Ef... for jupyterhub-user-username
[I 2022-04-10 14:01:46.020 JupyterHub log:189] 200 POST /hub/api/oauth2/token (username@compute:ipv6:address) 64.17ms
[D 2022-04-10 14:01:46.036 JupyterHub base:281] Recording first activity for <APIToken('OSFd...', user='username', client_id='jupyterhub-user-username')>
[D 2022-04-10 14:01:46.055 JupyterHub scopes:301] Authenticated with token <APIToken('OSFd...', user='username', client_id='jupyterhub-user-username')>
[I 2022-04-10 14:01:46.066 JupyterHub log:189] 200 GET /hub/api/user (username@compute:ipv6:address) 32.94ms
14:01:46.637 [ConfigProxy] debug: PROXY WEB /user/username/lab? to http://[compute:ipv6:address]:40713
14:01:46.850 [ConfigProxy] error: 503 GET /user/username/lab? socket hang up
[D 2022-04-10 14:01:46.857 JupyterHub pages:586] No template for 503
[I 2022-04-10 14:01:46.872 JupyterHub log:189] 200 GET /hub/error/503?url=%2Fuser%2Fusername%2Flab%3F (@hub:ipv6:address) 16.33ms

and from the compute node:

[D 2022-04-10 14:01:46.062 SingleUserLabApp auth:395] Received request from Hub user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']}
[I 2022-04-10 14:01:46.062 SingleUserLabApp auth:1132] Logged-in user {'admin': True, 'groups': [], 'name': 'username', 'kind': 'user', 'session_id': 'redacted', 'scopes': ['access:servers!server=username/']}
[D 2022-04-10 14:01:46.062 SingleUserLabApp auth:841] Setting oauth cookie for ::ffff:user.ipv4.address: jupyterhub-user-username, {'path': '/user/username/', 'httponly': True, 'secure': True}
[I 2022-04-10 14:01:46.063 SingleUserLabApp log:189] 302 GET /user/username/oauth_callback?code=[secret]&state=[secret] -> /user/username/lab? (@::ffff:user.ipv4.address) 119.27ms
[E 2022-04-10 14:01:46.643 SingleUserLabApp http1connection:67] Uncaught exception
    Traceback (most recent call last):
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/http1connection.py", line 273, in _read_message
        delegate.finish()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/httpserver.py", line 387, in finish
        self.delegate.finish()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/routing.py", line 268, in finish
        self.delegate.finish()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2290, in finish
        self.execute()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 2309, in execute
        self.handler = self.handler_class(
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 227, in __init__
        self.clear()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/tornado/web.py", line 328, in clear
        self.set_default_headers()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 314, in set_default_headers
        elif self.token_authenticated and "Access-Control-Allow-Origin" not in self.settings.get(
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyter_server/base/handlers.py", line 159, in token_authenticated
        return self.login_handler.is_token_authenticated(self)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/singleuser/mixins.py", line 105, in is_token_authenticated
        handler.get_current_user()
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 1051, in get_current_user
        self._hub_auth_user_cache = self.check_hub_user(user_model)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 978, in check_hub_user
        if self.allow_all:
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 924, in allow_all
        self.hub_scopes is None
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 916, in hub_scopes
        return self.hub_auth.oauth_scopes or None
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 577, in __get__
        return self.get(obj, cls)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 540, in get
        default = obj.trait_defaults(self.name)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 1580, in trait_defaults
        return self._get_trait_default_generator(names[0])(self)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/traitlets/traitlets.py", line 977, in __call__
        return self.func(*args, **kwargs)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 357, in _default_scopes
        return set(json.loads(env_scopes))
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/apps/jupyterhub/2022-04-10/lib/python3.9/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
@dithwick dithwick added the bug label Apr 10, 2022
@welcome
Copy link

welcome bot commented Apr 10, 2022

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant