JupyterHub CrashLoopBackOff #493
Comments
Heya @kdubovikov! Thanks for filing this issue! It looks like the hub pod cannot reach the proxy pod. Are pod networking and kube-proxy working properly? I suspect this is an OpenStack installation / bare-metal setup. Are other services on the cluster working fine? Does https://scanner.heptio.com/ find any issues?
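A rough sketch of those checks, assuming kubectl access and kubeadm-style labels for the system pods; the jhub namespace below is a placeholder for whatever namespace the chart was installed into:

```bash
# Check that kube-proxy and cluster DNS pods are running and not restarting
kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
kubectl get pods -n kube-system -l k8s-app=kube-dns

# Confirm the proxy-api Service has endpoints in the JupyterHub namespace
kubectl get svc,endpoints proxy-api -n jhub
```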
Hey @yuvipanda, thanks for the response. All other services are working fine (we also run GlusterFS). I've run the tests and no issues have been found:
Also, I am able to run JupyterHub with
Hmm, in that case I'm at a loss about what is going on :(
Ping @minrk. Any thoughts? Are you still seeing this issue, @kdubovikov?
It does seem like a networking problem, but I'm not sure what the best way to debug it would be. You could edit the Hub command to run a

You could also try communicating with the proxy from another context (e.g. outside the cluster, another pod, etc.) to be sure that the proxy pod is accepting connections. Do you have any NetworkPolicy config on the cluster?
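A minimal sketch of testing the proxy from another pod, assuming the default z2jh service names (proxy-api on port 8001, proxy-public on port 80) and an illustrative jhub namespace:

```bash
# Launch a throwaway pod with curl and probe the proxy services from inside the cluster.
# Any HTTP response from proxy-api (even a 403 without a token) proves connectivity;
# a hang or timeout points at networking.
kubectl run nettest --rm -it --restart=Never --image=curlimages/curl -n jhub -- \
  sh -c 'curl -sv http://proxy-api:8001/api/routes; curl -sv http://proxy-public/hub/api'
```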
@minrk, I think no NetworkPolicy is present. The cluster was set up using
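A quick way to confirm that, assuming kubectl access to the cluster:

```bash
# An empty result means no NetworkPolicy objects are restricting pod-to-pod traffic
kubectl get networkpolicies --all-namespaces
```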
You can edit the jupyterhub command with:

and change the command that looks like:

    - command:
      - jupyterhub
      - --config
      - /srv/jupyterhub_config.py
      - --upgrade-db

to

    - command:
      - sh
      - -c
      - while true; do sleep 10; done

This will create a new hub pod with the new command, which you can
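Roughly how that debugging session could continue once the hub pod is just sleeping; the jhub namespace is illustrative, and the connectivity check uses only the Python standard library since the hub image may lack curl/wget:

```bash
# Find the new hub pod and open a shell in it
HUB_POD=$(kubectl get pod -n jhub -l component=hub -o jsonpath='{.items[0].metadata.name}')
kubectl exec -it -n jhub "$HUB_POD" -- sh

# then, inside the pod, check that the proxy API port is reachable:
python3 -c "import socket; socket.create_connection(('proxy-api', 8001), timeout=5); print('proxy-api reachable')"
```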
Did you find a solution? I also have this problem. Who can help me? Thanks to all of you.

    Name: hub-86d676cf88-jw8ws
    32m  32m  1  {default-scheduler }  Normal  Scheduled  Successfully assigned hub-86d676cf88-jw8ws to 192.168.0.5
I'm seeing this on GKE. We were running v0.6 and tried to upgrade to the latest chart. After some helm failures I reverted to v0.6 but ran into this. I've tried deleting the pods and deployments. I'll do some debugging.
There's no curl or wget in the pod. With python3+requests I can confirm the tornado.curl_httpclient.CurlError: the proxy-api endpoint times out. proxy-public and proxy-http are responsive. The cluster has:
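The check described above could look roughly like this from a shell inside the hub pod (proxy-api:8001 is the proxy REST API the hub must reach; a timeout there matches the CrashLoop symptom, while proxy-public should answer):

```bash
# requests does not raise on 4xx, so a 403 from proxy-api still proves it is reachable
python3 -c "import requests; print(requests.get('http://proxy-api:8001/api/routes', timeout=5).status_code)"
python3 -c "import requests; print(requests.get('http://proxy-public/hub/api', timeout=5).status_code)"
```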
The proxy-api object was referencing a newer version of the helm chart -- one that I had previously tried to upgrade to. I deleted the proxy-api object, then reran my CI to do a helm upgrade, and now everything is working.
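A sketch of that recovery, treating the stale object as the proxy-api Service; the release name, namespace, and values file here are placeholders:

```bash
# Delete the stale Service so the next upgrade recreates it at the matching chart version
kubectl delete service proxy-api -n jhub
helm upgrade jhub jupyterhub/jupyterhub -f config.yaml   # pin --version to the intended chart release
```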
I ran into this again on Azure after a helm upgrade. Unlike last time, I couldn't access any of the service endpoints. I have a feeling this last occasion is due to the infrastructure and not z2jh, but I just thought I'd leave a trail marker.
Hmmm, @ryanlovett wrote:
Does this mean that our proxy pod did not trigger a restart as it should, or that it persisted some faulty state that needed to be refreshed? Any ideas on what state was outdated? @ryanlovett, we have now released 0.7.0; any feedback on your upgrade to that would be very relevant. If you do upgrade, just make sure to follow the upgrade instructions in the changelog.md file.
Any thoughts on this?
I found that these errors happen when the hub and proxy get updated at the same time. The hub will crash if it fails to communicate with the proxy, but it only registers the failure about 20 seconds later, and by that time the hub can appear to be functional. When we next bump the JupyterHub version, we will get to use jupyterhub/jupyterhub#2750, which will make the hub pod stay unavailable until it actually works reliably. Perhaps we bump it along with #1422, or earlier.
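One way to observe that race during an upgrade is to watch both rollouts side by side (deployment names hub and proxy are the chart defaults; jhub is a placeholder namespace):

```bash
# If the hub reports Ready before the new proxy is serving, it will crash about
# 20 seconds later when its first proxy API call times out.
kubectl rollout status deployment/hub -n jhub &
kubectl rollout status deployment/proxy -n jhub &
kubectl get pods -n jhub -w
```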
I am trying to spin up JupyterHub using helm. All resources start successfully, but after a short time the hub pod enters CrashLoopBackOff.
Installation was performed using the following command:
I've also tested version 0.5 and got the same results.
Logs:
Namespace status:
Contents of config.yaml: