
Additions to how it works, and a simple "keep alive" strategy #55

Open
consideRatio opened this issue Feb 2, 2023 · 7 comments

@consideRatio
Member

I learned a few things from @minrk on Gitter (February 1, 2023 6:36 PM) that I want to ensure are captured in this repo.

My takeaway points from what Min said

  1. The server will not shut itself down if it has a stuck-busy kernel, but [jupyterhub-idle-culler] will
  2. It comes down to what the server considers activity worth reporting. Messages are activity (execution, output), but being statically busy is not considered activity.
  3. It’s in our power to change this - the hub singleuser activity tracking could be modified to consider (separately, even) busy and/or connected kernels to be “active”
  4. So if you change one big sleep to a for/print every minute or two, it should stay up.

Gitter discussion

Me:

I'm at a loss. Shouldn't starting a notebook running in jupyterlab, like below, make it end up reporting regularly that it's active, which makes z2jh's culler not cull the instance?

```python
import time
time.sleep(3600 * 24)
```

Apparently, it doesn't seem to help.

Min:

Is the lab tab still open and connected?

If not, then no. The culler doesn’t count a notebook that isn’t being interacted with as activity.

It’s in our power to change this - the hub singleuser activity tracking could be modified to consider (separately, even) busy and/or connected kernels to be “active”

Me:

Hmm @minrk no, when you say "Culler", which culler are you speaking about? The jupyterhub-idle-culler?

I thought that the jupyterhub-idle-culler got reports about kernel activity, and that was sufficient.

Are you saying that the jupyterhub-idle-culler is aware of the kernel activity, but disregards it?

Min:

Sorry, not clear. It’s in what the server considers activity to be reported. Messages are activity (execution, output), but being statically busy is not considered activity.

So if you changed one big sleep to a for/print every minute or two, it should stay up.
What I mean is that we could in JupyterHub (or Jupyter server) add checks for busy kernels to ensure they are considered active if busy.

We can have config for this to make it opt-in or out.

The server’s internal culler does have a switch for this - cull_busy_kernels, I think? It’s off by default, so there is a difference there - server will not shut itself down if it has a stuck-busy kernel, but Hub will.

The internal culler has lots more fine-grained info to make decisions with, so I think it should do a better job in general with shorter deadlines, and the hub culler ought to have quite a long one most of the time.

But I think I’ve also seen the internal one fail to shutdown when it should, where the hub cull is very reliable to shutdown when it thinks it should
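For reference, the internal-culler switch Min mentions appears to be spelled `cull_busy` on jupyter_server's `MappingKernelManager` (trait names may differ between versions, so verify against yours). A hedged sketch of the relevant settings in `jupyter_server_config.py`:

```python
# jupyter_server_config.py -- sketch of the server-internal culler settings
# (trait names per jupyter_server's MappingKernelManager; check your version)

# Shut down kernels idle for more than one hour (0, the default, disables culling).
c.MappingKernelManager.cull_idle_timeout = 3600

# How often (in seconds) to check for idle kernels.
c.MappingKernelManager.cull_interval = 300

# Off by default: whether a busy (executing) kernel may be culled.
c.MappingKernelManager.cull_busy = False

# Off by default: whether a kernel with an open (websocket) connection may be culled.
c.MappingKernelManager.cull_connected = False
```

Note the asymmetry Min describes: with `cull_busy = False` the server never kills its own stuck-busy kernel, while the hub-level jupyterhub-idle-culler, which only sees `last_activity`, still can.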

@consideRatio
Member Author

consideRatio commented Feb 2, 2023

New version of my script

```python
# This script is based on the idea described by Min RK in
# https://github.com/jupyterhub/jupyterhub-idle-culler/issues/55
import time

print(
    "This script can ensure jupyterhub-idle-culler isn't going to "
    "stop this server due to inactivity, by keeping a busy kernel "
    "that also generates some output regularly.\n"
)

hours = int(input("   Enter hours of activity: "))

print(f"\nGenerating kernel output every minute for {hours} hours:\n")

# loop over all hours and print 60 dots for each
for hour in range(hours):
    print(f"Hour {hour}: ", end="")
    for minute in range(60):  # renamed from `min` to avoid shadowing the builtin
        time.sleep(60)
        print(".", end="")
    print()

print("Done generating kernel output!")
```


@consideRatio
Member Author

consideRatio commented May 22, 2023

Further investigation

The user server reports activity to JupyterHub regularly, either via the jupyterhub-singleuser entrypoint script or via the jupyter_server extension.

Both just relay the primitive last_activity timestamp as reported by the user server, for example as reported by jupyter_server.

If we want jupyterhub-idle-culler to make better decisions, we either need to feed it better information or redefine what "last activity" means. Currently all it gets is "last activity" from the user server, which doesn't rule out that the user has a running kernel that simply isn't emitting any output.

  • What is reported to jupyterhub is managed by jupyterhub's singleuser script/extension
  • jupyterhub's singleuser script/extension currently reads last_activity from the user server, but doesn't consider busy kernels or similar
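To make the limitation concrete: the hub-level culling decision reduces to comparing the reported `last_activity` timestamp against a timeout. A minimal sketch of that decision (function name and signature are illustrative, not jupyterhub-idle-culler's actual code):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def should_cull(last_activity_iso: str, timeout_seconds: int,
                now: Optional[datetime] = None) -> bool:
    """Return True if the server's reported last_activity is older than the timeout.

    This is all the information the hub-level culler has: a single timestamp.
    A busy-but-silent kernel never refreshes it, so the server looks idle.
    """
    now = now or datetime.now(timezone.utc)
    # Accept the trailing-"Z" UTC suffix common in JupyterHub API responses.
    last = datetime.fromisoformat(last_activity_iso.replace("Z", "+00:00"))
    return (now - last) > timedelta(seconds=timeout_seconds)
```

For example, with a one-hour timeout, a server whose last reported activity was two hours ago is culled, regardless of whether a kernel is still crunching away.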

@shaneknapp

@consideRatio @minrk glad I found this issue; I'm subscribing, as we're about to deploy an undergraduate research hub here at Berkeley.

One of our concerns is the idle culler killing "long running jobs", with "long running" currently defined as "longer than a few hours". :)

@DeepCowProductions

DeepCowProductions commented Sep 2, 2024

+1 for more options, most importantly options that give the end user more control.

The Python code above is what I suggest to our students if they ask, but it's less than ideal, as it can suppress culling indefinitely.
Also, other services or apps that write to the activity record suppress culling too, which is hard to control.
We hit this issue with code-server, the open source build of VS Code that we have installed in our images.

The best solution for us would be a UI element within JupyterLab where a user could specify a (limited) "keep alive" time that the culler would respect (read via some Lab REST endpoint, perhaps). After the time is over, the user would have to log in again and renew their keep-alive time, or let other mechanisms take over control of their environment's lifetime.
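A minimal sketch of what the backend state for such a feature could look like, assuming the culler consults it instead of the activity record (all names here, `KeepAliveRegistry` included, are hypothetical):

```python
import time
from typing import Optional

class KeepAliveRegistry:
    """Tracks a user-requested keep-alive deadline, capped by an admin limit.

    A REST endpoint would call request(); the culler would call is_alive()
    instead of (or in addition to) inspecting last_activity.
    """

    def __init__(self, max_seconds: float = 8 * 3600):
        self.max_seconds = max_seconds  # admin-configured cap on keep-alive
        self.deadline = 0.0             # epoch seconds; 0.0 means "none requested"

    def request(self, seconds: float, now: Optional[float] = None) -> float:
        """Grant a keep-alive window, clamped to the configured maximum.

        Returns the number of seconds actually granted.
        """
        now = time.time() if now is None else now
        granted = min(seconds, self.max_seconds)
        self.deadline = now + granted
        return granted

    def is_alive(self, now: Optional[float] = None) -> bool:
        """True while the culler should leave this server alone."""
        now = time.time() if now is None else now
        return now < self.deadline
```

The cap is the point: unlike activity-record tricks, the server-side limit makes infinite keep-alive impossible by construction.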

I could help with the backend implementation of this, but I'm not familiar with the JupyterLab frontend.

@minrk
Member

minrk commented Sep 3, 2024

Right now, I think the best we have is jupyter-keepalive, which provides UI for keeping things alive for a specified amount of time. Applying a limit is tricky if it's going to be anything other than jupyterhub-idle-culler's max age, which unconditionally shuts down servers after some large time limit (e.g. 6 hours on mybinder.org), regardless of activity.

We could extend jupyter-keepalive to give itself a hard shutdown limit, too, so if the deadline is hit and (something, to be defined) doesn't happen before then, it shuts itself down instead of merely stopping artificial activity.

@DeepCowProductions

jupyter-keepalive is kind of what we are looking for, but I am not a big fan of relying on the activity record in general. I think it only works okay when no other policies are needed. To softly prevent infinite runtimes, more control would be great.
In my opinion, the dominant authority should be the culler, where we can decide whether to respect the user's request to keep their container alive, via some structured mechanism (not the activity record, which can be misleading).

One would have to create a reasonable API and configuration options for the two systems to work together.

@minrk
Member

minrk commented Sep 4, 2024

You definitely don't need to rely on the activity record if you don't want to. That's just what this "idle culler" does - cull idle servers. You can definitely collect whatever information is most relevant to you to decide when and how to shut down servers, e.g. requesting a server lifetime at spawn time, with a prolongation API, etc.

If you can define the sources of information you'd like to use to make the shutdown decisions, we can help you figure out how to shut down servers based on that information.
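For illustration, lifetime-based culling along those lines could reduce to a pure decision function like the following (function and parameter names are hypothetical, not an existing API):

```python
from typing import List

def should_shutdown(started_at: float, requested_lifetime: float,
                    prolongations: List[float], max_lifetime: float,
                    now: float) -> bool:
    """Decide shutdown from a requested lifetime instead of an activity record.

    The granted lifetime is the initial request plus any prolongations,
    capped at an admin-configured maximum. All times are in seconds.
    """
    granted = min(requested_lifetime + sum(prolongations), max_lifetime)
    return (now - started_at) >= granted
```

The culler would evaluate this on each pass, reading `requested_lifetime` and `prolongations` from wherever the spawn-time request and prolongation API store them.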
