
Additions to how it works, and a simple "keep alive" strategy #55

Open
consideRatio opened this issue Feb 2, 2023 · 7 comments

@consideRatio
Member

I learned a few things from @minrk on Gitter (February 1, 2023 6:36 PM) that I want to ensure are captured in this repo.

My takeaway points from what Min said

  1. The server will not shut itself down if it has a stuck-busy kernel, but [jupyterhub-idle-culler] will
  2. It comes down to what the server considers activity worth reporting. Messages are activity (execution, output), but being statically busy is not considered activity.
  3. It’s in our power to change this - the hub singleuser activity tracking could be modified to consider (separately, even) busy and/or connected kernels to be “active”
  4. So if you change one big sleep to a for/print every minute or two, it should stay up.

Gitter discussion

Me:

I'm at a loss. Shouldn't starting a notebook running in jupyterlab, like below, make it end up reporting regularly that it's active, which makes z2jh's culler not cull the instance?

```python
import time
time.sleep(3600 * 24)
```

Apparently, it doesn't seem to help.

Min:

Is the lab tab still open and connected?

If not, then no. The culler doesn’t count a notebook that isn’t being interacted with as activity.

It’s in our power to change this - the hub singleuser activity tracking could be modified to consider (separately, even) busy and/or connected kernels to be “active”

Me:

Hmm @minrk no, when you say "Culler", which culler are you speaking about? The jupyterhub-idle-culler?

I thought that the jupyterhub-idle-culler got reports about kernel activity, and that was sufficient.

Are you saying that the jupyterhub-idle-culler is aware of the kernel activity, but disregards it?

Min:

Sorry, not clear. It’s in what the server considers activity to be reported. Messages are activity (execution, output), but being statically busy is not considered activity.

So if you changed one big sleep to a for/print every minute or two, it should stay up.
What I mean is that we could in JupyterHub (or Jupyter server) add checks for busy kernels to ensure they are considered active if busy.

We can have config for this to make it opt-in or out.

The server’s internal culler does have a switch for this - cull_busy_kernels, I think? It’s off by default, so there is a difference there - server will not shut itself down if it has a stuck-busy kernel, but Hub will.

The internal culler has lots more fine-grained info to make decisions with, so I think it should do a better job in general with shorter deadlines, and the hub culler ought to have quite a long one most of the time.

But I think I’ve also seen the internal one fail to shutdown when it should, where the hub cull is very reliable to shutdown when it thinks it should
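For reference, the internal-culler switch Min mentions appears to be spelled `cull_busy` on jupyter_server's `MappingKernelManager` (trait names may differ between versions, so verify against yours). A hedged sketch of the relevant settings in `jupyter_server_config.py`:

```python
# jupyter_server_config.py -- sketch of the server-internal culler settings
# (trait names per jupyter_server's MappingKernelManager; check your version)

# Shut down kernels idle for more than one hour (0, the default, disables culling).
c.MappingKernelManager.cull_idle_timeout = 3600

# How often (in seconds) to check for idle kernels.
c.MappingKernelManager.cull_interval = 300

# Off by default: whether a busy (executing) kernel may be culled.
c.MappingKernelManager.cull_busy = False

# Off by default: whether a kernel with an open (websocket) connection may be culled.
c.MappingKernelManager.cull_connected = False
```

Note the asymmetry Min describes: with `cull_busy = False` the server never kills its own stuck-busy kernel, while the hub-level jupyterhub-idle-culler, which only sees `last_activity`, still can.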

@consideRatio
Member Author

consideRatio commented Feb 2, 2023

New version of my script

```python
# This script is based on the idea described by Min RK in
# https://github.com/jupyterhub/jupyterhub-idle-culler/issues/55
import time

print(
    "This script can ensure jupyterhub-idle-culler isn't going to "
    "stop this server due to inactivity, by keeping a busy kernel "
    "that also generates some output regularly.\n"
)

hours = int(input("   Enter hours of activity: "))

print(f"\nGenerating kernel output every minute for {hours} hours:\n")

# loop over all hours and print 60 dots for each
for hour in range(hours):
    print(f"Hour {hour}: ", end="")
    for minute in range(60):  # renamed from `min` to avoid shadowing the builtin
        time.sleep(60)
        print(".", end="")
    print()

print("Done generating kernel output!")
```


@consideRatio
Member Author

consideRatio commented May 22, 2023

Further investigation

The user server reports activity to JupyterHub regularly, either via the jupyterhub-singleuser entrypoint script or via the jupyter_server extension.

Both just relay the primitive last_activity timestamp as reported by the user server, for example as reported by jupyter_server.

If we want jupyterhub-idle-culler to make better decisions, we either need to feed it better information or redefine what "last activity" means. Currently all it gets is "last activity" from the user server, which doesn't rule out that the user has a running kernel that simply isn't emitting any output.

  • What is reported to jupyterhub is managed by jupyterhub's singleuser script/extension
  • jupyterhub's singleuser script/extension currently reads last_activity from the user server, but doesn't consider busy kernels or similar
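To make the limitation concrete: the hub-level culling decision reduces to comparing the reported `last_activity` timestamp against a timeout. A minimal sketch of that decision (function name and signature are illustrative, not jupyterhub-idle-culler's actual code):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def should_cull(last_activity_iso: str, timeout_seconds: int,
                now: Optional[datetime] = None) -> bool:
    """Return True if the server's reported last_activity is older than the timeout.

    This is all the information the hub-level culler has: a single timestamp.
    A busy-but-silent kernel never refreshes it, so the server looks idle.
    """
    now = now or datetime.now(timezone.utc)
    # Accept the trailing-"Z" UTC suffix common in JupyterHub API responses.
    last = datetime.fromisoformat(last_activity_iso.replace("Z", "+00:00"))
    return (now - last) > timedelta(seconds=timeout_seconds)
```

For example, with a one-hour timeout, a server whose last reported activity was two hours ago is culled, regardless of whether a kernel is still crunching away.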

@shaneknapp

@consideRatio @minrk glad I found this issue; I'm subscribing, as we're about to deploy an undergraduate research hub here at Berkeley.

One of our concerns is the idle culler killing "long running jobs", with "long running" currently defined as "longer than a few hours". :)

@DeepCowProductions

DeepCowProductions commented Sep 2, 2024

+1 for more options, most importantly options that give the end user more control.

The Python code above is what I suggest to our students if they ask, but it's less than ideal, as it can suppress culling indefinitely.
Also, other services or apps that write to the activity record suppress culling too, which is hard to control.
We hit this issue with code-server, the open source build of VS Code that we have installed in our images.

The best solution for us would be a UI element within JupyterLab where a user could specify a (limited) "keep alive" time that the culler would respect (read via some Lab REST endpoint, perhaps). After the time is over, the user would have to log in again and renew their keep-alive time, or let other mechanisms take over control of their environment's lifetime.
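A minimal sketch of what the backend state for such a feature could look like, assuming the culler consults it instead of the activity record (all names here, `KeepAliveRegistry` included, are hypothetical):

```python
import time
from typing import Optional

class KeepAliveRegistry:
    """Tracks a user-requested keep-alive deadline, capped by an admin limit.

    A REST endpoint would call request(); the culler would call is_alive()
    instead of (or in addition to) inspecting last_activity.
    """

    def __init__(self, max_seconds: float = 8 * 3600):
        self.max_seconds = max_seconds  # admin-configured cap on keep-alive
        self.deadline = 0.0             # epoch seconds; 0.0 means "none requested"

    def request(self, seconds: float, now: Optional[float] = None) -> float:
        """Grant a keep-alive window, clamped to the configured maximum.

        Returns the number of seconds actually granted.
        """
        now = time.time() if now is None else now
        granted = min(seconds, self.max_seconds)
        self.deadline = now + granted
        return granted

    def is_alive(self, now: Optional[float] = None) -> bool:
        """True while the culler should leave this server alone."""
        now = time.time() if now is None else now
        return now < self.deadline
```

The cap is the point: unlike activity-record tricks, the server-side limit makes infinite keep-alive impossible by construction.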

I could help with the backend implementation of this, but I'm not familiar with the JupyterLab frontend.

@minrk
Member

minrk commented Sep 3, 2024

Right now, I think the best we have is jupyter-keepalive, which provides UI for keeping things alive for a specified amount of time. Applying a limit is tricky if it's going to be anything other than jupyterhub-idle-culler's max age, which unconditionally shuts down servers after some large time limit (e.g. 6 hours on mybinder.org), regardless of activity.

We could extend jupyter-keepalive to give itself a hard shutdown limit, too, so if the deadline is hit and (something, to be defined) doesn't happen before then, it shuts itself down instead of merely stopping artificial activity.

@DeepCowProductions

jupyter-keepalive is kind of what we are looking for, but I am not a big fan of relying on the activity record in general. I think it only works okay when no other policies are needed. To softly prevent infinite runtimes, more control would be great.
In my opinion, the dominant authority should be the culler, where we can decide whether to respect the user's request to keep their container alive, via some structured mechanism (not the activity record, which can be misleading).

One would have to create a reasonable API and configuration options for the two systems to work together.

@minrk
Member

minrk commented Sep 4, 2024

You definitely don't need to rely on the activity record if you don't want to. That's just what this "idle culler" does - cull idle servers. You can definitely collect whatever information is most relevant to you to decide when and how to shut down servers, e.g. requesting a server lifetime at spawn time, with a prolongation API, etc.

If you can define the sources of information you'd like to use to make the shutdown decisions, we can help you figure out how to shut down servers based on that information.
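For illustration, lifetime-based culling along those lines could reduce to a pure decision function like the following (function and parameter names are hypothetical, not an existing API):

```python
from typing import List

def should_shutdown(started_at: float, requested_lifetime: float,
                    prolongations: List[float], max_lifetime: float,
                    now: float) -> bool:
    """Decide shutdown from a requested lifetime instead of an activity record.

    The granted lifetime is the initial request plus any prolongations,
    capped at an admin-configured maximum. All times are in seconds.
    """
    granted = min(requested_lifetime + sum(prolongations), max_lifetime)
    return (now - started_at) >= granted
```

The culler would evaluate this on each pass, reading `requested_lifetime` and `prolongations` from wherever the spawn-time request and prolongation API store them.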
