Adding the option to delete persisted user data as well. #8

1kastner · 2020-06-15T15:15:52Z

Proposed change

At https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/ it was recently discussed that in some settings the persisted data of a user should also eventually be removed. This could be integrated into this service. I am really not sure whether it should be, because how to delete the persisted user data is spawner-specific or even configuration-specific.

Alternative options

We could say that these concerns should be addressed separately and a second service could be created. However, that would very likely duplicate the code that identifies user accounts that exceed a certain age and haven't been used for a while.

Who would use this feature?

This is reasonable for settings with temporary users, e.g. mybinder or weekend seminars. You are sure you want to delete their data after some point.

manics · 2020-06-15T17:22:44Z

Together with #4 it almost sounds like we want a jupyter-admin cron utility. Perhaps with plugins for each function?

1kastner · 2020-06-16T09:59:28Z

For me that sounds like a reasonable approach. General plugins could be maintained in separate repositories to keep this clean. I hope from such a plugin I could still somehow reach the JupyterHub configuration so that the plugin can look up the details, such as which user "owns" which docker volume that should be deleted.

Regarding #4, in some cases it could make sense to link the plugin with the data source of the authenticator, since that is where, e.g., email addresses are also stored. But again this might be very configuration-specific. An alternative would be to have a separate plugin configuration file. From what I have seen so far, I guess that violates the design principle of one centralized JupyterHub configuration though? Here I lack experience with the JupyterHub design philosophy.

meeseeksmachine · 2020-06-16T13:36:00Z

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/10

1kastner · 2020-07-01T11:25:37Z

At https://jupyterhub.readthedocs.io/en/stable/reference/services.html I checked that a service can not access the loaded JupyterHub configuration. I see three options out there and I don't like any of them:

1. Re-load the JupyterHub configuration inside this service to get the information regarding user-specific persisted data
2. Maintain a separate cleaning script, which leads to duplicated configuration (what is the user's prefix? ...)
3. Refactor JupyterHub so that services can access the configuration, removing the previously constructed isolation

Any ideas on this?

manics · 2020-07-01T11:38:02Z

The JupyterHub API lets you obtain, for each of a user's servers, "state: object. Arbitrary internal state from this server's spawner. Only available on the hub's users list or get-user-by-name method, and only if a hub admin. None otherwise.":
https://jupyterhub.readthedocs.io/en/stable/_static/rest-api/index.html#/definitions/Server

There's also a discussion about making auth_state from the Authenticator accessible too:
jupyterhub/jupyterhub#1704

Do you think the combination of these would allow a service to request the necessary information?
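As a rough illustration of reading that state from a service, here is a minimal sketch using only the Python standard library. It assumes the service holds an admin-level API token, that `GET /users` returns user models with a `servers` mapping as described in the REST API docs linked above, and that (as far as I know) only currently active servers are listed by default. The function names `fetch_users` and `collect_server_states` are made up for this example.

```python
import json
import urllib.request


def fetch_users(hub_api_url, api_token):
    """Return the list of user models from GET <hub_api_url>/users."""
    request = urllib.request.Request(
        f"{hub_api_url}/users",
        headers={"Authorization": f"token {api_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_server_states(users):
    """Map (username, server name) to the spawner state reported by the Hub."""
    states = {}
    for user in users:
        for server_name, server in (user.get("servers") or {}).items():
            # "state" is only filled in for admin-level requests; skip empty ones.
            state = server.get("state")
            if state:
                states[(user["name"], server_name)] = state
    return states
```

A service would call something like `collect_server_states(fetch_users(api_url, token))`, where the URL and token come from wherever the service is configured. What ends up inside each state dict depends entirely on the spawner.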

1kastner · 2020-07-01T13:23:10Z

In my example configuration here the important part regarding the Docker volumes is listed as such:

notebook_dir = '/home/jovyan/'
c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.volumes = { 'jupyterhub-user-{username}': notebook_dir }

The data stored in c.DockerSpawner.volumes would be sufficient for my purpose of deleting docker volumes now but I can't speak for k8s configurations. For that I would need a partner on that side.
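To make the mapping from this configuration to concrete volume names explicit, here is a minimal sketch. It assumes plain `str.format` expansion of `{username}`; the real DockerSpawner may escape or normalize usernames before substituting them, so treat the helper `volume_names_for_user` as hypothetical rather than as DockerSpawner's own logic.

```python
def volume_names_for_user(volumes_config, username):
    """Expand '{username}' in each configured volume name for one user."""
    # Assumption: plain str.format; usernames with special characters may be
    # escaped differently by the spawner itself.
    return [name.format(username=username) for name in volumes_config]


notebook_dir = "/home/jovyan/"
volumes = {"jupyterhub-user-{username}": notebook_dir}

print(volume_names_for_user(volumes, "user0"))  # ['jupyterhub-user-user0']
```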

1kastner · 2020-07-01T14:55:55Z

So let's see if I got you right here: You suggest the DockerSpawner tells via the server state which docker volume belongs to it (in my case jupyterhub-user-{username}) and this (already serialized) information is added to the auth_state but is only visible if you are admin (which the JupyterHub service is).

Once the JupyterHub service has the information jupyterhub-user-user0, it can also run docker volume rm jupyterhub-user-user0 and everything is fine and solved! So yes, that is a viable solution.
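The removal step could be sketched as below. Only `docker volume rm` itself is a real command here; the prefix check, the `dry_run` default, and the function names are assumptions added so that a cleanup service cannot accidentally delete unrelated volumes.

```python
import re
import subprocess

# Assumption: only volumes with this prefix belong to JupyterHub users.
VOLUME_PREFIX = "jupyterhub-user-"
# Conservative pattern for a volume name: alphanumeric start, then [A-Za-z0-9_.-].
_SAFE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_.-]*$")


def build_volume_rm_command(volume_name):
    """Return the argv list for removing one user volume, or raise ValueError."""
    if not volume_name.startswith(VOLUME_PREFIX) or not _SAFE_NAME.match(volume_name):
        raise ValueError(f"refusing to remove volume {volume_name!r}")
    return ["docker", "volume", "rm", volume_name]


def remove_volume(volume_name, dry_run=True):
    """Remove the volume, or only return the command when dry_run is True."""
    command = build_volume_rm_command(volume_name)
    if not dry_run:
        # check=True raises CalledProcessError if docker reports a failure,
        # for example when the volume is still in use by a container.
        subprocess.run(command, check=True)
    return command
```

Calling `remove_volume("jupyterhub-user-user0")` only returns the command; passing `dry_run=False` actually runs it.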

manics · 2020-07-01T15:14:06Z

Something like that! But this is outside my knowledge of JupyterHub, @minrk will know better.

rkdarst · 2020-07-01T15:18:43Z

I do something similar to this (if I understand correctly) to set unique cull times: https://github.com/AaltoSciComp/jupyterhub-aalto/blob/8bb8c3f0d538641141c5024c272245f943747fd2/scripts/cull_idle_servers.py#L180 https://github.com/AaltoSciComp/jupyterhub-aalto/blob/8bb8c3f0d538641141c5024c272245f943747fd2/jupyterhub_config.py#L772 ... requires a bit of care but in principle not too hard, and I think the idea works quite well.

1kastner · 2020-07-01T18:18:57Z

@rkdarst thank you so much for sharing that!

meeseeksmachine · 2020-07-01T19:40:20Z

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/13

1kastner · 2020-07-02T17:29:18Z

Actually the server state is already published, see this code - thanks to all participants of this conversation for helping to get there! Since the generic spawner API does not prescribe any spawner-specific content and the docker spawner only adds a little information, this only needs some additional information regarding the name of the docker volumes as presented before (see top).

Due to this input and also due to the discussion in the forum, I would suggest that I first create my personal variation of the dockerspawner that shares the information I need (i.e. the docker volume name) and second create a copy of this service using a different name. That service would only remove docker volumes if they are expired for a long time (we might want different times for culling idle Jupyter Notebooks and deleting whatever data the user believed we would persist for them).
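The idea of two separate time limits can be sketched as follows. The threshold values and the names `CULL_SERVER_AFTER`, `DELETE_DATA_AFTER`, `parse_timestamp`, and `is_expired` are purely illustrative, and the sketch assumes the Hub reports `last_activity` as an ISO 8601 string (possibly ending in `Z`) or as null.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds: cull idle servers quickly, delete data much later.
CULL_SERVER_AFTER = timedelta(hours=1)
DELETE_DATA_AFTER = timedelta(days=90)


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as '2020-06-15T15:15:52.000000Z'."""
    if value is None:
        return None
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_expired(last_activity, now, max_age):
    """True if the last activity is older than max_age; unknown activity is kept."""
    if last_activity is None:
        # Be conservative: without a timestamp, never delete anything.
        return False
    return now - last_activity > max_age
```

A server idle for a few hours would then satisfy `is_expired(..., CULL_SERVER_AFTER)` long before the same user's data satisfies `is_expired(..., DELETE_DATA_AFTER)`.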

For the long term, a plugin architecture as mentioned by manics sounds great to me too!

rkdarst · 2020-07-15T18:52:51Z

A plugin system for this somehow seems like a lot of work, since to me JupyterHub can already take plugins. But the difficulty of making a new service is too high: there is a lot of boilerplate. Imagine if there was...

- A JupyterHub service library that has the core event loop and periodic polling
- It can be subclassed
- Override methods to determine what happens on each poll. A method gets user and server (which includes auth_state, state, etc.).
Then, a new service can be a single file that imports this library and does what it needs to. The service is started like any other service - python my_service.py - without needing to figure out an extra layer of a separate cron utility ("plugin" makes me think entrypoints, which is a whole other layer to think about, and then does it have to be separately installed?).
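One possible shape for such a library, sketched with `asyncio`. Nothing like this exists in the repository as far as this thread shows; the class `PollingService`, its method names, and the `max_polls` parameter are all hypothetical, and fetching users from the Hub is left to the subclass.

```python
import asyncio


class PollingService:
    """Base class: poll the Hub periodically and call a hook per server."""

    def __init__(self, interval_seconds=600):
        self.interval_seconds = interval_seconds

    async def fetch_users(self):
        """Return a list of user models (dicts). Subclasses must implement this."""
        raise NotImplementedError

    async def handle_server(self, user, server_name, server):
        """Called once per user server on every poll. Subclasses override this."""
        raise NotImplementedError

    async def poll_once(self):
        for user in await self.fetch_users():
            for server_name, server in (user.get("servers") or {}).items():
                await self.handle_server(user, server_name, server)

    async def run(self, max_polls=None):
        """Poll forever, or max_polls times when given (useful for testing)."""
        polls = 0
        while max_polls is None or polls < max_polls:
            await self.poll_once()
            polls += 1
            if max_polls is None or polls < max_polls:
                await asyncio.sleep(self.interval_seconds)


class StatePrinter(PollingService):
    """Example single-file service: print each server's spawner state."""

    def __init__(self, users, **kwargs):
        super().__init__(**kwargs)
        self._users = users  # stand-in for a real call to the Hub API

    async def fetch_users(self):
        return self._users

    async def handle_server(self, user, server_name, server):
        print(user["name"], repr(server_name), server.get("state"))
```

A file like `my_service.py` would then end with something like `asyncio.run(StatePrinter(users).run())`, with `fetch_users` replaced by a real request to the Hub.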

Of course I am biased since I have a working system... and don't have time to implement what I am suggesting. But I will work to use and debug whatever is implemented, if it doesn't add too many extra layers.

1kastner · 2020-07-16T08:12:01Z

That sounds fine to me as long as I get the mentioned work done!

manics · 2020-07-16T08:23:16Z

I think a service library also works, and even if we moved to a plugin architecture I expect the plugins would want this library anyway. What does everyone think about developing the library in this repo, then perhaps moving it to its own repo or JupyterHub core after it's had some production use?

1kastner · 2020-07-16T09:18:43Z

From your quick explanation I have many detail-related conceptual questions, e.g. some visualizations might help etc.

Regarding your development plan I am not sure. Why do you think it should move back to the core? I did not mean to split the community - I would rather like to see the code from this repository evolve step-by-step, including backwards-compatibility. Do you think that is too difficult in either technical or project-administrative terms?

manics · 2020-07-16T10:00:59Z

Ignore me! I misunderstood "A JupyterHub service library, that had the core event loop and periodic polling" as wanting a library in core JupyterHub. I'm completely happy for it to remain separate 🙂

manics · 2020-07-16T10:04:16Z

Also would it be OK to keep the design discussions on one issue? Either this or #9, we can rename the issue title as necessary to make it clearer.

1kastner · 2020-07-16T13:01:44Z

I am fine with that: at #9 we can discuss some general design issues, and here we discuss how this can be used for the purpose I mentioned at the beginning of this issue. Likewise, #4 can pick up the results from #9 when implementing it. Therefore, I guess this issue might get less attention for a while until a common conceptual approach is found.

AtulSinghBankoti · 2020-10-20T07:27:34Z

Hi,

Is there any fix or update for this issue?

We are facing a similar issue to the one discussed here. We have JupyterHub deployed in a k8s cluster and use EFS (Elastic File System) as the PV (Persistent Volume). When we delete a JupyterHub user using the admin panel, the user is deleted but the PVCs (Persistent Volume Claims) associated with that user are not deleted. If we create a new user with the same name, the PVCs get attached to the new user.

meeseeksmachine · 2020-10-20T08:54:41Z

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/16

ruchirkakkad · 2022-02-10T11:13:14Z

Hi,

I do not want the culler to remove admin users in JupyterHub. I have deployed JupyterHub using the Helm chart.

1kastner added the enhancement label Jun 15, 2020

1kastner mentioned this issue Jul 1, 2020

Allow services to retrieve auth_state jupyterhub/jupyterhub#1704

Closed

1kastner mentioned this issue Jul 2, 2020

Share docker volume name in server state jupyterhub/dockerspawner#384

Open

manics mentioned this issue Jul 15, 2020

How configurable should the culler be? #9

Open

manics mentioned this issue Oct 6, 2020

Deleting named servers should delete the corresponding PVC jupyterhub/kubespawner#446

Closed

consideRatio mentioned this issue Apr 17, 2021

Support custom culling logic through hooks (like Spawner's pre_spawn_hook etc) #25

Open

manics mentioned this issue Dec 14, 2021

Integrate jupyterhub-idle-culler with KubeIngressProxy #42

Open

nated0g mentioned this issue Mar 8, 2022

Auto delete PersistentVolume after a length of time? nodeschoolyvr/nodeschoolhelm#2

Open
