Adding the option to delete persisted user data as well. #8
Comments
Together with #4 it almost sounds like we want a jupyter-admin cron utility. Perhaps with plugins for each function? |
For me that sounds like a reasonable approach. General plugins could be maintained in separate repositories to keep this clean. I hope that such a plugin could still reach the JupyterHub configuration so that it can look up details such as which user "owns" which Docker volume that should be deleted. Regarding #4, in some cases it could make sense to link the plugin with the data source of the authenticator, since that is where e.g. email addresses are also stored. But again this might be very configuration-specific. An alternative would be to have a separate plugin configuration file. From what I have seen so far, I guess that would violate the design principle of one centralized JupyterHub configuration, though? I lack experience with the JupyterHub design philosophy here. |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/10 |
At https://jupyterhub.readthedocs.io/en/stable/reference/services.html I checked that a service cannot access the loaded JupyterHub configuration. I see three options and I don't like any of them:
Any ideas on this? |
The JupyterHub API lets you obtain There's also a discussion about making Do you think the combination of these would allow a service to request the necessary information? |
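As a rough illustration of what "a service requesting the necessary information" could look like, the sketch below walks a users list shaped like the Hub REST API's user models and collects whatever spawner state each server exposes. It assumes the token used by the service is allowed to see server state, and the `volume_name` key is hypothetical: it is not something any spawner is confirmed to publish in this thread.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_server_states(
    users: Iterable[Dict[str, Any]],
) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    """Yield (username, server_name, state) for every server that exposes state."""
    for user in users:
        # "servers" maps server names to server models; the default server is "".
        for server_name, server in (user.get("servers") or {}).items():
            state = server.get("state") or {}
            if state:
                yield user["name"], server_name, state


# Example payload shaped like a users-list response; all values are made up.
example_users = [
    {
        "name": "alice",
        "servers": {"": {"state": {"volume_name": "jupyterhub-user-alice"}}},
    },
    {"name": "bob", "servers": {}},
]

states = list(iter_server_states(example_users))
```

Here `states` would hold one entry for alice's default server and nothing for bob, who has no server listed.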
In my example configuration here the important part regarding the Docker volumes is listed as such:
The data stored in |
So let's see if I got you right here: You suggest the DockerSpawner tells via the server state which docker volume belongs to it (in my case Once the JupyterHub service has the information |
Something like that! But this is outside my knowledge of JupyterHub, @minrk will know better. |
I do something similar to this (if I understand correctly) to set unique cull times:
https://github.com/AaltoSciComp/jupyterhub-aalto/blob/8bb8c3f0d538641141c5024c272245f943747fd2/scripts/cull_idle_servers.py#L180
https://github.com/AaltoSciComp/jupyterhub-aalto/blob/8bb8c3f0d538641141c5024c272245f943747fd2/jupyterhub_config.py#L772
... requires a bit of care but in principle not too hard, and I think the idea works quite well. |
@rkdarst thank you so much for sharing that! |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/13 |
Actually the server state is already published, see this code - thanks to all participants of this conversation for helping to get there! Since the generic spawner API does not prescribe any spawner-specific content and the Docker spawner adds only a little information, this only needs some additional information about the names of the Docker volumes as presented before (see top). Based on this input and the discussion in the forum, I would suggest that I first create my own variation of the dockerspawner that shares the information I need (i.e. the Docker volume name) and second create a copy of this service under a different name. That service would only remove Docker volumes once they have been expired for a long time (we might want different times for culling idle Jupyter Notebooks and for deleting whatever data the user believed we would persist for them). For the long term, a plugin architecture as mentioned by manics sounds great to me too! |
A plugin system for this seems like a lot of work, since to me JupyterHub can already take plugins. But the difficulty of making a new service is too high: there is a lot of boilerplate. Imagine if there was...
Of course I am biased since I have a working system... and don't have time to implement what I am suggesting. But I will work to use and debug whatever is implemented, if it doesn't add too many extra layers. |
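To make the boilerplate concern concrete, here is a small sketch of the kind of periodic polling loop such a service library might provide, so that each service only supplies the task to run. It uses plain `asyncio` as an assumption for illustration; `run_periodically` and its parameters are hypothetical names, not an existing JupyterHub API.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def run_periodically(
    task: Callable[[], Awaitable[None]],
    interval: float,
    max_runs: Optional[int] = None,
) -> int:
    """Call `task` every `interval` seconds; return the number of runs."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await task()
        except Exception as exc:
            # Keep the loop alive if a single pass fails.
            print(f"poll failed: {exc!r}")
        runs += 1
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval)
    return runs


calls = []


async def demo_task() -> None:
    # Placeholder for "fetch users, decide what to cull or delete".
    calls.append(len(calls))


completed = asyncio.run(run_periodically(demo_task, interval=0.01, max_runs=3))
```

A culling service and a data-deletion service could then share this loop and differ only in the task they pass in.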
That sounds fine to me as long as I get the mentioned work done! |
I think a service library also works, and even if we moved to a plugin architecture I expect the plugins would want this library anyway. What does everyone think about developing the library in this repo, then perhaps moving it to its own repo or JupyterHub core after it's had some production use? |
Your quick explanation leaves me with many detailed conceptual questions; some visualizations might help, for example. I am not sure about your development plan. Why do you think it should move back to the core? I did not mean to split the community - I would rather see the code in this repository evolve step by step while keeping backwards compatibility. Do you think that is too difficult, either in technical or project-administrative terms? |
Ignore me! I misunderstood "A JupyterHub service library, that had the core event loop and periodic polling" as wanting a library in core JupyterHub. I'm completely happy for it to remain separate 🙂 |
Also would it be OK to keep the design discussions on one issue? Either this or #9, we can rename the issue title as necessary to make it clearer. |
I am fine with that: at #9 we can discuss general design issues, and here we discuss how this can be used for the purpose I mentioned at the beginning of this issue; likewise, #4 can pick up the results from #9 when implementing it. Therefore, I guess this issue might get less attention for a while until a common conceptual approach is found. |
Hi, is there any fix or update for this issue? We are also facing an issue similar to the one discussed here. |
This issue has been mentioned on Jupyter Community Forum. There might be relevant details there: https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/16 |
Hi, I do not want the culler to remove admin users in JupyterHub. I have deployed JupyterHub using the Helm chart. |
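One way a culling service could honor this request is to filter on the `admin` flag of each user model before deciding whom to remove. This is only a sketch under that assumption; `cullable_users` and `protect_admins` are hypothetical names, and whether the existing service offers such an option is not stated in this thread.

```python
from typing import Any, Dict, List


def cullable_users(
    users: List[Dict[str, Any]], protect_admins: bool = True
) -> List[Dict[str, Any]]:
    """Return the users that may be culled, skipping admins when requested."""
    return [
        user
        for user in users
        if not (protect_admins and user.get("admin", False))
    ]


# Made-up user models for illustration.
example_users = [
    {"name": "root", "admin": True},
    {"name": "alice", "admin": False},
    {"name": "bob"},  # a missing flag is treated as non-admin
]

safe_to_cull = [user["name"] for user in cullable_users(example_users)]
```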
Proposed change
It was recently discussed at https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/ that in some settings a user's persisted data should also eventually be removed. This could be integrated into this service. I am really not sure whether it should be, because how to delete the persisted user data is spawner-specific or even configuration-specific.
Alternative options
We could say that these concerns should be addressed separately and create a second service. However, chances are pretty high that this would duplicate the code that identifies user accounts that exceed a certain age and haven't been used for a while.
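The shared logic in question, finding accounts that are both old enough and idle long enough, could look roughly like the sketch below. It assumes user models carry ISO 8601 `created` and `last_activity` timestamps; the function names and the two thresholds are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as '2020-01-01T00:00:00Z'."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def users_with_expired_data(
    users: List[Dict[str, Any]],
    now: datetime,
    min_age: timedelta,
    max_idle: timedelta,
) -> List[str]:
    """Return names of users that are older than min_age and idle for max_idle."""
    expired = []
    for user in users:
        created = parse_ts(user.get("created"))
        if created is None:
            continue  # age unknown, so err on the side of keeping the data
        last_activity = parse_ts(user.get("last_activity")) or created
        if now - created >= min_age and now - last_activity >= max_idle:
            expired.append(user["name"])
    return expired


# Made-up user models for illustration.
example_users = [
    {"name": "alice", "created": "2020-01-01T00:00:00Z", "last_activity": "2020-01-05T00:00:00Z"},
    {"name": "bob", "created": "2020-01-01T00:00:00Z", "last_activity": "2020-05-30T00:00:00Z"},
    {"name": "carol", "created": "2020-05-20T00:00:00Z", "last_activity": None},
]

expired = users_with_expired_data(
    example_users,
    now=datetime(2020, 6, 1, tzinfo=timezone.utc),
    min_age=timedelta(days=30),
    max_idle=timedelta(days=90),
)
```

With these thresholds only alice qualifies: bob was active recently and carol's account is too new. Using a longer `max_idle` for data deletion than for server culling would give the two different expiry times mentioned in the comments.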
Who would use this feature?
This is reasonable for settings with temporary users, e.g. mybinder or weekend seminars, where you are sure you want to delete their data at some point.