Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How configurable should the culler be? #9

Open
rkdarst opened this issue Jul 15, 2020 · 12 comments
Open

How configurable should the culler be? #9

rkdarst opened this issue Jul 15, 2020 · 12 comments
Labels
enhancement New feature or request

Comments

@rkdarst
Copy link
Contributor

rkdarst commented Jul 15, 2020

Now that the culler is installable, a question is: how should the culler be customized? Previously, it was via making a copy and modifying it. Now, there is one blessed, easy to install version - what happens to others?

In one way, I am very happy it's a separate repo now, so that it's straightforward (possibly not easy, though) to maintain a long-running fork. On the other hand, it means now I need to actually think about what's the best way, and coordinate...

Alternative options

Based on really quick thinking:

  1. Do nothing, blessed version is installable, other versions installed manually via pip install https://... from own github fork in special branch.
  2. Encourage different custom situations to be submitted upstream. But, then we need to decide what is worth accepting, and then we get into a huge mess where we decide an API without much planning (e.g. in my case, I use a culltime element in the spawner state to decide when it can be culled.)
  3. Encourage/support people to still copy/(rename) the script file when needed. Naming the script init.py doesn't encourage this much, but this does have a benefit that it's clear that it's a different thing. I can imagine that (1) will get confusing when the same module intentionally has different functions on different people's systems, but we can just accept that.

Who would use this feature?

I have two hubs and they both have customized cullers. I've seen enough other issues about the culler that I imagine that others customize it, too.

(Optional): Suggest a solution

  • At least document how we suggest local customizations
  • Think about what kind of customizations we would accept. e.g. new command line options=OK, things that dig into spawner state to make culling choices=not maintainable.
  • I would feel happy renaming init.py to back to cull_idle_servers.py and importing the minimum in init, but I don't think that can be justified (to make cases where this is copied manually elsewhere more obvious).

I hope to make a PR soon that will demonstrate something we probably don't take.

@rkdarst rkdarst added the enhancement New feature or request label Jul 15, 2020
@welcome
Copy link

welcome bot commented Jul 15, 2020

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other other community members to contribute more effectively.
welcome
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉

@manics
Copy link
Member

manics commented Jul 15, 2020

This sounds similar to the discussions in #8 and #4 which were leaning towards some sort of plugin system, where jupyterhub-idle-culler almost becomes a Jupyter cronjob manager, though more structured than juts running scripts.

@rkdarst
Copy link
Contributor Author

rkdarst commented Jul 15, 2020

To me, a plugin system seemed like a bit overkill - I see JupyterHub as the "cron manager" that runs the scripts. But it is a good point that there is a lot of duplication to do a little bit more. I'll comment there, but I think the question of what to do in the core would still be relevant. but we'll see...

@1kastner
Copy link
Contributor

In #8 we first discussed the option of a plugin system but I am definitely open to other versions as well. I prefer a system where it is easy to find reference implementations. Hence I support alternative 2 to have some versions upstream or maybe another alternative and in addition a collection of the most noteworthy forks somewhere, maybe in the readme?

My reasoning is this: Whoever is new to Jupyter in general and JupyterHub in specific might feel overwhelmed when they need to gather all information by browsing random github repositories (I mean probably interesting forks). I prefer that the jupyterhub-idle-culler project (in some abstract sense) offers some guidance for those new users. This reduces the re-invention of the wheel over and over again. I understand that some scripts might be very instance-specific and at the same time I like to share my results with fellow Jovians in an efficient way and adapt solutions from others as easily as possible.

Openly speaking I do not have deep knowledge about with which programming patterns this aim is achieved in the most time-efficient way. For me people (i.e. communication and sharing existing solution) come before the beauty of code. To avoid confusion about what the culler service does, maybe logging can help? Still we need to be careful though.

@1kastner
Copy link
Contributor

I am curious about your opinions as well, @manics and @rkdarst !

@manics
Copy link
Member

manics commented Aug 28, 2020

I think we can separate the discussion around avoiding user confusion from the decision on implementation.

For example, if we go down the route of separate plugins/extensions in their own repositories (and separate pypi packages) then jupyterhub-idle-culler or some other meta-package could bundle together all the recommended ones using install_requires.

Other factors to consider when deciding whether to have multiple implementations in this repo vs separate ones is ease of testing and maintenance. Regardless of which option we choose as @rkdarst mentioned we still need to decide on an API.

@1kastner
Copy link
Contributor

That seems to be a nice idea! In another context @yuvipanda suggested a pluggy-based plugin system. Whenever we need a plugin hook in the lengthy cull_idle function (it spans from row 102 to 308) we could insert a pluggy hook which by default is a function without any effect. The places of where to insert them we could decide based on the current two ideas #4 and #8. This would leave us with the question of which parameters will be handed it since the API should be more or less stable over time and that restricts the freedom of development inside this package. But I can't predict the future so I guess if the API is a bit iterative in the beginning that would be totally fine for me. I guess we need to learn from the use-cases what is really needed.

@rcthomas
Copy link

I was just taking a look here at how I could slot in custom cull logic and noticed that unlike a bunch of other JupyterHub stuff this isn't using traitlets. Any reason we don't just convert to that and make the cull logic a configurable Callable?

@1kastner
Copy link
Contributor

@rcthomas none that I am aware of.

@manics
Copy link
Member

manics commented Sep 29, 2020

I don't know how much work will be involved in converting it to use traitlets and a configuration file, but it sounds like it should be easier than implementing a plugin system.

@meeseeksmachine
Copy link

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/a-cull-idle-user-service-that-deletes-pvs/4742/16

@consideRatio
Copy link
Member

I opened #25 to discuss adding some general purpose hooks. JupyterHub's Spawner and Authenticator class have various hooks for example that can be customized, and I think those serve a very useful hatch for users without making maintenance of the project problematic.

  • If someone wants to email someone as part of culling the user i think a hook is the right way to help them accomplish it.
  • If someone wants to cull users based on custom logic, then i think a hook returning true/false given input of server activity, username, etc would make sense.

I opened the issue as I felt #25 was as far as I think we should go with configuration flags etc, and that the rest should be left for custom logic implemented in some way from the end user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants