GESIS Collaboration - Dynamic image building in a JupyterHub #1382

Open
3 of 7 tasks
choldgraf opened this issue Jun 2, 2022 · 46 comments

@choldgraf
Member

choldgraf commented Jun 2, 2022

Context

We recently started a collaboration with GESIS with the goal of generalizing, improving, and sustaining the "persistent binderhub" deployment at notebooks.gesis.org.

Here are the major things we'd like to do as part of this collaboration:

  • Generalize the "Persistent BinderHub" setup so that it is no longer unique to GESIS infrastructure
  • Improve aspects of the design and implementation to be more sustainable, scalable, usable, etc
  • Upstream as much as possible to parts of the JupyterHub / Binder ecosystem

Here's a link to the deliverables on the collaboration

Project roles

References

We'll use this issue to track progress on the collaboration, and to list issues where we have discussion and iteration on more focused parts of the collaboration. Below is a rough list of things to do.

Updates

  1. GeorgianaElena consideRatio
  2. consideRatio

Cloud cost reminders


Dedicated board

Ref: https://github.com/orgs/2i2c-org/projects/33.

@choldgraf
Member Author

ping to @bitnik @arnim and @MridulS in case any of them are interested in following along, collaborating, or joining in discussions and meetings.

@arnim

arnim commented Jun 2, 2022

Thank you @choldgraf ;)

@sgibson91
Member

sgibson91 commented Jun 15, 2022

Just sharing some slides I presented to a Turing-based project on what I perceive to be the use cases/target audiences for JupyterHub and BinderHub, and how this work on dynamic image building in JupyterHub might affect BinderHub as a separate project. I presented this so they could understand how work on internal infrastructure would affect their project and make decisions accordingly; it only reflects my opinion after conversations with different folks.

@arnim

arnim commented Jun 15, 2022

@choldgraf should we do some kick-off?

@choldgraf
Member Author

Two quick updates:

Meeting / timing

@arnim yes, I believe it's a good idea. We have been working to finish deploying Pangeo's BinderHub before starting on this, and I think we figured out a short-term path forward there last week, so hopefully we can then shift focus to this project. We will ping here once it's time.

Kernels as a service

In a recent conversation with @jlperla we noted that this work might be related to another feature that many have requested in the Jupyter ecosystem, which is something like "Binder / Jupyter kernels as a service for scalable computation". The idea is that you could define an environment via a Binder-like repository, build it into an image with repo2docker, and then scale computation using that repository's environment via some cloud mechanism. This wouldn't be as rapidly scalable as something like Dask Gateway, but could be a useful way to rapidly / interactively parallelize something in the cloud. Just noting that here in case we find a way to connect it with this work.
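
To make the "define an environment in a repository, build it into an image" step concrete, here is a minimal sketch that only assembles a repo2docker build command without running it. The repository URL, the image naming scheme, and both helper functions are hypothetical and used for illustration only; `--no-run`, `--ref`, and `--image-name` are existing `jupyter-repo2docker` options. How the resulting image would then be scheduled for scaled-out computation is not covered here.

```python
import re
import shlex


def image_name_for(repo_url: str, ref: str) -> str:
    """Derive an image name from a repo URL and ref (hypothetical naming scheme)."""
    # Collapse every run of non-alphanumeric characters into a single dash.
    slug = re.sub(r"[^a-z0-9]+", "-", repo_url.lower()).strip("-")
    # Keep only characters that are valid in a Docker tag.
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", ref)
    return f"{slug}:{tag}"


def repo2docker_build_command(repo_url: str, ref: str = "HEAD") -> list[str]:
    """Assemble a command that builds the repo's environment but does not start it."""
    return [
        "jupyter-repo2docker",
        "--no-run",  # build the image only, do not launch a container
        "--ref", ref,  # git ref (branch, tag, or commit) to build from
        "--image-name", image_name_for(repo_url, ref),
        repo_url,
    ]


# Hypothetical repository, used only as an example input.
repo_url = "https://example.org/user/analysis-repo"
cmd = repo2docker_build_command(repo_url, "v1")
print(shlex.join(cmd))
```

The printed command could be run by hand or from a build service; anything that can pull the resulting image could then reuse the same environment.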

@arnim

arnim commented Jun 18, 2022

The "kernels as a service for scalable computation" idea sounds interesting ;) All the best for the deployment of Pangeo's Hub. A while ago @rabernat also raised an extremely interesting question. Maybe we can indeed move one step closer to a generic backplane for all kinds of scientific infrastructure. Computational replicability is key in economics, geoscience, and the social sciences ;)

@yuvipanda
Member

Here's a document I wrote almost a year ago now that might help https://hackmd.io/OXhKs4xyQra0KgBglegSBQ

@choldgraf
Member Author

@sgibson91 and I discussed this one a little bit today as well, and we agree that it might be helpful to spend some of her time acting as community strategic lead w/ JupyterHub to help steward discussions and feedback amongst stakeholders for this work (with technical vision / ideas / etc coming from others in the JupyterHub ecosystem).

This might be a way to help make progress on this issue and also experiment with ways that the Community Strategic Lead can start improving team processes for discussion and feedback-gathering.

@choldgraf
Member Author

Update: Damian serving in a PM role

I spoke with @damianavila today and we agreed that this project will make more rapid progress if we can assign somebody to serve in a project manager style role to help us keep track of plans, conversations, timelines, and deliverables.

@damianavila said he would be willing to serve in this role and he will reduce his time on the PyData theme in order to grow the capacity to work here.

So our next steps are to:

  • Assign @damianavila in a project manager role on this issue
  • Create a place to track and plan this effort (maybe this issue + a project board for deliverables)
  • Plan initial meetings to align / plan
  • Plan our strategy around Engineering capacity (a mix of growing capacity in the 2i2c team, or bringing in capacity via sub-contracts with others)

@arnim

arnim commented Jun 29, 2022

This is great news. Thank you @damianavila for helping with this 🎉

@damianavila
Contributor

@arnim, how does your calendar look for a meeting next week? I can send a when2meet link for next week if you are available. 2i2c eng resources should be available to meet next week as well.

@arnim

arnim commented Jul 13, 2022

how does your calendar look for a meeting next week?

Do you have a preferred day? Usually around this time would work well for me :)

@arnim

arnim commented Jul 13, 2022

@damianavila Typically somewhere between 12:00 UTC and 23:59 UTC should work

@damianavila
Contributor

OK, I have created a when2meet event so we can find a slot that works for everyone involved in this conversation: https://www.when2meet.com/?16135851-mYuiB

@consideRatio @yuvipanda @sgibson91, I would appreciate it if you can join this meeting. Please check the link and drop your availability. Thanks!

@damianavila moved this from Needs Shaping / Refinement to In progress in DEPRECATED Engineering and Product Backlog on Jul 13, 2022
@arnim

arnim commented Jul 19, 2022

@damianavila Should we say Thursday, 21st July, 2022 at 17:15 UTC (or your timezone)?

Video - Jitsi: https://meet.jit.si/DynamicImageBuilding

@damianavila
Contributor

@arnim, I was finalizing the availability survey and was going to propose Wed 20th at 17 UTC instead.
Can you make it?
Btw, I can offer a 2i2c zoom room in the invite I am going to send if Zoom is OK for you.

@arnim

arnim commented Jul 19, 2022

That's fine as well :)

@damianavila
Contributor

Can you tell me the email to use in the invite? I will post the details here as well in case you do not want to share your email here.

@arnim

arnim commented Jul 19, 2022

arnim dot bleier at gmail ...

@damianavila
Contributor

Thank you (some people use dedicated calendars for meeting invites, this is why I explicitly asked).
Invitation sent! cc @2i2c-org/tech-team

@damianavila
Contributor

Details.
When: Wed 20th 17 UTC
Where: 2i2c zoom room
Agenda (dropping some points that we can change if we want/need to):

  1. Quick presentation
  2. State of the problem
  3. Exploration/design phase
  4. Implementation
  5. Provision
  6. Status updates

@arnim

arnim commented Jul 19, 2022

Can we send @MridulS an invite in case he wants to join?

@damianavila
Contributor

@arnim, I sent you an invite to our Slack space.

@arnim

arnim commented Jul 30, 2022

THX - just sent my hello message :)

@damianavila
Contributor

Some updates:

  • Created a dedicated team (@consideRatio, @yuvipanda, and @sgibson91) that is assigned to push this project forward.
  • Created a dedicated Slack channel to have focused discussions about the project.
  • Created a dedicated GH project board to collect issues related to this project.
  • Coordinated and discussed next steps with the team.

We are starting with an exploratory phase consisting of:

  1. Series of technical meetings
  2. MVP to validate the ideas coming from those meetings
  3. MVP presentation in front of stakeholders (feedback loop)
  4. Iteration

Next step:

@damianavila
Contributor

After the first meeting, we have some next steps briefly described here: #1577 (comment)

@arnim

arnim commented Aug 5, 2022

Hi @damianavila, I thought that you would ping me first when you have some time to chat. I'm not sure we agreed to build on tljh in our first meeting.

@sgibson91
Member

sgibson91 commented Aug 5, 2022

We are only deploying tljh-repo2docker as a means to perform user experience research to inform what we should build (but won't be the sole source of guidance) for, most likely, the z2jh helm chart.

@arnim

arnim commented Aug 5, 2022

I'm also certainly interested in trying tljh-r2d and have to confess to my shame that I haven't done so yet :) Already subscribed to /issues/1596

@damianavila
Contributor

damianavila commented Aug 5, 2022

Hi @damianavila, I thought that you would ping me first when you have some time to chat. I'm not sure we agreed to build on tljh in our first meeting.

For future readers, there was a conversation with @arnim in the specific binderhub-jupyterhub 2i2c Slack channel that further clarifies the context and the explanation @sgibson91 shared above: https://2i2c.slack.com/archives/C03RLNFM43F/p1659687870338259.

@arnim

arnim commented Sep 7, 2022

Hi 👋 I'm back to the office this month. Should we have a small meeting, maybe next week? Let me know if there is anything I can do.

@arnim

arnim commented Oct 12, 2022

We should also consider whether the proposed new architecture would support features such as dynamic repository credentials (jupyterhub/binderhub/pull/1169), should they land in BHub, or whether this would be better integrated on the JHub side.

@damianavila
Contributor

I think https://infrastructure.2i2c.org/en/latest/howto/features/github.html might help with this ⬆️.
There is a blog post from @yuvipanda at https://blog.jupyter.org/securely-pushing-to-github-from-a-jupyterhub-3ee42dfdc54f.

@arnim

arnim commented Oct 25, 2022

Added a short link to the Jupyter community forum post on the Persistent BinderHub. The link includes a brief description of the deliverables for the collaboration, so that readers can form an idea of what to expect.

@damianavila moved this to In Progress in BinderHub-JupyterHub on Nov 23, 2022
@choldgraf changed the title from "Dynamic image building in a JupyterHub" to "GESIS Collaboration - Dynamic image building in a JupyterHub" on Feb 1, 2023
@choldgraf
Member Author

Hey all - I've added a "project roles" section to the top comment here and tagged folks that I think are dedicating their time to the project. Please edit if I got any of that wrong!

@damianavila
Contributor

The last available update of the plan lives at 2i2c-org/binderhub-service#27.

@damianavila assigned jmunroe and unassigned choldgraf on Aug 15, 2023
Status: In Progress
7 participants