This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Add support for custom environment with Jupyter #93

Open
mgoeminne opened this issue Jan 14, 2020 · 6 comments · Fixed by #112
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@mgoeminne

Is your feature request related to a problem? Please describe.

No, it's a suggestion for improving the functional coverage of FADI.

Describe the solution you'd like

A data scientist can use Jupyter Hub to iteratively explore data sets and provide technical solutions to various problems.

To do so, she frequently has to change the Jupyter environment of her notebooks, for example to include a specific package or to test alternative processing frameworks. Typically, each project / use case can have one or many dedicated environments that change daily or weekly.

FADI should support this kind of dynamic adaptation to the data scientist's needs by providing a way to efficiently manage extra dependencies.

For instance, a Web application could be provided for specifying, adapting or copying the environment right before instantiating Jupyter Hub. An interesting feature would be the possibility to inherit environments, and to share them among stakeholders.

Describe alternatives you've considered

The currently recommended way to do it is to adapt the Helm values file of the underlying Kubernetes cluster, and to restart the appropriate services. This is not really acceptable for an end user.

An alternative consists in specifying the additional dependencies in "conda install"-like commands at the beginning of the notebooks, but that makes these specifications notebook-specific. It also implies that the additional dependencies must be installed each time the notebook is loaded. Environment variables/secrets must be set in the notebooks, which raises security issues, and so on.
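To make the drawback concrete, here is a minimal sketch of what such a first notebook cell could look like. The helper names `conda_install_command` and `install_in_notebook` are hypothetical, and the sketch assumes `conda` is available on the `PATH` of the notebook server; it is not how FADI itself manages dependencies.

```python
import subprocess
import sys


def conda_install_command(packages, prefix=sys.prefix):
    """Build a `conda install` command targeting the running kernel's environment."""
    if not packages:
        raise ValueError("at least one package is required")
    # --prefix points conda at the environment of the current interpreter,
    # --yes skips the interactive confirmation prompt.
    return ["conda", "install", "--yes", "--prefix", prefix, *packages]


def install_in_notebook(packages):
    """Run the install; intended to be called from the first cell of a notebook."""
    # This has to be repeated for every fresh environment, which is the
    # overhead described above.
    subprocess.run(conda_install_command(packages), check=True)
```

Calling `install_in_notebook(["scikit-learn"])` at the top of a notebook would install the package into the current kernel's environment, but the specification then lives in that one notebook only.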

Additional context

Please have a look at how Domino provides this feature. Basically, a Dockerfile can be edited by the end user to personalize the environment.

A nice optimization would be to cache popular / recent / frequently used environments, so that notebooks using these environments start faster.

@mgoeminne mgoeminne added the enhancement New feature or request label Jan 14, 2020
@banzo banzo added this to the 0.1.2 milestone Jan 20, 2020
@alexnuttinck
Contributor

Hello @mgoeminne,

Thanks for your feature request!

Do you think that BinderHub could meet your needs?

See the diagram of the BinderHub architecture: https://binderhub.readthedocs.io/en/latest/overview.html#a-diagram-of-the-binderhub-architecture

BinderHub seems to allow a user to automatically create a Jupyter Notebook based on a Git repository. BinderHub generates a Docker image based on the specifications and requirements found in the Git repo.
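As an illustration of the "notebook from a Git repository" idea, the sketch below builds the launch URL that a BinderHub instance exposes for a GitHub repository, following the `/v2/gh/<org>/<repo>/<ref>` pattern used by BinderHub. The function name `binder_launch_url` and the host `binder.example.org` are assumptions made for this example; a FADI deployment would have its own host.

```python
from urllib.parse import quote


def binder_launch_url(hub_url, org, repo, ref):
    """Return the BinderHub launch URL for a GitHub repository at a given ref."""
    base = hub_url.rstrip("/")
    # Each path segment is percent-encoded so that refs containing a slash
    # (for example "feature/x") stay within a single segment.
    parts = [quote(part, safe="") for part in (org, repo, ref)]
    return f"{base}/v2/gh/" + "/".join(parts)


# Hypothetical BinderHub host; replace with the host of the actual deployment.
print(binder_launch_url("https://binder.example.org", "cetic", "fadi", "develop"))
```

Opening such a URL asks BinderHub to build (or reuse) an image for that repository and ref, then start a notebook server from it.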

This video explains very well how BinderHub works: https://www.youtube.com/watch?v=KcC0W5LP9GM

Let us know whether you think it makes sense to add BinderHub to FADI.

@banzo banzo assigned mgoeminne and unassigned banzo Feb 6, 2020
@mgoeminne
Author

@alexnuttinck Thank you for your responsiveness. I have never used BinderHub, but it looks promising.

However, I fear that having to manage the specifications/requirements in a Git repository hampers the user experience, since she has to maintain this repo. On the other hand, managing requirements through a Git repository is pretty interesting from the evolution/deployment management point of view.

BinderHub seems to be the perfect fit, since it allows creating environments from configuration files (Dockerfile, Python requirements, etc.) directly from the Jupyter Hub environment.

If BinderHub were systematically available, I would probably stop complaining and asking you devops guys to add some weird dependencies to my environments 😄

@mgoeminne mgoeminne assigned alexnuttinck and unassigned mgoeminne Feb 6, 2020
@mgoeminne
Author

As I understand it, this feature is practically mandatory for using Seldon without having admin access to the Kubernetes cluster.

@banzo
Contributor

banzo commented Feb 28, 2020

@AyadiAmen

@AyadiAmen
Contributor

I think it's possible to use the Jupyter Docker image jupyter/repo2docker with the current JupyterHub in FADI, because repo2docker is the tool BinderHub uses to build images on demand.

jupyter-repo2docker is a tool to build, run, and push Docker images from source code repositories.

repo2docker fetches a repository (from GitHub, GitLab, Zenodo, Figshare, Dataverse installations, a Git repository or a local directory) and builds a container image in which the code can be executed. The image build process is based on the configuration files found in the repository.
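A minimal sketch of driving that build from Python, assuming the `jupyter-repo2docker` command-line tool is installed and a Docker daemon is reachable. The helper name `repo2docker_command` and the image name used in the usage note are hypothetical; only the `--no-run`, `--image-name` and `--ref` options of the CLI are used.

```python
def repo2docker_command(repo, image_name=None, ref=None, run=False):
    """Build the argument list for a jupyter-repo2docker invocation."""
    cmd = ["jupyter-repo2docker"]
    if not run:
        # Only build the image; do not start a container from it afterwards.
        cmd.append("--no-run")
    if image_name:
        cmd += ["--image-name", image_name]
    if ref:
        # Branch, tag or commit to check out before building.
        cmd += ["--ref", ref]
    cmd.append(repo)  # The repository URL or local path goes last.
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)`, for example `repo2docker_command("https://github.com/cetic/fadi", image_name="fadi-custom-env", ref="develop")`, to build an image from the configuration files found in that repository.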

The repo2docker documentation comes with a how-to section, including "How to automatically create a environment.yml that works with repo2docker".

@banzo banzo linked a pull request May 17, 2020 that will close this issue
@alexnuttinck
Contributor

Documentation on BinderHub is available on the develop branch: https://github.com/cetic/fadi/tree/develop/examples/binderhub. It will be merged soon. BinderHub will nevertheless remain a "beta" feature.


4 participants