Nebari Distribution Proposal #2858

Adam-D-Lewis · 2024-11-19T18:49:38Z

Adam-D-Lewis
Nov 19, 2024
Maintainer

At Quansight, we've been talking about the idea of a Nebari distribution. I think the first thing we should do is define what a distribution is. Here's my proposed definition.

What is a distribution?

A nebari distribution is a tested set of configurations that is known to work and is usually geared towards a specific use case (see examples below).

Users deploying nebari can feel confident that if they deploy a distribution, it will be a good starting point for them.

Distribution Examples

For the sake of discussion, these are examples of distributions that should be possible.

low cost nebari - defined by cheaper node types, disabling some optional features in nebari config, etc.
gpu nebari - defined by gpu node types and profiles
ML nebari - defined by gpu node types/profiles, ML conda environments, nebari-mlflow-plugin
Ragna app - automatically deploys ragna app on Nebari on initial deployment along with conda environment and a workflow for downloading a local LLM model
AI Inference - ability to deploy an inference server (e.g. vLLM, Aphrodite, others)
AI QLoRA Fine Tune - Fine-tune an LLM with QLoRA
AI Optimization - Quantization of a model

Supported Features of a Distribution

As motivated by the distribution examples, a distribution should be able to define the following:

Nebari Config Sets*
- Any option in the nebari config file.
- e.g. this would be needed to set up gpu node types and jupyter profiles
Conda environments
- environments created during nebari deployment. These could be specified in multiple ways e.g. conda project, conda env file, etc.
Workflows**
- workflows which run conditionally (e.g. initial deployment only, if file doesn't exist, etc.) during deployment. These could be specified in multiple ways e.g. conda project, argo, k8s job, etc.
- used to download the LLM model weights in the ragna example distribution
Jhub Apps:
- apps deployed during nebari deployment
- these could be specified in multiple ways e.g. conda project or direct definition
Nebari plugins:
- Nebari plugins which should be installed and included in the deployment
- e.g. needed by the ML distribution of Nebari.

* Custom docker images could be specified as part of this.
**I could see us replacing Workflows (or supplementing it) with a "Downloads" feature which downloads files and puts them where specified in the NFS drive on initial deployment. It could use some type of workflow under the hood, but that would be an implementation detail.
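
As a rough illustration of the feature list above, the sketch below models a distribution as a plain Python dataclass. The class and field names (`Distribution`, `Workflow`, `config_overrides`, and so on) and the format of the `condition` string are assumptions made up for discussion, not an existing Nebari API. The example values are borrowed from the yaml example later in this post.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Workflow:
    """A job that runs during deployment when `condition` holds (hypothetical)."""

    name: str
    condition: str  # hypothetical format, e.g. "initial_deployment"
    spec: dict[str, Any] = field(default_factory=dict)


@dataclass
class Distribution:
    """Hypothetical container for the five features listed above."""

    name: str
    config_overrides: dict[str, Any] = field(default_factory=dict)  # Nebari config sets
    conda_environments: list[str] = field(default_factory=list)  # env files or URLs
    workflows: list[Workflow] = field(default_factory=list)
    jhub_apps: list[str] = field(default_factory=list)
    nebari_plugins: list[str] = field(default_factory=list)


# Example instance: an "ML" distribution that restricts conda channels
# and depends on the MLflow plugin.
ml = Distribution(
    name="ml",
    config_overrides={
        "conda_store": {
            "extra_settings": {
                "CondaStore": {"conda_allowed_channels": ["bioconda", "conda-forge"]}
            }
        }
    },
    nebari_plugins=["nebari-plugin-mlflow"],
)
print(ml.name, ml.nebari_plugins)
```

Whether this lives in a plugin or is loaded from yaml is a separate question. The point is only that the five features fit in one small, typed structure.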

How to define a distribution:

Assuming there is agreement with the above features I'll now discuss how one might define a distribution. Various options exist to create a distribution:

Nebari distribution plugin, which differs from current nebari plugins only in that we'd add new pluggy hooks in the Nebari codebase to enable the functionality above. These distribution plugins could have dependencies on other non-distribution plugins (e.g. Nebari MLflow plugin)
yaml file
other?

Nebari Plugin defined Distribution

I see a Nebari plugin as a very robust and capable choice, but it is more difficult to create a distribution this way since you have to create a python package. I don't see that difficulty as an issue since I think anyone considering creating a Nebari plugin is capable of creating a python package or learning how to do so. This also allows us to specify dependencies on nebari and other plugins in the usual manner (pyproject.toml).

Yaml File defined Distribution

I see a yaml file as not robust enough and likely to result in us trying to create a DSL in yaml to handle many different distribution needs. Nevertheless, I've tried to define one below for the sake of argument.

distro_schema_version: 1.0
supported_nebari_versions: ">=2024.11.1"  # **Con: More clumsy than just using pyproject.toml**
# anything specified overrides what nebari generates
config_options:

# **Con: You need a separate distribution for each cloud provider for gpus or else you need to start creating the yaml DSL to define conditional logic (please no)**
  - google_cloud_platform.node_groups:
    - gpu-tesla-t4-x1:
        instance: "n1-standard-16"
        min_nodes: 0
        max_nodes: 4
        guest_accelerators:
          - name: nvidia-tesla-t4      # 1x 16 GB GDDR6: Nvidia Tesla T4
            count: 1
  - profiles:
    - jupyterlab:
      - display_name: T4 GPU Instance 2x
        description: 16 cpu / 60GB RAM / 2 Nvidia T4 GPU (32 GB GPU RAM)
        kubespawner_override:
          image: quay.io/nebari/nebari-jupyterlab-gpu:2024.5.1
          cpu_limit: 16
          ...
          node_selector:
            "cloud.google.com/gke-nodepool": "gpu-tesla-t4-x2"
  - conda_store.extra_settings.CondaStore:
      conda_allowed_channels: ["bioconda", "conda-forge"]
conda_environments:
  - https://github.com/nebari-dev/jhub-apps-from-git-repo-example/blob/main/conda-project.yml
  - PAT_ENV_VAR: MY_GITHUB_PAT
    url: https://github.com/nebari-dev/jhub-apps/blob/main/environment-dev.yml
jhub_apps:
  - https://github.com/myOrg/myRepo/tree/myBranch
workflows:
  - condition: # e.g. check if file exists on NFS drive
    argo:
      # ARGO_WORKFLOW_YAML_SPEC
  - condition: # e.g. check if file exists on NFS drive
    cron:
      # K8S_JOB_SPEC
  - condition:
    conda_project:
      path:  # url or path
      command: 

# **Con: we have to figure out what package manager they're using (pip, conda, poetry, etc?) and install it the same way or make it configurable**
nebari_plugins:
  - nebari-plugin-mlflow
  - https://github.com/myOrg/myNebariPlugin

The 3 major cons I've noted in the yaml file are:

Putting supported nebari version dependencies in the yaml file is more clumsy than just using pyproject.toml.
To make a gpu distribution, you need a separate distribution for each cloud provider, or else you need to start creating a yaml DSL to define conditional logic (if this provider then X else Y) (please no).
If your distribution requires other plugins, then we have to figure out how to install those for the user in the same manner as Nebari was installed, which is troublesome since many options for installing python packages exist.

I think we should define nebari distributions in plugins b/c of those limitations of the yaml file approach.
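
To make the second con concrete, here is a minimal sketch of how provider-specific settings might look in plain Python rather than a yaml DSL. The function name `gpu_config_overrides` and the `GPU_NODE_GROUPS` table are hypothetical. Only the GCP values come from the yaml example above, and other providers are deliberately left unfilled.

```python
from typing import Any

# Hypothetical per-provider GPU node groups. Only the GCP entry is taken from
# the yaml example above; AWS/Azure entries would be added the same way.
GPU_NODE_GROUPS: dict[str, dict[str, Any]] = {
    "google_cloud_platform": {
        "gpu-tesla-t4-x1": {
            "instance": "n1-standard-16",
            "min_nodes": 0,
            "max_nodes": 4,
            "guest_accelerators": [{"name": "nvidia-tesla-t4", "count": 1}],
        }
    },
}


def gpu_config_overrides(provider: str) -> dict[str, Any]:
    """Return nebari config overrides for `provider`, failing loudly if unsupported."""
    try:
        node_groups = GPU_NODE_GROUPS[provider]
    except KeyError:
        raise ValueError(
            f"GPU distribution has no settings for provider {provider!r}"
        ) from None
    return {provider: {"node_groups": node_groups}}
```

The conditional logic is an ordinary dictionary lookup, and an unsupported provider produces a clear error instead of a silently incomplete config.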

smeragoel · 2024-11-21T18:13:35Z

smeragoel
Nov 21, 2024

Thanks for proposing this, @Adam-D-Lewis! This is an exciting idea, and I have some questions and feedback:

Users and Use Cases:

Who is the target user group for Nebari distributions? Are they intended primarily for beginners, advanced users, or a mix?
Is the goal to make deployment easier, faster, or both?

Definition:

"Distributions" might feel too technical or unclear for some users. Could we consider using terms like "Setup Templates" or "Pre-Configured Setups" to make the concept more accessible?
How will the distributions be decided? Are they based on internal discussions, user input, or client requests?
How will updates to distributions be handled post-installation? For example if a distribution gets new features or fixes, will users be able to apply these updates?

Installation:

To make the installation process user friendly, we should:

Include a menu of options with descriptions of what each distribution is meant for ("Low-Cost: Minimizes resource usage" etc).
Have a default option (like "Standard Setup") for users unsure of what to pick.

Down the line I can also offer more specific design input for including distributions in the CLI and making the selection process more user-friendly and accessible, so feel free to ping me!

1 reply

Adam-D-Lewis Nov 21, 2024
Maintainer Author

> Who is the target user group for Nebari distributions? Are they intended primarily for beginners, advanced users, or a mix?
> Is the goal to make deployment easier, faster, or both?

The target user group would be those deploying Nebari (more of an admin than a user). The goal is to include specific attributes or functionality not available in the base deployment. Think of buying a car. You have the base model, and then you have models with extra features (leather seats, built-in screen, etc.). That's a similar idea. Distributions are a higher-level interface to a lot of low-level config options, based around the use cases I listed above (Distribution Examples).

> "Distributions" might feel too technical or unclear for some users. Could we consider using terms like "Setup Templates" or "Pre-Configured Setups" to make the concept more accessible?

"Setup templates" doesn't capture the capability that distributions would provide. "Pre-configured Setups" seems closer to capturing what I'm aiming for, and I would consider using that name instead of "Distributions".

> How will the distributions be decided? Are they based on internal discussions, user input, or client requests?

For distributions, there may need to be some modifications to the Nebari code base to make them possible, and the Nebari core team will likely decide to support some official distributions, but other distributions could be created by the wider Nebari community (anyone) similar to existing Nebari plugins and may choose to share or not share their distributions with the community. Not all distributions are built into Nebari. They would either need to be installed (e.g. pip install my-distribution) in the case of Nebari distribution defined as a plugin or stored as a set of yaml files if we decide to define distributions in yaml files. The yaml files could then be stored locally, in a git repo, etc.

> How will updates to distributions be handled post-installation? For example if a distribution gets new features or fixes, will users be able to apply these updates?

I've focused on initial Nebari deployments so far, but I could foresee various options for updating a deployment. E.g. pip install the updated distribution plugin or update a reference to the distribution yaml file in the nebari config file and then redeploy.

dcmcand · 2024-11-22T15:58:10Z

dcmcand
Nov 22, 2024
Maintainer

@Adam-D-Lewis I like the idea of distributions.

I think you have a few concepts here.

First you have the setup templates that @smeragoel mentioned. I think this would be a great first step. Starting with low-cost, standard, and HA templates for GPU and non-GPU setups would be a great start. I recognize that a low-cost GPU template may be ridiculous, so we may need to drop that option. I think this could easily be added to the Nebari CLI, where we provide a number of out-of-the-box templates, but orgs can put in a URL to fetch another config as a template if they want. Obviously we would need to document how to make a config template, but that seems like a relatively quick way to deliver value.

Second, plugins. This is what I had in mind when I talk about Nebari Spins. Preinstalled components focussed on a specific domain. ML Spin is the most obvious, but you could even break that down further. ML development, or ML hosting for example. We could also move dask into a plugin and make a distributed compute spin. I think it would make sense to build these as separate packages and publish them. If you download and install nebari-ml-dev then you get nebari with mlflow, lakefs, and anything else needed for ml dev preinstalled as plugins.

Third you have things running on existing Nebari services such as jhub apps. If you want to prepopulate an application in jhub apps, I think we might want to look at a "nebari cookbook" or something where we have recipes for running particular apps. This may be where conda-project comes in.

In my head, it makes sense to approach things in the above order. First launch templates, then spins, finally app recipes (I am avoiding saying distribution to help keep these separate).

4 replies

Adam-D-Lewis Nov 22, 2024
Maintainer Author

> First you have the setup templates that @smeragoel mentioned. I think this would be a great first step. Starting with low-cost, standard, and HA templates for GPU and non-GPU setups would be a great start. I recognize that a low-cost GPU template may be ridiculous, so we may need to drop that option. I think this could easily be added to the Nebari CLI, where we provide a number of out-of-the-box templates, but orgs can put in a URL to fetch another config as a template if they want. Obviously we would need to document how to make a config template, but that seems like a relatively quick way to deliver value.

Yeah, this is the config options feature I've listed above. My only concern is that I think code should be used to define this and not a serialization format like yaml. If it's code, I think it should be a nebari plugin. I don't think it should be yaml b/c I can't think of a yaml format that would be flexible enough for all the logic we would want without becoming overly complex.

Take the gpu settings for example, the best I could think of as a way to define what's needed in a single yaml file is with yaml + jinja as in the example below.

profiles:
{{ profiles | default([]) }}  # this adds the default profiles
{%- if provider == 'gcp' %}
  - display_name: T4 GPU Instance 2x
    description: 16 cpu / 60GB RAM / 2 Nvidia T4 GPU (32 GB GPU RAM)
    kubespawner_override:
      image: quay.io/nebari/nebari-jupyterlab-gpu:2024.5.1
      cpu_limit: 16
      cpu_guarantee: 14
      mem_limit: 60G
      mem_guarantee: 45G
      extra_pod_config:
        volumes:
        - name: "dshm"
          emptyDir:
            medium: "Memory"
            sizeLimit: "2Gi"
      extra_container_config:
        volumeMounts:
        - name: "dshm"
          mountPath: "/dev/shm"
      extra_resource_limits:
        nvidia.com/gpu: 2
      node_selector:
        "cloud.google.com/gke-nodepool": "gpu-tesla-t4-x2"
{%- elif provider == 'aws' %}
  # AWS specific profile would go here
{%- elif provider == 'azure' %}
  # Azure specific profile would go here
{%- endif %}

I don't think it's going to be robust enough to handle all logic we might want so I think we'll probably end up needing to define options in code/nebari plugin for those cases anyway.

Adam-D-Lewis Nov 22, 2024
Maintainer Author

> Second, plugins. This is what I had in mind when I talk about Nebari Spins. Preinstalled components focussed on a specific domain. ML Spin is the most obvious, but you could even break that down further. ML development, or ML hosting for example. We could also move dask into a plugin and make a distributed compute spin. I think it would make sense to build these as separate packages and publish them. If you download and install nebari-ml-dev then you get nebari with mlflow, lakefs, and anything else needed for ml dev preinstalled as plugins.

I think a spin has to be more than just a set of plugins, at least as plugins are currently defined. Currently plugin = extra deployment stage and/or extra cli command. For example, the ML spin probably ought to have some ML conda envs pre-configured, and the distributed compute spin ought to have dask envs ready to go. The ML spin would need to be able to add the gpu instances and profiles to the nebari config, and the dask spin would need to add dask profiles to the nebari config. It'd also be nice if they had a few notebooks explaining how to use mlflow and/or dask respectively. That could be done by the workflows/file download feature I listed above.

It just doesn't seem like a spin as defined here can do much for the use cases listed above without a bunch of additional manual work or without giving plugins new hooks so plugins can do more (which I'm not against btw).
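
The "initial deployment only" behaviour of the workflows/file download feature mentioned above could be as simple as a write-if-missing step. The sketch below is only an illustration under that assumption. `ensure_file` is a hypothetical helper, and the `fetch` callable stands in for whatever would actually retrieve the notebook or model weights.

```python
from pathlib import Path
from typing import Callable


def ensure_file(dest: Path, fetch: Callable[[], bytes]) -> bool:
    """Write `fetch()` to `dest` only if `dest` does not exist yet.

    Returns True when the file was written and False when it was skipped,
    so re-running a deployment never overwrites a user's edited copy.
    """
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(fetch())
    return True
```

In a real deployment `dest` would presumably point somewhere on the shared NFS drive and `fetch` would wrap a download. Both are left abstract here on purpose.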

Adam-D-Lewis Nov 22, 2024
Maintainer Author

> Third you have things running on existing Nebari services such as jhub apps. If you want to prepopulate an application in jhub apps, I think we might want to look at a "nebari cookbook" or something where we have recipes for running particular apps. This may be where conda-project comes in.

Can you expand on the nebari cookbook / recipe idea? This sounds like a list of instructions like a README on what is needed to set up e.g. Ragna. Is that right? The motivation for distributions is to make it easy to set up Nebari for a particular use case. We could already write a readme for setting everything up, and yes, that would be helpful, but doesn't provide the UX I'd like for the users to have when deploying e.g. Ragna. The ability to define jhub apps in the nebari config file as part of your deployment is generally useful anyway in my opinion. I don't see a reason to prevent jhub apps from being defined in the nebari config file.

dcmcand Nov 28, 2024
Maintainer

> > Third you have things running on existing Nebari services such as jhub apps. If you want to prepopulate an application in jhub apps, I think we might want to look at a "nebari cookbook" or something where we have recipes for running particular apps. This may be where conda-project comes in.
>
> Can you expand on the nebari cookbook / recipe idea? This sounds like a list of instructions like a README on what is needed to set up e.g. Ragna. Is that right? The motivation for distributions is to make it easy to set up Nebari for a particular use case. We could already write a readme for setting everything up, and yes, that would be helpful, but doesn't provide the UX I'd like for the users to have when deploying e.g. Ragna. The ability to define jhub apps in the nebari config file as part of your deployment is generally useful anyway in my opinion. I don't see a reason to prevent jhub apps from being defined in the nebari config file.

No, I wasn't just thinking of documentation. I was thinking more of a complete package that has everything needed to have an app up and running. That would mean having the correct infra defined, the correct conda envs available, the correct code, etc. I think this would likely mean either coming up with a new specification format or extending an existing one. Though actually thinking about this now, with some work it might be possible to just do this as a nebari plugin since plugins can extend the config.

Adam-D-Lewis · 2024-11-26T16:00:31Z

Adam-D-Lewis
Nov 26, 2024
Maintainer Author

I brought this discussion up in the Quansight internal nebari meeting today. There was broad support for plugins to define distributions for non-trivial distributions (e.g. ragna), but there was also support for being able to define config sets in plain yaml. To handle the case where different providers may have differences (like gpus for example), we could expect the folder structure to be similar to the following in order to not have to use a complex yaml structure. See image below.

An added benefit, and a point in favor of using plugins to define distributions, is that a distribution could define tests to run in CI/CD that wouldn't apply to a base Nebari deployment, e.g. checking that mlflow is running.

marcelovilla · 2024-11-26T16:52:09Z

marcelovilla
Nov 26, 2024
Maintainer

It seems that some distributions are simple enough that a (templated) yaml file would suffice, for example a low-cost distribution where the only change is selecting cheaper predefined instances for the node groups. More complex distributions might need extra logic that requires a plugin, for example an ai-inference distribution, which might need additional software and infrastructure resources.

Taking this into account, what would the process of selecting/installing a distribution look like from the user's perspective? If it's the yaml case, maybe pulling a config file from a GitHub repo during the nebari init command? If it's a plugin, would it mean that the user needs to install the plugin and deploy Nebari from that?

1 reply

Adam-D-Lewis Nov 26, 2024
Maintainer Author

I don't think the yaml file case could handle an entire distribution, as explained above, but it could handle simple cases of the first requirement. I propose we call these config sets, interchangeable with the longer, more descriptive name pre-defined configuration sets, though I'm open to other names.

In the case of a yaml file, I propose we add the following to the nebari config. Then we could add a prompt to the guided-init to list the config sets found and ask the user which they'd like to apply. I'm open to having other CLI commands to update an existing nebari config as well, but I haven't thought through exactly how that'd look quite as much.

config_sets:
  - <link to git repo>
  - <local path>
  - etc.

I still think we should support jinja templated yaml rather than only pure yaml, b/c otherwise how do you specify whether a key-value pair in the config set should override or just append to a list or map in the nebari config?
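
To illustrate the override-versus-append question, here is a minimal merge sketch. `merge_config` and its `append_lists` flag are hypothetical, not existing Nebari behaviour, and the "Small Instance" profile is a made-up stand-in for a default profile. A single global flag is almost certainly too coarse for real use, which is part of the argument for templating or code.

```python
from copy import deepcopy
from typing import Any


def merge_config(
    base: dict[str, Any], config_set: dict[str, Any], append_lists: bool = True
) -> dict[str, Any]:
    """Return a new dict with `config_set` merged into `base`.

    Nested maps are merged key by key. Lists are appended to when
    `append_lists` is True and replaced otherwise. Scalars are overridden.
    """
    merged = deepcopy(base)
    for key, value in config_set.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_config(current, value, append_lists)
        elif isinstance(current, list) and isinstance(value, list) and append_lists:
            merged[key] = current + deepcopy(value)
        else:
            merged[key] = deepcopy(value)
    return merged


# "Small Instance" is a placeholder for whatever default profile already exists.
base = {"profiles": {"jupyterlab": [{"display_name": "Small Instance"}]}}
gpu = {"profiles": {"jupyterlab": [{"display_name": "T4 GPU Instance 2x"}]}}

appended = merge_config(base, gpu)
replaced = merge_config(base, gpu, append_lists=False)
print(len(appended["profiles"]["jupyterlab"]))  # 2
print(len(replaced["profiles"]["jupyterlab"]))  # 1
```

The same config set gives two different results depending on the flag, and a pure yaml file has no obvious place to say which one the author intended for each key.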

Adam-D-Lewis · 2024-11-27T22:15:58Z

Adam-D-Lewis
Nov 27, 2024
Maintainer Author

See more details about config sets here
