Merge branch 'master' into req-pay
GeorgianaElena authored Feb 28, 2024
2 parents 2483813 + 0afaa47 commit ee4a7e7
Showing 15 changed files with 350 additions and 10 deletions.
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/2_new-hub-provide-info.yml
@@ -28,7 +28,7 @@ body:
label: Technical support contacts listed
description: |
Technical contacts are the folks who are authorized to open support tickets about this hub.
The source of truth is maintained in [this airtable](https://airtable.com/appxk7c9WUsDjSi0Q/tbl3CWOgyoEtuGuIw/viwtpo7RxkYv63hiD?blocks=hide).
The source of truth is maintained in [this airtable](https://airtable.com/appbjBTRIbgRiElkr/pagog9egWRpeLyLGW?uMtyG=b%3AWzAsWyI4WnV1VyIsOSxbInNlbDRjSG1pdlRuRlRza0ljIl0sIklPZVJqIl1d).
Validate that there is at least one technical contact listed in the airtable for this hub. If not, ping
partnerships to ensure they find out who that is and fill that information in.
7 changes: 4 additions & 3 deletions .github/ISSUE_TEMPLATE/3_decommission-hub.md
@@ -40,18 +40,19 @@ Usually, it is because it was a hub that we created for a workshop/conference an
- [ ] Remove the hub deployment
- `helm --namespace HUB_NAME delete HUB_NAME`
- `kubectl delete namespace HUB_NAME`
- TIP: Run `deployer use-cluster-credentials <cluster_name>` before running the above commands
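
The Phase II commands above, gathered into one hedged sketch (`CLUSTER_NAME` and `HUB_NAME` are placeholders):

```bash
# Sketch only: point kubectl/helm at the right cluster, then remove the hub release and its namespace
deployer use-cluster-credentials CLUSTER_NAME
helm --namespace HUB_NAME delete HUB_NAME
kubectl delete namespace HUB_NAME
```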

#### Phase III - Cluster Removal

_This phase is only necessary for single hub clusters._

- [ ] Remove the cluster's datasource from the central Grafana with:
- `deployer grafana central-ds remove CLUSTER_NAME`
- `deployer grafana central-ds remove <cluster_name>`
- [ ] Run `terraform plan -destroy` and `terraform apply` from the [appropriate workspace](https://infrastructure.2i2c.org/en/latest/topic/terraform.html#workspaces), to destroy the cluster
- [ ] Delete the terraform workspace: `terraform workspace delete <NAME>`
- [ ] Delete the terraform values file under the `projects` folder associated with the relevant cloud provider (e.g. `terraform/gcp/projects/` for GCP)
- [ ] Remove the associated `config/clusters/<cluster_name>` directory and all its contents
- Remove the cluster from CI:
- [ ] [`deploy-hubs.yaml`](https://github.com/2i2c-org/infrastructure/blob/HEAD/.github/workflows/deploy-hubs.yaml)
- [ ] [`validate-clusters.yaml`](https://github.com/2i2c-org/infrastructure/blob/HEAD/.github/workflows/validate-clusters.yaml)
- [ ] Remove the cluster from the list of grafana datasources at https://grafana.pilot.2i2c.cloud/datasources
- [ ] [`deploy-grafana-dashboards.yaml`](https://github.com/2i2c-org/infrastructure/blob/HEAD/.github/workflows/deploy-grafana-dashboards.yaml)
- [ ] Remove A record from Namecheap account
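
For reference, a hedged consolidation of the Phase III terraform/deployer steps above (`CLUSTER_NAME` is a placeholder; the exact variable-file layout depends on the cloud provider):

```bash
# Sketch only; run from the terraform directory for the relevant cloud provider,
# adding the appropriate -var-file for this cluster if one is used.
deployer grafana central-ds remove CLUSTER_NAME
terraform workspace select CLUSTER_NAME
terraform plan -destroy -out=destroy.plan   # review the plan carefully before applying
terraform apply destroy.plan
terraform workspace select default          # a workspace cannot be deleted while selected
terraform workspace delete CLUSTER_NAME
```
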
89 changes: 89 additions & 0 deletions config/clusters/2i2c-aws-us/itcoocean.values.yaml
@@ -36,12 +36,15 @@ jupyterhub:
oauth_callback_url: https://itcoocean.2i2c.cloud/hub/oauth_callback
allowed_organizations:
- Hackweek-ITCOocean:itcoocean-hackweek-2023
- nmfs-opensci:2i2c-demo
scope:
- read:org
Authenticator:
admin_users:
- eeholmes # Eli Holmes, Community representative
singleuser:
# Requested in https://2i2c.freshdesk.com/a/tickets/1320
defaultUrl: /lab
# shared-public for collaboration
# See https://github.com/2i2c-org/infrastructure/issues/2821#issuecomment-1665642853
storage:
@@ -241,6 +244,92 @@ jupyterhub:
mem_guarantee: 115.549G
mem_limit: 128G
cpu_guarantee: 15.0
# Requested in: https://2i2c.freshdesk.com/a/tickets/1320
- display_name: "Bring your own image"
description: Specify your own docker image (must have python and jupyterhub installed in it)
slug: custom
allowed_teams:
- Hackweek-ITCOocean:itcoocean-hackweek-2023
- nmfs-opensci:2i2c-demo
- 2i2c-org:hub-access-for-2i2c-staff
profile_options:
image:
display_name: Image
unlisted_choice:
enabled: True
display_name: "Custom image"
validation_regex: "^.+:.+$"
validation_message: "Must be a publicly available docker image, of form <image-name>:<tag>"
kubespawner_override:
image: "{value}"
choices: {}
resource_allocation:
display_name: Resource Allocation
choices:
mem_1_9:
display_name: 1.9 GB RAM, up to 3.7 CPUs
kubespawner_override:
mem_guarantee: 1992701952
mem_limit: 1992701952
cpu_guarantee: 0.234375
cpu_limit: 3.75
node_selector:
node.kubernetes.io/instance-type: r5.xlarge
default: true
mem_3_7:
display_name: 3.7 GB RAM, up to 3.7 CPUs
kubespawner_override:
mem_guarantee: 3985403904
mem_limit: 3985403904
cpu_guarantee: 0.46875
cpu_limit: 3.75
node_selector:
node.kubernetes.io/instance-type: r5.xlarge
mem_7_4:
display_name: 7.4 GB RAM, up to 3.7 CPUs
kubespawner_override:
mem_guarantee: 7970807808
mem_limit: 7970807808
cpu_guarantee: 0.9375
cpu_limit: 3.75
node_selector:
node.kubernetes.io/instance-type: r5.xlarge
mem_14_8:
display_name: 14.8 GB RAM, up to 3.7 CPUs
kubespawner_override:
mem_guarantee: 15941615616
mem_limit: 15941615616
cpu_guarantee: 1.875
cpu_limit: 3.75
node_selector:
node.kubernetes.io/instance-type: r5.xlarge
mem_29_7:
display_name: 29.7 GB RAM, up to 3.7 CPUs
kubespawner_override:
mem_guarantee: 31883231232
mem_limit: 31883231232
cpu_guarantee: 3.75
cpu_limit: 3.75
node_selector:
node.kubernetes.io/instance-type: r5.xlarge
mem_60_6:
display_name: 60.6 GB RAM, up to 15.7 CPUs
kubespawner_override:
mem_guarantee: 65094813696
mem_limit: 65094813696
cpu_guarantee: 7.86
cpu_limit: 15.72
node_selector:
node.kubernetes.io/instance-type: r5.4xlarge
mem_121_2:
display_name: 121.2 GB RAM, up to 15.7 CPUs
kubespawner_override:
mem_guarantee: 130189627392
mem_limit: 130189627392
cpu_guarantee: 15.72
cpu_limit: 15.72
node_selector:
node.kubernetes.io/instance-type: r5.4xlarge
kubespawner_override:
cpu_limit: null
mem_limit: null
13 changes: 11 additions & 2 deletions config/clusters/2i2c-uk/lis.values.yaml
@@ -33,13 +33,22 @@ jupyterhub:
funded_by:
name: London Interdisciplinary School
url: https://www.lis.ac.uk
# Extra mount point for admins to access all users' home dirs
# Ref https://2i2c.freshdesk.com/a/tickets/228
singleuserAdmin:
extraVolumeMounts:
# /allusers is an extra mount point for admins to access all users'
# home dirs, ref: https://2i2c.freshdesk.com/a/tickets/228.
- name: home
mountPath: /home/jovyan/allusers
readOnly: false
# mounts below are copied from basehub's values that we override by
# specifying extraVolumeMounts (lists get overridden when helm values
# are combined)
- name: home
mountPath: /home/jovyan/shared-readwrite
subPath: _shared
- name: home
mountPath: /home/rstudio/shared-readwrite
subPath: _shared
singleuser:
image:
# https://hub.docker.com/r/lisacuk/lishub-base
9 changes: 9 additions & 0 deletions config/clusters/2i2c/climatematch.values.yaml
@@ -52,6 +52,15 @@ jupyterhub:
- name: home
mountPath: /home/jovyan/allusers
readOnly: true
# mounts below are copied from basehub's values that we override by
# specifying extraVolumeMounts (lists get overridden when helm values
# are combined)
- name: home
mountPath: /home/jovyan/shared-readwrite
subPath: _shared
- name: home
mountPath: /home/rstudio/shared-readwrite
subPath: _shared
2i2c:
add_staff_user_ids_to_admin_users: true
add_staff_user_ids_of_type: "github"
9 changes: 9 additions & 0 deletions config/clusters/leap/common.values.yaml
@@ -18,6 +18,15 @@ basehub:
- name: home
mountPath: /home/jovyan/allusers
readOnly: true
# mounts below are copied from basehub's values that we override by
# specifying extraVolumeMounts (lists get overridden when helm values
# are combined)
- name: home
mountPath: /home/jovyan/shared-readwrite
subPath: _shared
- name: home
mountPath: /home/rstudio/shared-readwrite
subPath: _shared
2i2c:
add_staff_user_ids_to_admin_users: true
add_staff_user_ids_of_type: "github"
1 change: 1 addition & 0 deletions docs/sre-guide/support/index.md
@@ -8,6 +8,7 @@ There is also a wiki with [per-cluster support notes](https://github.com/2i2c-or
```{toctree}
:maxdepth: 2
home-dir
simple-python-package
decrypt-age
build-image-remotely
credits
175 changes: 175 additions & 0 deletions docs/sre-guide/support/simple-python-package.md
@@ -0,0 +1,175 @@
# Add a simple python package to an image we maintain

This runbook describes the steps to take when we receive a request to add a
python package to an image we maintain for a community. Most requests are for a
simple package addition. This guide helps you determine whether the request is
simple, and if so, complete it.

## Pre-requisites

1. We (2i2c) are responsible for maintaining this image. There is currently no single source of
truth for determining this, unfortunately - please ask in the `#partnerships` channel if you
are not sure.

```{Note}
If we do not maintain the image for a community, we should have a template response to be
sent back here.
```

2. The image is maintained on a GitHub repo we have full rights on.

3. The image is built and pushed with [repo2docker-action](https://github.com/jupyterhub/repo2docker-action)

4. The request is for a python package.

5. The image is constructed in one of the following ways:
a. It is using repo2docker files, and has an `environment.yml` file
b. It is inheriting from one of the following community maintained upstream images via a `Dockerfile`.
i. [jupyter/docker-stacks](https://github.com/jupyter/docker-stacks)
ii. [pangeo-docker-images](https://github.com/pangeo-data/pangeo-docker-images/)

If *any* of these pre-requisites are not met, go straight to [escalation](sre-guide:support:simple-python-package:escalation).

## Determine if this is a 'simple' package addition

### Q1: Is the python package being requested available on `pip` or the `conda-forge` channel?

- [conda-forge search](https://anaconda.org/search). Verify that the package is in the `conda-forge` channel here, rather than in any other channel.
- [pypi search](https://pypi.org/)
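
If you prefer the command line, a hedged sketch of the same checks (`somepkg` is a hypothetical package name; `pip index` is still marked experimental by pip):

```bash
# Check availability on the conda-forge channel (requires conda or mamba locally)
mamba search -c conda-forge somepkg
# Check availability and published versions on PyPI
pip index versions somepkg
```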

#### No

- The package is from GitHub -> Tell the requester to release it on PyPI, and we
can install it from there. In the meantime, they can test it within their
environment by just `pip install`ing from their GitHub repo.

- Package is available in a non-conda-forge channel ->
[Escalate](sre-guide:support:simple-python-package:escalation) to the rest of the
team, as mixing conda channels can get messy and complex.

#### Yes

Go to Q2.

### Q2: Is this package part of the ML ecosystem?

There are two distinct ML ecosystems in python - based on
[tensorflow](https://www.tensorflow.org/) and [pytorch](https://pytorch.org/).
Does the package depend transitively on either of these packages?

#### Check dependencies on `pip`

If we are installing from `PyPI` via `pip`, you can check transitive dependencies
via the excellent [libraries.io](https://libraries.io).

1. Go to the [libraries.io/pypi](https://libraries.io/pypi/) page - this collects
and provides many useful pieces of information about packages on PyPI.
2. Search for the name of the package, and open its page.
3. In the right sidebar, under 'Dependencies', click 'Explore' dependencies.
This should take you to a dependency tree page, showing all dependencies
(including transitive dependencies). Here is what that looks
like for [pymc3](https://libraries.io/pypi/pymc3/3.11.5/tree).
4. Search for `tensorflow` or `torch` (the package name for pytorch) here.
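
If you would rather check locally, one hedged alternative (not part of the steps above) is to resolve the package in a scratch environment and inspect it with the third-party `pipdeptree` tool; `pymc3` is just an example package:

```bash
# Sketch only: install into a throwaway virtualenv and grep the dependency tree
python -m venv /tmp/dep-check
source /tmp/dep-check/bin/activate
pip install pipdeptree pymc3
pipdeptree --packages pymc3 | grep -iE 'tensorflow|torch' \
  || echo "no tensorflow/pytorch dependency found"
deactivate
```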

#### Check dependencies on `conda`

If the package is in `conda-forge` and you have [mamba](https://mamba.readthedocs.io)
locally installed, you can use the [mamba repoquery](https://mamba.readthedocs.io/en/latest/user_guide/mamba.html#repoquery)
command. For example, to find all the dependencies of `pymc`, you would run:

```bash
mamba repoquery depends -c conda-forge pymc --tree
```

This should show you all the transitive dependencies.

#### No

Yes, this is a *simple* package addition. Proceed to implementation.

#### Yes

Go to Q3.

### Q3: Is the base package (tensorflow or pytorch) already installed in the image?

#### Yes

Yes, this is a *simple* package addition. Proceed to implementation.

#### No

No, this is not a simple package addition.
[Escalate](sre-guide:support:simple-python-package:escalation) to the rest of
the team, to help choose between:

1. Adding ML packages to the existing image
2. Suggesting that the community use a different image as part of a `profileList`
3. Suggesting that a new hub be deployed for ML use cases

## Implementing a simple package addition

### Guidelines for choosing conda-forge vs PyPI

1. If the package is ML related, and the base package (tensorflow or pytorch) is
already present in the image, use the same installation method (conda-forge or
PyPI) that the base package uses. This reduces intermixing of dependencies,
which may cause breakage.
2. If the package is present on conda-forge, prefer that over PyPI

*If* there is an `environment.yml` file present, add the package there. If
getting it from `conda-forge`, it goes under `dependencies`. If we are getting
it from `PyPI`, it goes under the `pip` section under `dependencies`.

### Determine the latest version and pin to the latest minor version

**Ideally**, we would use a lock file for each image we maintain to have perfect
pinning. However, we currently do not have that. Until then, we should pin to
the latest minor version of the requested package. So if the latest version is
`2.0.5`, we can specify `==2.0.*` as the version constraint. While this still
allows versions of *dependent* packages to drift during rebuilds, it at least
pins the *directly requested package* to an acceptable level (compared to not
specifying a version at all).

You can find the current latest version from either PyPI or `conda-forge` (depending
on where it is being installed from, per the previous step).

### Provenance

Add a comment linking back to the support ticket where this package was requested.
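
Putting the placement, pinning, and provenance steps together, a minimal sketch of an `environment.yml` change (package names, versions, and the ticket link are hypothetical):

```yaml
channels:
  - conda-forge
dependencies:
  # Requested in: https://2i2c.freshdesk.com/a/tickets/<ticket-id>
  # conda-forge package, pinned to the latest minor version
  - somepkg=2.0.*
  - pip
  - pip:
      # a package only available on PyPI goes under the pip section instead
      - otherpkg==1.4.*
```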

### Does the build succeed?

We use [repo2docker-action](https://github.com/jupyterhub/repo2docker-action) to build and test PRs made to image repos. If the package can be successfully resolved and installed given our version constraints, the PR will have a successful build.
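
For orientation, a hedged sketch of what such a PR build-test workflow can look like; the action's input names (for example `NO_PUSH`) are assumptions to verify against the repo2docker-action README:

```yaml
name: Test image build
on: pull_request
jobs:
  test-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build the image with repo2docker, without pushing it
        uses: jupyterhub/repo2docker-action@master
        with:
          NO_PUSH: "true"   # assumed input; check the action docs
```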

#### Yes

You can self-merge the PR and roll it out to staging for the requester to test. The following response template may be used:

> Hello {{ name of requester }}
>
> We have installed the package you requested via {{ link to PR }}, and I have rolled it out to the staging hub at {{ link to staging hub }}. Can you test it out and let me know if it looks good? If so, I can roll it out to production.
>
> Thanks!

#### No

[Escalate](sre-guide:support:simple-python-package:escalation) to the whole team
so this can be debugged. We should communicate this escalation to the requester as well.
The following template may be used:

> Hello {{ name of requester }}
>
> We tried to add the package you requested in {{ link to PR }}. However, it looks like the package addition is not simple, and the build has failed. I've escalated this to our general engineering prioritization process, and we will get back to you once we have more information. Thank you for your patience!
>
> Thanks!

(sre-guide:support:simple-python-package:escalation)=
## Escalation

If this is *not* a simple package installation, escalate this to the rest of engineering in the following way:

1. If it doesn't already exist, create a [freshdesk tracking issue](https://github.com/2i2c-org/infrastructure/issues/new?assignees=&labels=support&projects=&template=5_freshdesk-ticket.yml&title=%5BSupport%5D+%7B%7B+Ticket+name+%7D%7D)
in the `2i2c-org/infrastructure` repository. Make sure to fill in whatever you have learnt so
far.
2. Raise this in the `#support-freshdesk` channel on Slack for further help and action.
9 changes: 9 additions & 0 deletions docs/topic/infrastructure/storage-layer.md
@@ -87,6 +87,15 @@ jupyterhub:
mountPath: /home/jovyan/allusers
# Uncomment the line below to make the directory readonly for admins
# readOnly: true
# mounts below are copied from basehub's values that we override by
# specifying extraVolumeMounts (lists get overridden when helm values
# are combined)
- name: home
mountPath: /home/jovyan/shared-readwrite
subPath: _shared
- name: home
mountPath: /home/rstudio/shared-readwrite
subPath: _shared
```
#### A `shared-public` directory