This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub. Based on AWS Data on EKS JupyterHub.
- Prerequisites
- AWS Cloud Configuration for Terraform
- AWS CLI Configuration for Multiple Accounts and Environments
- Environment Variables
- Variables File
- GitHub OAuth
- Deployment
- Cleanup
- Update
- Route the Domain in Route 53
- Manual Takedown of Just the Hub
- Adding Admins to EKS
- Adjusting Available Server Options
- Adjusting Available Nodes
- Adjusting Core Node
- Upgrading Kubernetes
- Kubernetes Layer Tour
This guide assumes that you have:
- A registered domain
- An AWS Certificate for the domain and subdomains
- An AWS IAM account (with a Trust Policy allowing it to assume `JupyterhubProvisioningRole`, or Admin if the Role has not been created).
- Terraform >= 1.8.3 (installation guide)
- kubectl >= 1.26.15 (installation guide)
- yamllint >= 1.35.1 (installation guide)
The project directory is structured to separate environment-specific configurations from the main
Terraform configuration. This allows for easier management and scalability when dealing with
multiple environments. Each deployment is given its own directory in `envs/`.
This document explains how to set up the necessary AWS resources and configurations for using Terraform to provision JupyterHub.
- Create an S3 Bucket:
  - Go to the S3 console in AWS.
  - Click "Create bucket".
  - Name the bucket `jupyterhub-terraform-state-bucket` (ensure the name is unique per AWS account).
  - Choose the region `us-east-2`.
  - Enable default encryption.
  - Create the bucket.
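If you prefer the CLI over the console, a roughly equivalent sketch (assuming your current profile may create buckets, and using the bucket name and region above) is:

```bash
# Create the state bucket in us-east-2 (LocationConstraint is required outside us-east-1)
aws s3api create-bucket \
  --bucket jupyterhub-terraform-state-bucket \
  --region us-east-2 \
  --create-bucket-configuration LocationConstraint=us-east-2

# Enable default (SSE-S3) encryption on the bucket
aws s3api put-bucket-encryption \
  --bucket jupyterhub-terraform-state-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```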
- Configure Terraform to Use the S3 Bucket:
  - In the `envs/<deployment>` directory, create a file named `backend.tf` with the following content:

    ```hcl
    terraform {
      backend "s3" {
        bucket         = "jupyterhub-terraform-state-bucket"
        key            = "terraform.tfstate"
        region         = "us-east-2"
        encrypt        = true
        dynamodb_table = "jupyterhub-terraform-lock-table"
      }
    }
    ```
- Create a DynamoDB Table:
  - Go to the DynamoDB console in AWS.
  - Click "Create table".
  - Name the table `jupyterhub-terraform-lock-table`.
  - Set the primary key to `LockID` (String).
  - Create the table.
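The same lock table can be created from the CLI; this sketch assumes on-demand (pay-per-request) billing is acceptable:

```bash
# Create the Terraform state lock table with LockID as the hash key
aws dynamodb create-table \
  --table-name jupyterhub-terraform-lock-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-2
```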
- Create an IAM Role:
  - Go to the IAM console in AWS.
  - Click "Roles" and then "Create role".
  - Choose the trusted entity type and select `Custom trust policy`.
- Set Up the Trust Policy:
  - Edit the trust relationship for the `JupyterhubProvisioningRole` role to allow the necessary entities to assume the role. Copy and paste the policy below:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account>:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:PrincipalType": "User"
}
}
},
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
},
{
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
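Alternatively, the role and its trust policy can be created in one step from the CLI. This sketch assumes the JSON above has been saved locally as trust-policy.json (a filename chosen here for illustration) with `<account>` filled in:

```bash
# Create the provisioning role with the custom trust policy
aws iam create-role \
  --role-name JupyterhubProvisioningRole \
  --assume-role-policy-document file://trust-policy.json
```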
- Create and attach inline policies:
  - From the `JupyterhubProvisioningRole`, under the `Permissions` tab, select `Create inline policy`.
  - From the `JSON` tab, create `terraform-jupyterhub-backend-policies` using the JSON in `.aws`.
  - From the `JSON` tab, create `terraform-jupyterhub-provisioning-policies` using the JSON in `.aws`.
- Set Maximum Session Duration:
  - 1 hour is usually sufficient, but will occasionally fail.
  - 4 hours is recommended.
To manage multiple AWS accounts and environments, you need to configure your AWS CLI with the appropriate profiles. Follow the steps below to set up your `~/.aws/config` and `~/.aws/credentials` files.
- Obtain Your AWS Access Keys:
  - Log in to the AWS Management Console.
  - Navigate to the IAM service.
  - Select Users and click on your user name.
  - Go to the Security credentials tab.
  - Click Create access key and note down the Access key ID and Secret access key.
- Edit Your `.aws/credentials` File:
  - Open the `.aws/credentials` file in your home directory. If it doesn't exist, create it.
  - Add your access keys for each profile:

    ```ini
    [mcgovern]
    aws_access_key_id = YOUR_MCGOVERN_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_MCGOVERN_SECRET_ACCESS_KEY

    [bican]
    aws_access_key_id = YOUR_BICAN_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_BICAN_SECRET_ACCESS_KEY
    ```
- Obtain Your Role ARN:
  - Log in to the AWS Management Console.
  - Navigate to the IAM service.
  - Select Roles and find the role you will assume (e.g., `JupyterhubProvisioningRole`).
  - Note down the Role ARN.
- Edit Your `.aws/config` File:
  - Open the `.aws/config` file in your home directory. If it doesn't exist, create it.
  - Add the region, role ARN, and source profile for each environment. Here's an example:

    ```ini
    [profile mcgovern]
    region = us-east-2
    role_arn = arn:aws:iam::MCGOVERN_ACCOUNT_ID:role/JupyterhubProvisioningRole
    source_profile = mcgovern

    [profile bican]
    region = us-east-2
    role_arn = arn:aws:iam::BICAN_ACCOUNT_ID:role/JupyterhubProvisioningRole
    source_profile = bican
    ```
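To confirm that each profile resolves and can assume its role, you can check the identity each one maps to (the profile names here match the example config above):

```bash
# Should return the account ID and assumed-role ARN for each profile
aws sts get-caller-identity --profile mcgovern
aws sts get-caller-identity --profile bican
```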
Environment variables store secrets and the hub deployment name:

- `AWS_PROFILE`: The profile for the AWS account to deploy to; see AWS config above.
- `TF_VAR_github_client_id`: See the GitHub OAuth step.
- `TF_VAR_github_client_secret`: See the GitHub OAuth step.
- `TF_VAR_aws_certificate_arn`: See the Create Cert step.
- `TF_VAR_danditoken`: API token for the DANDI instance used for user auth.
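A typical shell setup might look like the sketch below; every value is a placeholder to be replaced with your own secrets:

```bash
# Select the AWS CLI profile to deploy with (see AWS config above)
export AWS_PROFILE=mcgovern

# Secrets consumed by Terraform as TF_VAR_* variables (placeholders only)
export TF_VAR_github_client_id="<github-oauth-client-id>"
export TF_VAR_github_client_secret="<github-oauth-client-secret>"
export TF_VAR_aws_certificate_arn="arn:aws:acm:us-east-2:<account-id>:certificate/<certificate-id>"
export TF_VAR_danditoken="<dandi-api-token>"
```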
The variables are set in a `terraform.tfvars` for each env, i.e. `envs/dandi/terraform.tfvars`:

- `name`: (optional, defaults to `jupyerhub-on-eks`)
- `singleuser_image_repo`: Docker Hub repository containing the custom JupyterHub image
- `singleuser_image_tag`: Image tag
- `jupyterhub_domain`: The domain to host the JupyterHub landing page (i.e. "hub.dandiarchive.org")
- `dandi_api_domain`: The domain that hosts the DANDI API with the list of registered users
- `region`: Cloud vendor region (i.e. us-west-1)
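As a sketch, a minimal tfvars file could be written as follows; the values are illustrative placeholders, not the actual deployment's settings:

```bash
# Write an example envs/dandi/terraform.tfvars (all values are placeholders)
cat > envs/dandi/terraform.tfvars <<'EOF'
singleuser_image_repo = "<dockerhub-org>/<image>"
singleuser_image_tag  = "<tag>"
jupyterhub_domain     = "hub.dandiarchive.org"
dandi_api_domain      = "<dandi-api-domain>"
region                = "us-east-2"
EOF
```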
WARNING: If changing `region`, it must be changed both in the tfvars and in the `backend.tf`.
JupyterHub is configured by merging two YAML files:

- `envs/shared/jupyterhub.yaml`
- `envs/$ENV/jupyterhub-overrides.yaml`

Env Minimum Requirements:

- `hub.config.Authenticator.admin_users`
This template is configuration for the JupyterHub Helm chart; see the administrator guide for JupyterHub.
The `jupyterhub.yaml` and `jupyterhub-overrides.yaml` can use `${terraform.templating.syntax}` with values that are explicitly passed to the `jupyterhub_helm_config` template object in `addons.tf`. The original AWS JupyterHub Example Blueprint docs may be helpful.
Merge Strategy:
- Additive: New fields are added.
- Clobbering: Existing values, including lists, are overwritten.
Example:
Base Configuration (envs/shared/jupyterhub.yaml)
singleuser:
some_key: some_val
profileList:
- item1
- item2
Override Configuration (envs/$ENV/jupyterhub-overrides.yaml)
singleuser:
new_key: new_val
profileList:
- item3
Resulting Configuration
singleuser:
some_key: some_val
new_key: new_val
profileList:
- item3
- Open the GitHub OAuth App Wizard: GitHub settings -> Developer settings -> OAuth Apps. For dandihub, this is owned by a bot GitHub user account (e.g. dandibot).
- Create App:
  - Set the `Homepage URL` to the site root (e.g., `https://hub.dandiarchive.org`). It must be the same as `jupyterhub_domain`.
  - The `Authorization callback URL` must be `<jupyterhub_domain>/hub/oauth_callback`.
Execute install script
./install.sh <env>
Timeouts and race conditions

`Context Deadline Exceeded`: This just happens sometimes and is usually resolved by rerunning the install script.
Key Management Service Duplicate Resource

This is usually caused by a problem with tfstate. It can't be immediately fixed because AWS Key Management Service objects have a 7-day waiting period before deletion. The workaround is to change or add a `name` var in the tfvars (i.e. `jupyerhub-on-eks-2`). Mark the existing KMS key for deletion; you will need to assume the AWS IAM Role used to create it (i.e. `JupyterhubProvisioningRole`).
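Scheduling the key deletion can also be done from the CLI once you have assumed that role; the key ID below is a placeholder, and 7 days is the minimum waiting period AWS allows:

```bash
# Schedule the duplicate KMS key for deletion after the minimum 7-day window
aws kms schedule-key-deletion \
  --key-id <kms-key-id> \
  --pending-window-in-days 7
```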
Show config of current jupyterhub deployment
Warning: This is the fully templated jupyterhub. Be careful not to expose secrets.
helm get values jupyterhub -n jupyterhub
Route the Domain in Route 53

In Route 53 -> Hosted Zones -> <jupyterhub_domain>, create an `A` type Record that routes to an `Alias to Network Load Balancer`. Set the region and the EXTERNAL_IP of the `service/proxy-public` Kubernetes object in the `jupyterhub` namespace.
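One way to look up the load balancer address to point the alias record at (assuming your kubeconfig targets the cluster) is:

```bash
# Print the external hostname of the proxy-public LoadBalancer service
kubectl get svc proxy-public -n jupyterhub \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```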
This will need to be redone each time the `proxy-public` service is recreated (occurs during `./cleanup.sh`).
Changes to variables or the template configuration are usually applied idempotently by running `./install.sh <env>`, without the need to clean up first.
Prior to cleanup, ensure that kubectl and helm are using the appropriate `kubeconfig`. (`<name>` is the value of `name` in `terraform.tfvars`.)
aws eks --region us-east-2 update-kubeconfig --name <name-prefix>
Cleanup requires the same variables and is run with `./cleanup.sh <env>`.
NOTE: Occasionally the Kubernetes namespace fails to delete.
WARNING: Sometimes AWS VPCs are left up due to an upstream Terraform race condition and must be deleted by hand (including hand-deleting each nested object).
`terraform destroy -target=module.eks_data_addons.helm_release.jupyterhub -auto-approve` will destroy all the JupyterHub assets, but will leave the EKS and VPC infrastructure intact.
Add the user/IAM to `mapUsers`.
kubectl edit configmap -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: <snip>
mapRoles: <snip>
mapUsers: |
- groups:
- system:masters
userarn: arn:aws:iam::<acct_id>:user/<iam_username>
username: <iam_username>
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
These are the options for user-facing machines that run as a pod on the node, and they are configured in `profileList` in `dandihub.yaml`. Each profile can have multiple user-facing `profile_options`, including `images`.
These are the EKS machines that may run underneath one or more user-hub pods, and they are configured via Karpenter. The node pools are configured in `addons.tf` with `karpenter-resources-*` objects.
The configuration for the machines that run the autoscaling and monitoring layer is `eks_managed_node_groups` in `main.tf`.
The Kubernetes version is controlled via the Terraform variable `eks_cluster_version`; the default is in `versions.tf`, but each deployment can specify its own value in its `tfvars`.
These objects are created by z2jh: https://z2jh.jupyter.org/en/stable/
kubectl get all -n jupyterhub
Notable objects:

- `pod/hub-23490-393`: JupyterHub server and culler pod
- `pod/jupyter-<github_username>`: User pod
- `pod/user-scheduler-5d8b9567-26x6j`: Creates user pods. There are two; one has been elected leader, with one backup.
- `service/proxy-public`: LoadBalancer; the External IP must be connected to DNS (Route 53)
`pod/karpenter-75fc7784bf-cjddv` responds similarly to the cluster-autoscaler. When JupyterHub user pods are scheduled and sufficient Nodes are not available, Karpenter creates a NodeClaim and then interacts with AWS to spin up machines.

- `nodeclaims`: Create a node from one of the Karpenter Nodepools. (This is where spot/on-demand is configured for user pods.)
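To see what Karpenter is doing, the NodePool and NodeClaim custom resources can be listed directly; these resource names assume a Karpenter release that ships those CRDs, as described above:

```bash
# List the configured node pools and any claims Karpenter has created
kubectl get nodepools
kubectl get nodeclaims

# Inspect a single claim to see why it was created and which node backs it
kubectl describe nodeclaim <nodeclaim-name>
```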
DANDI Hub provides persistent storage to each user, but over time the stored data can become expensive. To run a job to gather disk usage per user, start by configuring the `aws` CLI (make sure the `AWS_PROFILE` env var is set). You will also need to set `EC2_SSH_KEY` to the location of the PEM file for the dandihub-gh-actions keypair (see asmacdo).
Launch an ec2 instance with the appropriate tools and access:
./.github/scripts/launch-ec2.sh
NOTE: If this does not succeed, the security group may have changed; if so, the extra rules necessary for this instance will need to be put back into place. On the SG for `eks-dandihub-efs`, add an inbound rule for NFS, pointing to the SG of the EC2 instance.
When the script completes, it will provide instructions to ssh into the instance.
Once logged into the instance, it is recommended to start a screen session.
screen -S create-file-index
Next, navigate to the EFS dir which contains each user homedir, and create a file index for each user.
cd /mnt/efs/home/
parallel -j 8 ~/scripts/create-file-index.py ::: *
Once finished, navigate to the output location of the file index script and generate the totals.
cd /home/ec2-user/hub-user-indexes
~/scripts/calculate-directory-stats.py
Log out of the EC2 instance, and pull the totals locally.
scp -i "$EC2_SSH_KEY" ec2-user@"$PUBLIC_IP":/home/ec2-user/hub-user-reports/all_users_total.json .
Finally, remove the ec2 instance.
./.github/scripts/cleanup-ec2.sh