This Terraform blueprint creates a Kubernetes environment (EKS) and installs JupyterHub. Based on AWS Data on EKS JupyterHub.
- Prerequisites
- AWS Cloud Configuration for Terraform
- AWS CLI Configuration for Multiple Accounts and Environments
- Environment Variables
- Variables File
- GitHub OAuth
- Deployment
- Cleanup
- Update
- Route the Domain in Route 53
- Manual Takedown of Just the Hub
- Adding Admins to EKS
- Adjusting Available Server Options
- Adjusting Available Nodes
- Adjusting Core Node
- Upgrading Kubernetes
- Kubernetes Layer Tour
This guide assumes that you have:
- A registered domain
- An AWS Certificate for the domain and subdomains
- An AWS IAM account (with a Trust Policy allowing it to assume `JupyterhubProvisioningRole`, or Admin if the Role has not been created).
- Terraform >= 1.8.3 (installation guide)
- kubectl >= 1.26.15 (installation guide)
- yamllint >= 1.35.1 (installation guide)
The project directory is structured to separate environment-specific configurations from the main
Terraform configuration. This allows for easier management and scalability when dealing with
multiple environments. Each deployment is given its own directory in `envs/`.
This document explains how to set up the necessary AWS resources and configurations for using Terraform to provision JupyterHub.
- Create an S3 Bucket:
  - Go to the S3 console in AWS.
  - Click "Create bucket".
  - Name the bucket `jupyterhub-terraform-state-bucket` (ensure the name is unique per AWS account).
  - Choose the region `us-east-2`.
  - Enable default encryption.
  - Create the bucket.
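If you prefer the CLI over the console, a roughly equivalent sketch (assuming your current profile may create buckets, and using the bucket name and region above) is:

```bash
# Create the state bucket in us-east-2 (LocationConstraint is required outside us-east-1)
aws s3api create-bucket \
  --bucket jupyterhub-terraform-state-bucket \
  --region us-east-2 \
  --create-bucket-configuration LocationConstraint=us-east-2

# Enable default (SSE-S3) encryption on the bucket
aws s3api put-bucket-encryption \
  --bucket jupyterhub-terraform-state-bucket \
  --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```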
- Configure Terraform to Use the S3 Bucket:
  - In the `envs/<deployment>` directory, create a file named `backend.tf` with the following content:

    ```hcl
    terraform {
      backend "s3" {
        bucket         = "jupyterhub-terraform-state-bucket"
        key            = "terraform.tfstate"
        region         = "us-east-2"
        encrypt        = true
        dynamodb_table = "jupyterhub-terraform-lock-table"
      }
    }
    ```
- Create a DynamoDB Table:
  - Go to the DynamoDB console in AWS.
  - Click "Create table".
  - Name the table `jupyterhub-terraform-lock-table`.
  - Set the primary key to `LockID` (String).
  - Create the table.
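The same lock table can be created from the CLI; this sketch assumes on-demand (pay-per-request) billing is acceptable:

```bash
# Create the Terraform state lock table with LockID as the hash key
aws dynamodb create-table \
  --table-name jupyterhub-terraform-lock-table \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --region us-east-2
```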
- Create an IAM Role:
  - Go to the IAM console in AWS.
  - Click "Roles" and then "Create role".
  - Choose the trusted entity type and select `Custom trust policy`.
- Set Up the Trust Policy:
  - Edit the trust relationship for the `JupyterhubProvisioningRole` role to allow the necessary entities to assume the role. Copy and paste the policy below:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::<account>:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"aws:PrincipalType": "User"
}
}
},
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
},
{
"Effect": "Allow",
"Principal": {
"Service": "lambda.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
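Alternatively, the role and its trust policy can be created in one step from the CLI. This sketch assumes the JSON above has been saved locally as trust-policy.json (a filename chosen here for illustration) with `<account>` filled in:

```bash
# Create the provisioning role with the custom trust policy
aws iam create-role \
  --role-name JupyterhubProvisioningRole \
  --assume-role-policy-document file://trust-policy.json
```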
- Create and attach inline policies:
  - From the `JupyterhubProvisioningRole`, under the `Permissions` tab, select `Create inline policy`.
  - From the `JSON` tab, create `terraform-jupyterhub-backend-policies` using the JSON in `.aws`.
  - From the `JSON` tab, create `terraform-jupyterhub-provisioning-policies` using the JSON in `.aws`.
- Set Maximum Session Duration:
  - 1 hour is usually sufficient, but will occasionally fail.
  - 4 hours is recommended.
To manage multiple AWS accounts and environments, you need to configure your AWS CLI with the appropriate profiles. Follow the steps below to set up your `~/.aws/config` and `~/.aws/credentials` files.
- Obtain Your AWS Access Keys:
  - Log in to the AWS Management Console.
  - Navigate to the IAM service.
  - Select Users and click on your user name.
  - Go to the Security credentials tab.
  - Click Create access key and note down the Access key ID and Secret access key.
- Edit Your `.aws/credentials` File:
  - Open the `.aws/credentials` file in your home directory. If it doesn't exist, create it.
  - Add your access keys for each profile:

    ```ini
    [mcgovern]
    aws_access_key_id = YOUR_MCGOVERN_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_MCGOVERN_SECRET_ACCESS_KEY

    [bican]
    aws_access_key_id = YOUR_BICAN_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_BICAN_SECRET_ACCESS_KEY
    ```
- Obtain Your Role ARN:
  - Log in to the AWS Management Console.
  - Navigate to the IAM service.
  - Select Roles and find the role you will assume (e.g., `JupyterhubProvisioningRole`).
  - Note down the Role ARN.
- Edit Your `.aws/config` File:
  - Open the `.aws/config` file in your home directory. If it doesn't exist, create it.
  - Add the region, role ARN, and source profile for each environment. Here's an example:

    ```ini
    [profile mcgovern]
    region = us-east-2
    role_arn = arn:aws:iam::MCGOVERN_ACCOUNT_ID:role/JupyterhubProvisioningRole
    source_profile = mcgovern

    [profile bican]
    region = us-east-2
    role_arn = arn:aws:iam::BICAN_ACCOUNT_ID:role/JupyterhubProvisioningRole
    source_profile = bican
    ```
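To confirm that each profile resolves and can assume its role, you can check the identity each one maps to (the profile names here match the example config above):

```bash
# Should return the account ID and assumed-role ARN for each profile
aws sts get-caller-identity --profile mcgovern
aws sts get-caller-identity --profile bican
```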
Environment variables store secrets and the hub deployment name:

- `AWS_PROFILE`: The profile for the AWS account to deploy to; see AWS config above.
- `TF_VAR_github_client_id`: See the GitHub OAuth step.
- `TF_VAR_github_client_secret`: See the GitHub OAuth step.
- `TF_VAR_aws_certificate_arn`: See the Create Cert step.
- `TF_VAR_danditoken`: API token for the DANDI instance used for user auth.
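A typical shell setup might look like the sketch below; every value is a placeholder to be replaced with your own secrets:

```bash
# Select the AWS CLI profile to deploy with (see AWS config above)
export AWS_PROFILE=mcgovern

# Secrets consumed by Terraform as TF_VAR_* variables (placeholders only)
export TF_VAR_github_client_id="<github-oauth-client-id>"
export TF_VAR_github_client_secret="<github-oauth-client-secret>"
export TF_VAR_aws_certificate_arn="arn:aws:acm:us-east-2:<account-id>:certificate/<certificate-id>"
export TF_VAR_danditoken="<dandi-api-token>"
```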
The variables are set in a `terraform.tfvars` for each env, i.e. `envs/dandi/terraform.tfvars`:

- `name`: (optional, defaults to `jupyerhub-on-eks`)
- `singleuser_image_repo`: Docker Hub repository containing the custom JupyterHub image
- `singleuser_image_tag`: Image tag
- `jupyterhub_domain`: The domain to host the JupyterHub landing page (i.e. "hub.dandiarchive.org")
- `dandi_api_domain`: The domain that hosts the DANDI API with the list of registered users
- `region`: Cloud vendor region (i.e. us-west-1)
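As a sketch, a minimal tfvars file could be written as follows; the values are illustrative placeholders, not the actual deployment's settings:

```bash
# Write an example envs/dandi/terraform.tfvars (all values are placeholders)
cat > envs/dandi/terraform.tfvars <<'EOF'
singleuser_image_repo = "<dockerhub-org>/<image>"
singleuser_image_tag  = "<tag>"
jupyterhub_domain     = "hub.dandiarchive.org"
dandi_api_domain      = "<dandi-api-domain>"
region                = "us-east-2"
EOF
```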
WARNING: If changing `region`, it must be changed both in the tfvars and in the `backend.tf`.
JupyterHub is configured by merging two YAML files:

- `envs/shared/jupyterhub.yaml`
- `envs/$ENV/jupyterhub-overrides.yaml`

Env Minimum Requirements:

- `hub.config.Authenticator.admin_users`
This template is configuration for the JupyterHub Helm chart; see the administrator guide for JupyterHub.
The `jupyterhub.yaml` and `jupyterhub-overrides.yaml` can use `${terraform.templating.syntax}` with values that are explicitly passed to the `jupyterhub_helm_config` template object in `addons.tf`. The original AWS JupyterHub Example Blueprint docs may be helpful.
Merge Strategy:
- Additive: New fields are added.
- Clobbering: Existing values, including lists, are overwritten.
Example:
Base Configuration (envs/shared/jupyterhub.yaml)
singleuser:
some_key: some_val
profileList:
- item1
- item2
Override Configuration (envs/$ENV/jupyterhub-overrides.yaml)
singleuser:
new_key: new_val
profileList:
- item3
Resulting Configuration
singleuser:
some_key: some_val
new_key: new_val
profileList:
- item3
- Open the GitHub OAuth App Wizard: GitHub settings -> Developer settings -> OAuth Apps. For dandihub, this is owned by a bot GitHub user account (e.g. dandibot).
- Create App:
  - Set the `Homepage URL` to the site root (e.g., `https://hub.dandiarchive.org`). It must be the same as `jupyterhub_domain`.
  - The `Authorization callback URL` must be `<jupyterhub_domain>/hub/oauth_callback`.
Execute install script
./install.sh <env>
Timeouts and race conditions

`Context Deadline Exceeded`: This just happens sometimes and is usually resolved by rerunning the install script.
Key Management Service Duplicate Resource

This is usually caused by a problem with tfstate. It can't be immediately fixed because AWS Key Management Service objects have a 7-day waiting period before deletion. The workaround is to change or add a `name` var in the tfvars (i.e. `jupyerhub-on-eks-2`). Mark the existing KMS key for deletion; you will need to assume the AWS IAM Role used to create it (i.e. `JupyterhubProvisioningRole`).
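Scheduling the key deletion can also be done from the CLI once you have assumed that role; the key ID below is a placeholder, and 7 days is the minimum waiting period AWS allows:

```bash
# Schedule the duplicate KMS key for deletion after the minimum 7-day window
aws kms schedule-key-deletion \
  --key-id <kms-key-id> \
  --pending-window-in-days 7
```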
Show config of current jupyterhub deployment
Warning: This is the fully templated jupyterhub. Be careful not to expose secrets.
helm get values jupyterhub -n jupyterhub
Route the Domain in Route 53

In Route 53 -> Hosted Zones -> <jupyterhub_domain>, create an `A` type Record that routes to an `Alias to Network Load Balancer`. Set the region and the EXTERNAL_IP of the `service/proxy-public` Kubernetes object in the `jupyterhub` namespace.
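One way to look up the load balancer address to point the alias record at (assuming your kubeconfig targets the cluster) is:

```bash
# Print the external hostname of the proxy-public LoadBalancer service
kubectl get svc proxy-public -n jupyterhub \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```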
This will need to be redone each time the `proxy-public` service is recreated (occurs during `./cleanup.sh`).
Changes to variables or the template configuration are usually applied idempotently by running `./install.sh <env>`, without the need to clean up first.
Prior to cleanup, ensure that kubectl and helm are using the appropriate `kubeconfig`. (`<name>` is the value of `name` in `terraform.tfvars`.)
aws eks --region us-east-2 update-kubeconfig --name <name-prefix>
Cleanup requires the same variables and is run with `./cleanup.sh <env>`.
NOTE: Occasionally the Kubernetes namespace fails to delete.
WARNING: Sometimes AWS VPCs are left up due to an upstream Terraform race condition and must be deleted by hand (including hand-deleting each nested object).
`terraform destroy -target=module.eks_data_addons.helm_release.jupyterhub -auto-approve` will destroy all the JupyterHub assets, but will leave the EKS and VPC infrastructure intact.
Add the user/IAM to `mapUsers`.
kubectl edit configmap -n kube-system aws-auth
apiVersion: v1
data:
mapAccounts: <snip>
mapRoles: <snip>
mapUsers: |
- groups:
- system:masters
userarn: arn:aws:iam::<acct_id>:user/<iam_username>
username: <iam_username>
kind: ConfigMap
metadata:
name: aws-auth
namespace: kube-system
These are the options for user-facing machines that run as a pod on the node, and they are configured in `profileList` in `dandihub.yaml`. Each profile can have multiple user-facing `profile_options`, including `images`.
These are the EKS machines that may run underneath one or more user-hub pods, and they are configured via Karpenter. The node pools are configured in `addons.tf` with `karpenter-resources-*` objects.
The configuration for the machines that run the autoscaling and monitoring layer is `eks_managed_node_groups` in `main.tf`.
The Kubernetes version is controlled via the Terraform variable `eks_cluster_version`; the default is in `versions.tf`, but each deployment can specify its own value in its `tfvars`.
These objects are created by z2jh: https://z2jh.jupyter.org/en/stable/
kubectl get all -n jupyterhub
Notable objects:

- `pod/hub-23490-393`: JupyterHub server and culler pod
- `pod/jupyter-<github_username>`: User pod
- `pod/user-scheduler-5d8b9567-26x6j`: Creates user pods. There are two; one has been elected leader, with one backup.
- `service/proxy-public`: LoadBalancer; the External IP must be connected to DNS (Route 53)
`pod/karpenter-75fc7784bf-cjddv` responds similarly to the cluster-autoscaler. When JupyterHub user pods are scheduled and sufficient Nodes are not available, Karpenter creates a NodeClaim and then interacts with AWS to spin up machines.

- `nodeclaims`: Create a node from one of the Karpenter Nodepools. (This is where spot/on-demand is configured for user pods.)
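To see what Karpenter is doing, the NodePool and NodeClaim custom resources can be listed directly; these resource names assume a Karpenter release that ships those CRDs, as described above:

```bash
# List the configured node pools and any claims Karpenter has created
kubectl get nodepools
kubectl get nodeclaims

# Inspect a single claim to see why it was created and which node backs it
kubectl describe nodeclaim <nodeclaim-name>
```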
DANDI Hub provides persistent storage to each user, but over time the stored data can become expensive. To run a job to gather disk usage per user, start by configuring the `aws` CLI (make sure the `AWS_PROFILE` env var is set). You will also need to set `EC2_SSH_KEY` to the location of the PEM file for the dandihub-gh-actions keypair (see asmacdo).
Launch an ec2 instance with the appropriate tools and access:
./.github/scripts/launch-ec2.sh
NOTE: If this does not succeed, the security group may have changed; if so, the extra rules necessary for this instance will need to be put back into place. On the SG for `eks-dandihub-efs`, add an inbound rule for NFS, pointing to the SG of the EC2 instance.
When the script completes, it will provide instructions to ssh into the instance.
Once logged into the instance, it is recommended to start a screen session.
screen -S create-file-index
Next, navigate to the EFS dir which contains each user homedir, and create a file index for each user.
cd /mnt/efs/home/
parallel -j 8 ~/scripts/create-file-index.py ::: *
Once finished, navigate to the output location of the file index script and generate the totals.
cd /home/ec2-user/hub-user-indexes
~/scripts/calculate-directory-stats.py
Log out of the EC2 instance, and pull the totals locally.
scp -i "$EC2_SSH_KEY" ec2-user@"$PUBLIC_IP":/home/ec2-user/hub-user-reports/all_users_total.json .
Finally, remove the ec2 instance.
./.github/scripts/cleanup-ec2.sh