Merge pull request #2539 from consideRatio/pr/awi-ciroh-event
awi-ciroh: last minute fixes for event 16-18 May - node sharing adopted to avoid PD quota, legacy profile list retained
consideRatio authored May 16, 2023
2 parents 6571c1f + 98e56e9 commit 637977f
Showing 2 changed files with 49 additions and 6 deletions.
37 changes: 31 additions & 6 deletions config/clusters/awi-ciroh/common.values.yaml
@@ -54,42 +54,67 @@ basehub:
singleuser:
image:
# Image build repo: https://github.com/2i2c-org/awi-ciroh-image
#
# NOTE: The configurator is used in this cluster, so this is stale
# configuration.
#
name: "quay.io/2i2c/awi-ciroh-image"
tag: "63ecd92f8d84"
tag: "584293e50d4c"
profileList:
# The mem-guarantees are here so k8s doesn't schedule other pods
# on these nodes. They need to be just under total allocatable
# RAM on a node, not total node capacity. Values calculated using
# https://learnk8s.io/kubernetes-instance-calculator
#
# FIXME: These are changed to a temporary node sharing setup based on
# the legacy choices to help us pre-warm capacity independent of
# the choices users end up making and to avoid running into
# persistent disk quotas.
#
# Change PR: https://github.com/2i2c-org/infrastructure/pull/2539
# Related event: https://github.com/2i2c-org/infrastructure/issues/2520
#
# This is an interim setup, trying to balance the experience of
# the previous 1:1 user:node setup with a node sharing setup. It
# is not meant to be retained long term!
#
- display_name: "Small"
description: 5GB RAM, 2 CPUs
default: true
kubespawner_override:
mem_limit: 7G
-mem_guarantee: 4.5G
+mem_guarantee: 5G
cpu_limit: 2
cpu_guarantee: 0.938
node_selector:
-node.kubernetes.io/instance-type: n1-standard-2
+node.kubernetes.io/instance-type: n2-highmem-16
- display_name: Medium
description: 11GB RAM, 4 CPUs
kubespawner_override:
mem_limit: 15G
mem_guarantee: 11G
cpu_limit: 4
cpu_guarantee: 1.875
node_selector:
-node.kubernetes.io/instance-type: n1-standard-4
+node.kubernetes.io/instance-type: n2-highmem-16
- display_name: Large
description: 24GB RAM, 8 CPUs
kubespawner_override:
mem_limit: 30G
mem_guarantee: 24G
cpu_limit: 8
cpu_guarantee: 3.75
node_selector:
-node.kubernetes.io/instance-type: n1-standard-8
+node.kubernetes.io/instance-type: n2-highmem-16
- display_name: Huge
description: 52GB RAM, 16 CPUs
kubespawner_override:
mem_limit: 60G
mem_guarantee: 52G
cpu_limit: 16
cpu_guarantee: 7.5
node_selector:
-node.kubernetes.io/instance-type: n1-standard-16
+node.kubernetes.io/instance-type: n2-highmem-16
dask-gateway:
gateway:
backend:
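The interim profileList above keeps the legacy resource choices but points every profile at a shared n2-highmem-16 node. A rough way to sanity-check how those guarantees pack onto one node is sketched below; the allocatable figures are estimates (an n2-highmem-16 has 16 vCPU / 128 GB, with somewhat less allocatable after kubelet and system reservations), not values read from this cluster.

# Rough packing estimate for the interim node-sharing profiles above.
# The allocatable figures are assumptions, not measured on this cluster.
ALLOC_CPU = 15.89      # assumed allocatable vCPU on an n2-highmem-16 node
ALLOC_MEM_GB = 116.0   # assumed allocatable memory (GB) on the same node

profiles = {
    # name: (cpu_guarantee, mem_guarantee in GB) from the profileList above
    "Small":  (0.938, 5),
    "Medium": (1.875, 11),
    "Large":  (3.75, 24),
    "Huge":   (7.5, 52),
}

for name, (cpu, mem) in profiles.items():
    by_cpu = int(ALLOC_CPU // cpu)     # users that fit by CPU guarantee
    by_mem = int(ALLOC_MEM_GB // mem)  # users that fit by memory guarantee
    print(f"{name:6s}: {min(by_cpu, by_mem):2d} users per node "
          f"(cpu-bound {by_cpu}, mem-bound {by_mem})")

With those assumptions the guarantees work out to roughly 16, 8, 4, and 2 users per node, so the CPU guarantee, not memory, is what sets each profile's share of a node.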
18 changes: 18 additions & 0 deletions terraform/gcp/projects/awi-ciroh.tfvars
@@ -68,6 +68,24 @@ notebook_nodes = {
count: 0
}
},
# added stressfully before an event where we ran out of ssd quota, see
# https://github.com/2i2c-org/infrastructure/pull/2539 and the linked
# event https://github.com/2i2c-org/infrastructure/issues/2520.
#
# FIXME: make this cluster have a node sharing setup like in the
# basehub/daskhub template.
#
"highmem-medium" : {
min : 10,
max : 100,
machine_type : "n2-highmem-16",
labels: {},
gpu: {
enabled: false,
type: "",
count: 0
}
},
}

dask_nodes = {
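The new highmem-medium pool pre-warms ten n2-highmem-16 nodes so capacity exists regardless of which profile users pick, and it keeps the node count (and hence boot-disk usage) low, which is what the SSD quota comment refers to. The sketch below only illustrates that reasoning; the boot-disk size and attendance figure are assumptions, not values from this cluster's terraform.

# Illustration of why fewer, larger shared nodes ease the persistent-disk
# quota pressure: every GKE node gets a boot disk, so node count drives
# PD usage. Disk size and attendance are assumptions for illustration.
BOOT_DISK_GB = 100     # assumed boot disk size per node
EXPECTED_USERS = 160   # assumed event attendance

# Legacy setup: roughly one node per user (1:1 user:node profiles).
legacy_nodes = EXPECTED_USERS
legacy_pd_gb = legacy_nodes * BOOT_DISK_GB

# Interim setup: ~16 "Small" users per n2-highmem-16 node, with at least
# the 10 pre-warmed nodes from `min : 10` above.
shared_nodes = max(10, -(-EXPECTED_USERS // 16))  # ceiling division
shared_pd_gb = shared_nodes * BOOT_DISK_GB

print(f"legacy setup: {legacy_nodes} nodes, {legacy_pd_gb} GB of boot disks")
print(f"shared setup: {shared_nodes} nodes, {shared_pd_gb} GB of boot disks")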
