Merge pull request #2414 from minrk/ovh-terraform
new OVH cluster
minrk authored Nov 21, 2022
2 parents 3189394 + 03e64d4 commit a392225
Showing 11 changed files with 489 additions and 7 deletions.
19 changes: 18 additions & 1 deletion .github/workflows/cd.yml
@@ -234,6 +234,14 @@ jobs:
helm_version: ""
experimental: false

- federation_member: ovh2
binder_url: https://ovh2.mybinder.org
hub_url: https://hub.ovh2.mybinder.org
# image-prefix should match ovh registry config in secrets/config/ovh.yaml
chartpress_args: "--push --image-prefix=2lmrrh8f.gra7.container-registry.ovh.net/mybinder-chart/mybinder-"
helm_version: ""
experimental: false

steps:
- name: "Stage 0: Update env vars based on job matrix arguments"
run: |
@@ -288,14 +296,23 @@ jobs:
GIT_CRYPT_KEY: ${{ secrets.GIT_CRYPT_KEY }}

# Action Repo: https://github.com/Azure/docker-login
- name: "Stage 3: Login to Docker regstry (OVH)"
- name: "Stage 3: Login to Docker registry (OVH)"
if: matrix.federation_member == 'ovh'
uses: azure/docker-login@v1
with:
login-server: 3i2li627.gra7.container-registry.ovh.net
username: ${{ secrets.DOCKER_USERNAME_OVH }}
password: ${{ secrets.DOCKER_PASSWORD_OVH }}

- name: "Stage 3: Login to Docker registry (OVH2)"
if: matrix.federation_member == 'ovh2'
uses: azure/docker-login@v1
with:
login-server: 2lmrrh8f.gra7.container-registry.ovh.net
username: ${{ secrets.DOCKER_USERNAME_OVH2 }}
# terraform output registry_chartpress_token
password: ${{ secrets.DOCKER_PASSWORD_OVH2 }}

- name: "Stage 3: Run chartpress to update values.yaml"
run: |
chartpress ${{ matrix.chartpress_args || '--skip-build' }}
2 changes: 2 additions & 0 deletions .github/workflows/test-helm-template.yaml
@@ -47,6 +47,8 @@ jobs:
k3s-channel: "v1.21"
- release: ovh
k3s-channel: "v1.20"
- release: ovh2
k3s-channel: "v1.24"
- release: turing
k3s-channel: "v1.21"

1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ travis/crypt-key
env

.terraform
.terraform.lock.hcl
125 changes: 125 additions & 0 deletions config/ovh2.yaml
@@ -0,0 +1,125 @@
projectName: ovh2

userNodeSelector: &userNodeSelector
mybinder.org/pool-type: users
coreNodeSelector: &coreNodeSelector
mybinder.org/pool-type: core

binderhub:
config:
BinderHub:
pod_quota: 10
hub_url: https://hub.ovh2.mybinder.org
badge_base_url: https://mybinder.org
build_node_selector: *userNodeSelector
sticky_builds: true
image_prefix: 2lmrrh8f.gra7.container-registry.ovh.net/mybinder-builds/r2d-g5b5b759
DockerRegistry:
# Docker Registry uses harbor
# ref: https://github.com/goharbor/harbor/wiki/Harbor-FAQs#api
token_url: "https://2lmrrh8f.gra7.container-registry.ovh.net/service/token?service=harbor-registry"

replicas: 1
nodeSelector: *coreNodeSelector

extraVolumes:
- name: secrets
secret:
secretName: events-archiver-secrets
extraVolumeMounts:
- name: secrets
mountPath: /secrets
readOnly: true
extraEnv:
GOOGLE_APPLICATION_CREDENTIALS: /secrets/service-account.json

ingress:
hosts:
- ovh2.mybinder.org

jupyterhub:
singleuser:
nodeSelector: *userNodeSelector
hub:
nodeSelector: *coreNodeSelector

proxy:
chp:
nodeSelector: *coreNodeSelector
resources:
requests:
cpu: "1"
limits:
cpu: "1"
ingress:
hosts:
- hub.ovh2.mybinder.org
tls:
- secretName: kubelego-tls-hub
hosts:
- hub.ovh2.mybinder.org
scheduling:
userPlaceholder:
replicas: 5
userScheduler:
nodeSelector: *coreNodeSelector

imageCleaner:
# Use 40GB as upper limit, size is given in bytes
imageGCThresholdHigh: 40e9
imageGCThresholdLow: 30e9
imageGCThresholdType: "absolute"

cryptnono:
enabled: false

grafana:
nodeSelector: *coreNodeSelector
ingress:
hosts:
- grafana.ovh2.mybinder.org
tls:
- hosts:
- grafana.ovh2.mybinder.org
secretName: kubelego-tls-grafana
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: prometheus
orgId: 1
type: prometheus
url: https://prometheus.ovh2.mybinder.org
access: direct
isDefault: true
editable: false
persistence:
storageClassName: csi-cinder-high-speed

prometheus:
server:
nodeSelector: *coreNodeSelector
persistentVolume:
size: 50Gi
retention: 30d
ingress:
hosts:
- prometheus.ovh2.mybinder.org
tls:
- hosts:
- prometheus.ovh2.mybinder.org
secretName: kubelego-tls-prometheus

ingress-nginx:
controller:
scope:
enabled: true
service:
loadBalancerIP: 162.19.17.37

static:
ingress:
hosts:
- static.ovh2.mybinder.org
tls:
secretName: kubelego-tls-static
39 changes: 35 additions & 4 deletions deploy.py
@@ -76,7 +76,7 @@ def setup_auth_ovh(release, cluster):
"""
print(f"Setup the OVH authentication for namespace {release}")

ovh_kubeconfig = os.path.join(ABSOLUTE_HERE, "secrets", "ovh-kubeconfig.yml")
ovh_kubeconfig = os.path.join(ABSOLUTE_HERE, "secrets", f"{release}-kubeconfig.yml")
os.environ["KUBECONFIG"] = ovh_kubeconfig
print(f"Current KUBECONFIG='{ovh_kubeconfig}'")
stdout = subprocess.check_output(["kubectl", "config", "use-context", cluster])
@@ -124,7 +124,7 @@ def update_networkbans(cluster):
# some members have special logic in ban.py,
# in which case they must be specified on the command-line
ban_command = [sys.executable, "secrets/ban.py"]
if cluster in {"turing-prod", "turing-staging", "turing", "ovh"}:
if cluster in {"turing-prod", "turing-staging", "turing", "ovh", "ovh2"}:
ban_command.append(cluster)

subprocess.check_call(ban_command)
@@ -245,13 +245,43 @@ def setup_certmanager():
subprocess.check_call(helm_upgrade)


def patch_coredns():
"""Patch coredns resource allocation
OVH2 coredns does not have sufficient memory by default after our ban patches
"""
print(BOLD + GREEN + "Patching coredns resources" + NC, flush=True)
subprocess.check_call(
[
"kubectl",
"set",
"resources",
"-n",
"kube-system",
"deployments/coredns",
"--limits",
"memory=250Mi",
"--requests",
"memory=200Mi",
]
)


def main():
# parse command line args
argparser = argparse.ArgumentParser()
argparser.add_argument(
"release",
help="Release to deploy",
choices=["staging", "prod", "ovh", "turing-prod", "turing-staging", "turing"],
choices=[
"staging",
"prod",
"ovh",
"ovh2",
"turing-prod",
"turing-staging",
"turing",
],
)
argparser.add_argument(
"--name",
@@ -302,8 +332,9 @@ def main():

# script is running on CI, proceed with auth and helm setup

if cluster == "ovh":
if cluster.startswith("ovh"):
setup_auth_ovh(args.release, cluster)
patch_coredns()
elif cluster in AZURE_RGs:
setup_auth_turing(cluster)
elif cluster in GCP_PROJECTS:
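
The deploy.py hunks above change two things: the kubeconfig path is now derived from the release name, and every cluster whose name starts with `ovh` goes through the OVH auth branch and then `patch_coredns()`. The sketch below imitates that logic without calling `kubectl`. It is an illustration, not the code in deploy.py: `REPO_ROOT`, `kubeconfig_path`, `is_ovh_cluster`, and `coredns_patch_command` are hypothetical names, and it assumes the release and cluster names coincide.

```python
import os
import shlex

# Hypothetical stand-in for deploy.py's ABSOLUTE_HERE
REPO_ROOT = "/path/to/mybinder.org-deploy"


def kubeconfig_path(release, root=REPO_ROOT):
    """Build the per-release kubeconfig path the way setup_auth_ovh now does."""
    return os.path.join(root, "secrets", f"{release}-kubeconfig.yml")


def is_ovh_cluster(cluster):
    """Mirror the `cluster.startswith("ovh")` check in main()."""
    return cluster.startswith("ovh")


def coredns_patch_command():
    """Return the same argument list that patch_coredns hands to subprocess."""
    return [
        "kubectl", "set", "resources",
        "-n", "kube-system",
        "deployments/coredns",
        "--limits", "memory=250Mi",
        "--requests", "memory=200Mi",
    ]


# Dry run: print what would happen instead of touching a cluster.
for cluster in ["ovh", "ovh2", "turing", "prod"]:
    if is_ovh_cluster(cluster):
        print(cluster, "->", kubeconfig_path(cluster))
        print("  would run:", shlex.join(coredns_patch_command()))
    else:
        print(cluster, "-> handled by a different auth branch")
```

Two consequences follow from the diff itself. For the existing `ovh` release the new f-string still resolves to `secrets/ovh-kubeconfig.yml`, so its path is unchanged. And although the `patch_coredns` docstring mentions only OVH2, the prefix check means the patch is applied to `ovh` as well.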
Binary file modified secrets/ban.py
Binary file not shown.
Binary file added secrets/config/ovh2.yaml
Binary file not shown.
Binary file added secrets/ovh2-kubeconfig.yml
Binary file not shown.
20 changes: 18 additions & 2 deletions terraform/README.md
@@ -1,6 +1,6 @@
# Terraform deployment info

Common configuration is in terraform/modules/mybinder
Common configuration for GKE is in terraform/modules/mybinder

most deployed things are in mybinder/resource.tf
variables (mostly things that should differ in staging/prod) in mybinder/variables.tf
Expand Down Expand Up @@ -49,11 +49,27 @@ terraform output -json private_keys | jq '.["events-archiver"]' | pbcopy

with key names: "events-archiver", "matomo", and "binderhub-builder" and paste them into the appropriate fields in `secrets/config/$deployment.yaml`.
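
The `jq` lookup above can also be done in Python when copying several keys at once. This is a minimal sketch that assumes `terraform output -json private_keys` prints a JSON object keyed by the three names listed. The helper name `select_private_keys` and the placeholder values are hypothetical, and the sample data stands in for real terraform output.

```python
import json

# Key names taken from the README text above
KEY_NAMES = ("events-archiver", "matomo", "binderhub-builder")


def select_private_keys(output_json, names=KEY_NAMES):
    """Pick the named entries out of the JSON printed by
    `terraform output -json private_keys` (assumed to be a JSON object)."""
    keys = json.loads(output_json)
    missing = [name for name in names if name not in keys]
    if missing:
        raise KeyError(f"missing keys in terraform output: {missing}")
    return {name: keys[name] for name in names}


# Placeholder data; real input would come from the terraform command.
sample = json.dumps({name: f"<placeholder for {name}>" for name in KEY_NAMES})
for name, value in select_private_keys(sample).items():
    print(name, "->", value)
```

Failing loudly on a missing key is a deliberate choice here: a silently absent credential would otherwise only surface later, when the deployment tries to use it.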

### Notes
## Notes

- requesting previously-allocated static ip via loadBalancerIP did not work.
Had to manually mark LB IP as static via cloud console.

- sql admin API needed to be manually enabled [here](https://console.developers.google.com/apis/library/sqladmin.googleapis.com)
- matomo sql data was manually imported/exported via sql dashboard and gsutil in cloud console
- events archive history was manually migrated via `gsutil -m rsync` in cloud console

## OVH

The new OVH cluster is also deployed via terraform in the `ovh` directory.
This has a lot less to deploy than flagship GKE,
but deploys a Harbor (container image) registry as well.

### OVH Notes

- credentials are in `terraform/secrets/ovh-creds.py`
- token in credentials is owned by Min because OVH tokens are always owned by real OVH users, not per-project 'service account'.
The token only has permissions on the MyBinder cloud project, however.
- the only manual creation step was the s3 bucket and user for terraform state, the rest is created with terraform
- harbor registry on OVH is old, and this forces us to use an older
harbor _provider_.
Once OVH upgrades harbor to at least 2.2 (2.4 expected in 2022-12), we should be able to upgrade the harbor provider and robot accounts.