new OVH cluster #2414

Merged (14 commits) on Nov 21, 2022
19 changes: 18 additions & 1 deletion .github/workflows/cd.yml
@@ -234,6 +234,14 @@ jobs:
helm_version: ""
experimental: false

- federation_member: ovh2
binder_url: https://ovh2.mybinder.org
hub_url: https://hub.ovh2.mybinder.org
# image-prefix should match ovh registry config in secrets/config/ovh.yaml
chartpress_args: "--push --image-prefix=2lmrrh8f.gra7.container-registry.ovh.net/mybinder-chart/mybinder-"
helm_version: ""
experimental: false

steps:
- name: "Stage 0: Update env vars based on job matrix arguments"
run: |
@@ -288,14 +296,23 @@ jobs:
GIT_CRYPT_KEY: ${{ secrets.GIT_CRYPT_KEY }}

# Action Repo: https://github.com/Azure/docker-login
- name: "Stage 3: Login to Docker regstry (OVH)"
- name: "Stage 3: Login to Docker registry (OVH)"
if: matrix.federation_member == 'ovh'
uses: azure/docker-login@v1
with:
login-server: 3i2li627.gra7.container-registry.ovh.net
username: ${{ secrets.DOCKER_USERNAME_OVH }}
password: ${{ secrets.DOCKER_PASSWORD_OVH }}

- name: "Stage 3: Login to Docker registry (OVH2)"
if: matrix.federation_member == 'ovh2'
uses: azure/docker-login@v1
with:
login-server: 2lmrrh8f.gra7.container-registry.ovh.net
username: ${{ secrets.DOCKER_USERNAME_OVH2 }}
# terraform output registry_chartpress_token
password: ${{ secrets.DOCKER_PASSWORD_OVH2 }}

- name: "Stage 3: Run chartpress to update values.yaml"
run: |
chartpress ${{ matrix.chartpress_args || '--skip-build' }}
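Review note: the matrix comment says the chartpress `--image-prefix` should match the OVH registry configuration, and the `ovh2` login step uses the same registry host. A minimal sketch of that consistency check follows. The helper names `registry_host` and `prefix_matches_login` are hypothetical and not part of this PR; the two constants are copied from the hunks above.

```python
def registry_host(image_prefix: str) -> str:
    """Return the registry host of an image prefix shaped like 'host/project/name-'."""
    return image_prefix.split("/", 1)[0]


def prefix_matches_login(image_prefix: str, login_server: str) -> bool:
    """True when chartpress would push to the registry the login step authenticates against."""
    return registry_host(image_prefix) == login_server


# Values copied from the ovh2 matrix entry and the OVH2 login step in cd.yml.
OVH2_IMAGE_PREFIX = "2lmrrh8f.gra7.container-registry.ovh.net/mybinder-chart/mybinder-"
OVH2_LOGIN_SERVER = "2lmrrh8f.gra7.container-registry.ovh.net"

if __name__ == "__main__":
    print(prefix_matches_login(OVH2_IMAGE_PREFIX, OVH2_LOGIN_SERVER))
```

A mismatch here would mean the workflow logs in to one registry while chartpress pushes to another, which is the failure the inline comment warns about.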
2 changes: 2 additions & 0 deletions .github/workflows/test-helm-template.yaml
@@ -47,6 +47,8 @@ jobs:
k3s-channel: "v1.21"
- release: ovh
k3s-channel: "v1.20"
- release: ovh2
k3s-channel: "v1.23"
- release: turing
k3s-channel: "v1.21"

1 change: 1 addition & 0 deletions .gitignore
@@ -19,3 +19,4 @@ travis/crypt-key
env

.terraform
.terraform.lock.hcl
125 changes: 125 additions & 0 deletions config/ovh2.yaml
@@ -0,0 +1,125 @@
projectName: ovh2

userNodeSelector: &userNodeSelector
mybinder.org/pool-type: users
coreNodeSelector: &coreNodeSelector
mybinder.org/pool-type: core

binderhub:
config:
BinderHub:
pod_quota: 10
hub_url: https://hub.ovh2.mybinder.org
badge_base_url: https://mybinder.org
build_node_selector: *userNodeSelector
sticky_builds: true
image_prefix: 2lmrrh8f.gra7.container-registry.ovh.net/mybinder-builds/r2d-g5b5b759
DockerRegistry:
# Docker Registry uses harbor
# ref: https://github.com/goharbor/harbor/wiki/Harbor-FAQs#api
token_url: "https://2lmrrh8f.gra7.container-registry.ovh.net/service/token?service=harbor-registry"

replicas: 1
nodeSelector: *coreNodeSelector

extraVolumes:
- name: secrets
secret:
secretName: events-archiver-secrets
extraVolumeMounts:
- name: secrets
mountPath: /secrets
readOnly: true
extraEnv:
GOOGLE_APPLICATION_CREDENTIALS: /secrets/service-account.json

ingress:
hosts:
- ovh2.mybinder.org

jupyterhub:
singleuser:
nodeSelector: *userNodeSelector
hub:
nodeSelector: *coreNodeSelector

proxy:
chp:
nodeSelector: *coreNodeSelector
resources:
requests:
cpu: "1"
limits:
cpu: "1"
ingress:
hosts:
- hub.ovh2.mybinder.org
tls:
- secretName: kubelego-tls-hub
hosts:
- hub.ovh2.mybinder.org
scheduling:
userPlaceholder:
replicas: 5
userScheduler:
nodeSelector: *coreNodeSelector

imageCleaner:
# Use 40GB as upper limit, size is given in bytes
imageGCThresholdHigh: 40e9
imageGCThresholdLow: 30e9
imageGCThresholdType: "absolute"

cryptnono:
enabled: false

grafana:
nodeSelector: *coreNodeSelector
ingress:
hosts:
- grafana.ovh2.mybinder.org
tls:
- hosts:
- grafana.ovh2.mybinder.org
secretName: kubelego-tls-grafana
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: prometheus
orgId: 1
type: prometheus
url: https://prometheus.ovh2.mybinder.org
access: direct
isDefault: true
editable: false
persistence:
storageClassName: csi-cinder-high-speed

prometheus:
server:
nodeSelector: *coreNodeSelector
persistentVolume:
size: 50Gi
retention: 30d
ingress:
hosts:
- prometheus.ovh2.mybinder.org
tls:
- hosts:
- prometheus.ovh2.mybinder.org
secretName: kubelego-tls-prometheus

ingress-nginx:
controller:
scope:
enabled: true
service:
loadBalancerIP: 162.19.17.37

static:
ingress:
hosts:
- static.ovh2.mybinder.org
tls:
secretName: kubelego-tls-static
16 changes: 12 additions & 4 deletions deploy.py
@@ -76,7 +76,7 @@ def setup_auth_ovh(release, cluster):
"""
print(f"Setup the OVH authentication for namespace {release}")

ovh_kubeconfig = os.path.join(ABSOLUTE_HERE, "secrets", "ovh-kubeconfig.yml")
ovh_kubeconfig = os.path.join(ABSOLUTE_HERE, "secrets", f"{release}-kubeconfig.yml")
os.environ["KUBECONFIG"] = ovh_kubeconfig
print(f"Current KUBECONFIG='{ovh_kubeconfig}'")
stdout = subprocess.check_output(["kubectl", "config", "use-context", cluster])
@@ -124,7 +124,7 @@ def update_networkbans(cluster):
# some members have special logic in ban.py,
# in which case they must be specified on the command-line
ban_command = [sys.executable, "secrets/ban.py"]
if cluster in {"turing-prod", "turing-staging", "turing", "ovh"}:
if cluster in {"turing-prod", "turing-staging", "turing", "ovh", "ovh2"}:
ban_command.append(cluster)

subprocess.check_call(ban_command)
@@ -251,7 +251,15 @@ def main():
argparser.add_argument(
"release",
help="Release to deploy",
choices=["staging", "prod", "ovh", "turing-prod", "turing-staging", "turing"],
choices=[
"staging",
"prod",
"ovh",
"ovh2",
"turing-prod",
"turing-staging",
"turing",
],
)
argparser.add_argument(
"--name",
@@ -302,7 +310,7 @@ def main():

# script is running on CI, proceed with auth and helm setup

if cluster == "ovh":
if cluster.startswith("ovh"):
setup_auth_ovh(args.release, cluster)
elif cluster in AZURE_RGs:
setup_auth_turing(cluster)
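Review note: taken together, the `deploy.py` hunks make the OVH path generic. The kubeconfig file is derived from the release name, and any cluster whose name starts with `ovh` is routed to the OVH auth setup. A minimal sketch of that selection logic is below. The functions `kubeconfig_path` and `auth_backend` are hypothetical illustrations, the `azure_rgs` contents in the usage lines are assumed, and the `"other"` fallback stands in for whatever branch follows in the real script.

```python
import os


def kubeconfig_path(base_dir: str, release: str) -> str:
    """Build the per-release kubeconfig path, mirroring the f-string in setup_auth_ovh."""
    return os.path.join(base_dir, "secrets", f"{release}-kubeconfig.yml")


def auth_backend(cluster: str, azure_rgs: dict) -> str:
    """Pick an auth setup by cluster name, mirroring the if/elif chain in main()."""
    if cluster.startswith("ovh"):
        return "ovh"
    elif cluster in azure_rgs:
        return "turing"
    # Assumption: the real script handles the remaining clusters in a branch not shown in this diff.
    return "other"


if __name__ == "__main__":
    # Hypothetical resource-group mapping, for illustration only.
    rgs = {"turing-prod": "example-rg"}
    for name in ("ovh", "ovh2", "turing-prod", "prod"):
        print(name, auth_backend(name, rgs), kubeconfig_path(".", name))
```

With this scheme the existing `ovh` release still resolves to `secrets/ovh-kubeconfig.yml`, and `ovh2` resolves to the newly added `secrets/ovh2-kubeconfig.yml`. One consequence of the prefix match worth keeping in mind: any future cluster whose name begins with `ovh` will also take the OVH branch.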
30 changes: 30 additions & 0 deletions mybinder/templates/image-pull-secret.yaml
@@ -0,0 +1,30 @@
{{- define "_dockerconfigjson" -}}
{{ include "_dockerconfigjson.yaml" . | b64enc }}
{{- end }}

{{- define "_dockerconfigjson.yaml" -}}
{{- with .Values.imagePullSecret -}}
{
"auths": {
{{ .registry | default "https://index.docker.io/v1/" | quote }}: {
"username": {{ .username | quote }},
"password": {{ .password | quote }},
{{- if .email }}
"email": {{ .email | quote }},
{{- end }}
"auth": {{ (print .username ":" .password) | b64enc | quote }}
}
}
}
{{- end }}
{{- end }}

{{- if .Values.imagePullSecret.create -}}
kind: Secret
apiVersion: v1
metadata:
name: {{ .Values.imagePullSecret.name }}
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: {{ include "_dockerconfigjson" . }}
{{- end }}
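Review note: the template above builds a Docker config JSON document and base64-encodes it into the `.dockerconfigjson` field of the Secret. The Python sketch below reproduces the same two-level encoding so the resulting payload can be inspected outside Helm. The function `dockerconfigjson` is a hypothetical stand-in, not code from this PR, and the credentials in the usage lines are placeholders.

```python
import base64
import json

DEFAULT_REGISTRY = "https://index.docker.io/v1/"


def dockerconfigjson(username, password, registry=None, email=None):
    """Return the base64-encoded Docker config, following the structure of the Helm template."""
    entry = {"username": username, "password": password}
    if email:
        # The template only emits "email" when a value is set.
        entry["email"] = email
    # Inner encoding: "auth" is base64 of "username:password".
    entry["auth"] = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {registry or DEFAULT_REGISTRY: entry}}
    # Outer encoding: the whole JSON document is base64-encoded for the Secret's data field.
    return base64.b64encode(json.dumps(config).encode()).decode()


if __name__ == "__main__":
    encoded = dockerconfigjson("user", "pass", registry="registry.example.org")
    print(json.dumps(json.loads(base64.b64decode(encoded)), indent=2))
```

Decoding the output this way is a quick check that the registry key, the optional `email` field, and the `auth` value come out as expected before enabling `imagePullSecret.create`.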
5 changes: 5 additions & 0 deletions mybinder/values.yaml
@@ -11,6 +11,11 @@ cryptnono:
operator: Equal
value: user

# allow creating a pull secret
imagePullSecret:
create: false
name: "chart-secret"

imagePullSecrets:

tags: {}
Binary file modified secrets/ban.py
Binary file not shown.
Binary file added secrets/config/ovh2.yaml
Binary file not shown.
Binary file added secrets/ovh2-kubeconfig.yml
Binary file not shown.
20 changes: 18 additions & 2 deletions terraform/README.md
@@ -1,6 +1,6 @@
# Terraform deployment info

Common configuration is in terraform/modules/mybinder
Common configuration for GKE is in terraform/modules/mybinder

most deployed things are in mybinder/resource.tf
variables (mostly things that should differ in staging/prod) in mybinder/variables.tf
@@ -49,11 +49,27 @@ terraform output -json private_keys | jq '.["events-archiver"]' | pbcopy

with key names: "events-archiver", "matomo", and "binderhub-builder" and paste them into the appropriate fields in `secrets/config/$deployment.yaml`.

### Notes
## Notes

- requesting previously-allocated static ip via loadBalancerIP did not work.
Had to manually mark LB IP as static via cloud console.

- sql admin API needed to be manually enabled [here](https://console.developers.google.com/apis/library/sqladmin.googleapis.com)
- matomo sql data was manually imported/exported via sql dashboard and gsutil in cloud console
- events archive history was manually migrated via `gsutil -m rsync` in cloud console

## OVH

The new OVH cluster is also deployed via terraform in the `ovh` directory.
This has a lot less to deploy than flagship GKE,
but deploys a Harbor (container image) registry as well.

### OVH Notes

- credentials are in `terraform/secrets/ovh-creds.py`
- token in credentials is owned by Min because OVH tokens are always owned by real OVH users, not per-project 'service account'.
The token only has permissions on the MyBinder cloud project, however.
- the only manual creation step was the s3 bucket and user for terraform state, the rest is created with terraform
- harbor registry on OVH is old, and this forces us to use an older
harbor _provider_.
Once OVH upgrades harbor to at least 2.2 (2.4 expected in 2022-12), we should be able to upgrade the harbor provider and robot accounts.