This repository has been archived by the owner on Sep 23, 2024. It is now read-only.

HTTP timeout with repo2data #19

Open
ltetrel opened this issue Oct 23, 2019 · 15 comments

ltetrel commented Oct 23, 2019

When data takes too long to download, we get an HTTP timeout from the hub pod. We could of course temporarily increase the timeout, but the downside is that it also delays restarts when something else is wrong with the environment.

ltetrel commented Oct 23, 2019

The HTTP timeout is controlled here: https://jupyterhub.readthedocs.io/en/stable/api/spawner.html#spawner
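
For reference, a minimal sketch of raising those spawner timeouts in a zero-to-jupyterhub style Helm config (the 600 s values are only illustrative; with the BinderHub chart this would sit under its jupyterhub: section):

hub:
  extraConfig:
    spawnerTimeouts: |
      # give slow repo2data downloads more time before the hub gives up on the pod
      c.Spawner.http_timeout = 600
      c.Spawner.start_timeout = 600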

ltetrel commented Oct 23, 2019

Using an init container could help encapsulate repo2data in a separate process (container):
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
Right now repo2data runs in the hub container, which makes the data download part of spawning the pod (not ideal).
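
A minimal sketch of what that could look like; the repo2data image name, arguments and mount path below are hypothetical, not what we currently deploy:

apiVersion: v1
kind: Pod
metadata:
  name: build-example
spec:
  initContainers:
  # hypothetical init container: clones the repo, reads the data_requirement file
  # and pulls the data into a shared volume before the builder starts
  - name: repo2data
    image: neurolibre/repo2data:latest   # hypothetical image name
    args: ["--repo", "https://github.com/ltetrel/binder-tuto"]
    volumeMounts:
    - mountPath: /data
      name: shared-data
  containers:
  - name: builder
    image: jupyter/repo2docker:0.10.0
    volumeMounts:
    - mountPath: /data
      name: shared-data
  volumes:
  - name: shared-data
    emptyDir: {}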

ltetrel commented Oct 29, 2019

Maybe it is possible to run an init container for every build pod?
component=binderhub-build
Maybe with a pod preset?
This way repo2data would be called just once, when the image is built (instead of in the hub like now, where repo2data is called every time a pod is created), and there would be no HTTP timeout since it runs during the build.

ltetrel commented Mar 11, 2020

The issue is that the pod preset needs to know which repo is being built (to clone the data_requirement file and pull the data with repo2data).

ltetrel commented Mar 11, 2020

Fortunately, the pod used to build the user Binder environment has the repo in its annotations, which could then be used as input for the pod preset (but how?):

Labels:       component=binderhub-build
              name=build-ltetrel-2dbinder-2dtuto-f9e17d-dc5e69
Annotations:  binder-repo: https://github.com/ltetrel/binder-tuto

@agahkarakuzu

There is a way to fetch values from a container and store them as env vars. If we store the label/name with the information of the repo being built in the same order, it might do the trick?
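
One mechanism that could do this is the Kubernetes Downward API, which can expose a pod's own annotations as environment variables in its containers; a sketch (the variable name is arbitrary and the image is hypothetical):

containers:
- name: repo2data
  image: neurolibre/repo2data:latest   # hypothetical image name
  env:
  # exposes the pod's own binder-repo annotation inside the container
  - name: BINDER_REPO
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['binder-repo']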

ltetrel commented Mar 11, 2020

Yes, it could work.

ltetrel commented Mar 11, 2020

The issue is how to inject a container (which would run repo2data) into every build pod.
With an init container, we would need to specify it when creating the build pod (which we don't have control over).
That is why I thought about PodPreset to insert an init container before every build.
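
For reference, a PodPreset targeting the build pods would look roughly like this; note that the PodPreset API (alpha, must be enabled on the cluster) only injects env, envFrom, volumes and volume mounts into matching pods, not init containers, which already limits how far it can go here:

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: repo2data-preset
  namespace: binderhub
spec:
  # applies to every BinderHub build pod at admission time
  selector:
    matchLabels:
      component: binderhub-build
  env:
  - name: BINDER_REPO
    valueFrom:
      fieldRef:
        fieldPath: metadata.annotations['binder-repo']
  volumeMounts:
  - mountPath: /data
    name: repo2data-cache
  volumes:
  - name: repo2data-cache
    emptyDir: {}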

ltetrel commented Mar 11, 2020

But PodPreset does not seem to be meant for this type of use case: kubernetes/kubernetes#43874
Will check https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
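
Among the admission controllers, the general-purpose option is a mutating admission webhook, which can patch pods at creation time (including adding init containers). A rough sketch of the registration object, assuming a separate webhook service (hypothetical name repo2data-injector) implements the actual patch:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: repo2data-injector
webhooks:
- name: repo2data-injector.binderhub.svc
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: repo2data-injector        # hypothetical service doing the patching
      namespace: binderhub
      path: /mutate
    caBundle: "<base64 CA bundle>"    # placeholder
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
  # only mutate BinderHub build pods
  objectSelector:
    matchLabels:
      component: binderhub-build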

ltetrel commented Mar 11, 2020

If we had control over the BinderHub code, we could add it here; maybe there is room for a PR (that would cover a general use case for them).

ltetrel commented Mar 11, 2020

Or maybe it would be possible to change the build_image: https://github.com/jupyterhub/binderhub/blob/b6446b12b30f741d9e82b7aec1498ede4776cd79/binderhub/app.py#L383
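
A sketch of that approach through the BinderHub chart config, assuming a custom image that wraps jupyter/repo2docker and additionally runs repo2data (the image name is hypothetical):

config:
  BinderHub:
    # hypothetical repo2docker image extended to also run repo2data
    build_image: neurolibre/repo2docker-repo2data:0.10.0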

@agahkarakuzu

OK, I thought you were referring to injecting application information into a pod. But you would like to inject a (repo2data) container into a running (build) pod, or at pod creation?

ltetrel commented Mar 11, 2020

Yep, and for that I need the repository information to pull the data_requirement file inside the repo2data Docker container.

ltetrel self-assigned this Mar 13, 2020

ltetrel commented Mar 13, 2020

Example of a rendered config for a Binder build pod:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    binder-repo: https://github.com/ltetrel/repo2data-caching-s3
  creationTimestamp: "2020-03-13T16:38:57Z"
  labels:
    component: binderhub-build
    name: build-ltetrel-2drepo2data-2dcaching-2ds3-7c151e-a3305a
  name: build-ltetrel-2drepo2data-2dcaching-2ds3-7c151e-a3305a
  namespace: binderhub
  resourceVersion: "1029140"
  selfLink: /api/v1/namespaces/binderhub/pods/build-ltetrel-2drepo2data-2dcaching-2ds3-7c151e-a3305a
  uid: 55d4b73d-ba25-4af5-961a-b13d9d36f95b
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - podAffinityTerm:
          labelSelector:
            matchLabels:
              component: binderhub-build
          topologyKey: kubernetes.io/hostname
        weight: 100
  containers:
  - args:
    - jupyter-repo2docker
    - --ref
    - a3305a93929c977f9d83e77e05cbb6a370d0284b
    - --image
    - binder-registry.conp.cloud/binder-dev.conp.cloud/binder-ltetrel-2drepo2data-2dcaching-2ds3-7c151e:a3305a93929c977f9d83e77e05cbb6a370d0284b
    - --no-clean
    - --no-run
    - --json-logs
    - --user-name
    - jovyan
    - --user-id
    - "1000"
    - --push
    - https://github.com/ltetrel/repo2data-caching-s3
    image: jupyter/repo2docker:0.10.0
    imagePullPolicy: IfNotPresent
    name: builder
    resources:
      limits:
        memory: "0"
      requests:
        memory: "0"
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/docker.sock
      name: docker-socket
    - mountPath: /root/.docker
      name: docker-push-secret
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-5gnt9
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: neurolibre-dev-node1
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - hostPath:
      path: /var/run/docker.sock
      type: Socket
    name: docker-socket
  - name: docker-push-secret
    secret:
      defaultMode: 420
      secretName: binder-push-secret
  - name: default-token-5gnt9
    secret:
      defaultMode: 420
      secretName: default-token-5gnt9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:38:57Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:39:01Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:39:01Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-03-13T16:38:57Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://5f42c13b1b833ac645bfc648293a057cbfb51bb3cab21e7f2e149943442bf0db
    image: jupyter/repo2docker:0.10.0
    imageID: docker-pullable://jupyter/repo2docker@sha256:b8855ce9f6f9ba3a98369331231f6c0d01badec68109f4b13b2308f5d15698f4
    lastState: {}
    name: builder
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-03-13T16:39:00Z"
  hostIP: 192.168.73.23
  phase: Running
  podIP: 10.244.1.12
  podIPs:
  - ip: 10.244.1.12
  qosClass: BestEffort
  startTime: "2020-03-13T16:38:57Z"

ltetrel commented Mar 27, 2020
