[New] Opensci cluster and hub #3733

Merged: 23 commits, Mar 5, 2024

GeorgianaElena
Member

Fixes #3667

consideRatio changed the title from "[New] Openscli cluster and hub" to "[New] Opensci cluster and hub" on Feb 21, 2024

github-actions bot commented Feb 22, 2024

Merging this PR will trigger the following deployment actions.

Support and Staging deployments

| Cloud Provider | Cluster Name | Upgrade Support? | Reason for Support Redeploy | Upgrade Staging? | Reason for Staging Redeploy |
| --- | --- | --- | --- | --- | --- |
| gcp | linked-earth | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | earthscope | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | cloudbank | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-veda | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | victor | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | leap | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | catalystproject-latam | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | 2i2c | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-ghg | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | 2i2c-uk | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-esdis | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | nasa-cryo | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | openscapes | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | opensci | Yes | Support helm chart has been modified | Yes | Following prod hubs require redeploy: sciencecore |
| aws | ubc-eoas | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | hhmi | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | gridsst | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | 2i2c-aws-us | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | qcl | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | meom-ige | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | pangeo-hubs | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | catalystproject-africa | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| gcp | awi-ciroh | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| aws | smithsonian | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |
| kubeconfig | utoronto | Yes | Support helm chart has been modified | Yes | Core infrastructure has been modified |

Production deployments

| Cloud Provider | Cluster Name | Hub Name | Reason for Redeploy |
| --- | --- | --- | --- |
| gcp | linked-earth | prod | Core infrastructure has been modified |
| aws | earthscope | prod | Core infrastructure has been modified |
| gcp | cloudbank | bcc | Core infrastructure has been modified |
| gcp | cloudbank | ccsf | Core infrastructure has been modified |
| gcp | cloudbank | csm | Core infrastructure has been modified |
| gcp | cloudbank | dvc | Core infrastructure has been modified |
| gcp | cloudbank | elcamino | Core infrastructure has been modified |
| gcp | cloudbank | evc | Core infrastructure has been modified |
| gcp | cloudbank | glendale | Core infrastructure has been modified |
| gcp | cloudbank | howard | Core infrastructure has been modified |
| gcp | cloudbank | miracosta | Core infrastructure has been modified |
| gcp | cloudbank | skyline | Core infrastructure has been modified |
| gcp | cloudbank | demo | Core infrastructure has been modified |
| gcp | cloudbank | fresno | Core infrastructure has been modified |
| gcp | cloudbank | humboldt | Core infrastructure has been modified |
| gcp | cloudbank | laney | Core infrastructure has been modified |
| gcp | cloudbank | sbcc | Core infrastructure has been modified |
| gcp | cloudbank | sbcc-dev | Core infrastructure has been modified |
| gcp | cloudbank | elac | Core infrastructure has been modified |
| gcp | cloudbank | lacc | Core infrastructure has been modified |
| gcp | cloudbank | lamission | Core infrastructure has been modified |
| gcp | cloudbank | mills | Core infrastructure has been modified |
| gcp | cloudbank | mission | Core infrastructure has been modified |
| gcp | cloudbank | norco | Core infrastructure has been modified |
| gcp | cloudbank | palomar | Core infrastructure has been modified |
| gcp | cloudbank | pasadena | Core infrastructure has been modified |
| gcp | cloudbank | sjcc | Core infrastructure has been modified |
| gcp | cloudbank | sacramento | Core infrastructure has been modified |
| gcp | cloudbank | srjc | Core infrastructure has been modified |
| gcp | cloudbank | saddleback | Core infrastructure has been modified |
| gcp | cloudbank | santiago | Core infrastructure has been modified |
| gcp | cloudbank | sjsu | Core infrastructure has been modified |
| gcp | cloudbank | sierra | Core infrastructure has been modified |
| gcp | cloudbank | tuskegee | Core infrastructure has been modified |
| gcp | cloudbank | wlac | Core infrastructure has been modified |
| gcp | cloudbank | csulb | Core infrastructure has been modified |
| gcp | cloudbank | csum | Core infrastructure has been modified |
| aws | nasa-veda | prod | Core infrastructure has been modified |
| aws | victor | prod | Core infrastructure has been modified |
| gcp | leap | prod | Core infrastructure has been modified |
| gcp | catalystproject-latam | unitefa-conicet | Core infrastructure has been modified |
| gcp | catalystproject-latam | cicada | Core infrastructure has been modified |
| gcp | catalystproject-latam | gita | Core infrastructure has been modified |
| gcp | 2i2c | imagebuilding-demo | Core infrastructure has been modified |
| gcp | 2i2c | demo | Core infrastructure has been modified |
| gcp | 2i2c | ohw | Core infrastructure has been modified |
| gcp | 2i2c | aup | Core infrastructure has been modified |
| gcp | 2i2c | temple | Core infrastructure has been modified |
| gcp | 2i2c | ucmerced | Core infrastructure has been modified |
| gcp | 2i2c | climatematch | Core infrastructure has been modified |
| gcp | 2i2c | mtu | Core infrastructure has been modified |
| gcp | 2i2c | tufts | Core infrastructure has been modified |
| aws | nasa-ghg | prod | Core infrastructure has been modified |
| gcp | 2i2c-uk | lis | Core infrastructure has been modified |
| aws | nasa-esdis | prod | Core infrastructure has been modified |
| aws | nasa-cryo | prod | Core infrastructure has been modified |
| aws | openscapes | prod | Core infrastructure has been modified |
| aws | opensci | sciencecore | Core infrastructure has been modified |
| aws | ubc-eoas | prod | Core infrastructure has been modified |
| aws | jupyter-meets-the-earth | prod | Core infrastructure has been modified |
| gcp | hhmi | prod | Core infrastructure has been modified |
| gcp | hhmi | spyglass | Core infrastructure has been modified |
| aws | gridsst | prod | Core infrastructure has been modified |
| aws | 2i2c-aws-us | showcase | Core infrastructure has been modified |
| aws | 2i2c-aws-us | ncar-cisl | Core infrastructure has been modified |
| aws | 2i2c-aws-us | go-bgc | Core infrastructure has been modified |
| aws | 2i2c-aws-us | itcoocean | Core infrastructure has been modified |
| aws | 2i2c-aws-us | cosmicds | Core infrastructure has been modified |
| gcp | qcl | prod | Core infrastructure has been modified |
| gcp | meom-ige | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | prod | Core infrastructure has been modified |
| gcp | pangeo-hubs | coessing | Core infrastructure has been modified |
| aws | catalystproject-africa | nm-aist | Core infrastructure has been modified |
| aws | catalystproject-africa | must | Core infrastructure has been modified |
| aws | catalystproject-africa | uvri | Core infrastructure has been modified |
| gcp | awi-ciroh | prod | Core infrastructure has been modified |
| aws | smithsonian | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | prod | Core infrastructure has been modified |
| kubeconfig | utoronto | r-prod | Core infrastructure has been modified |

@GeorgianaElena
Member Author

Ok, so there's a hub running now at https://sciencecore.opensci.2i2c.cloud. It can build and push images to the registry, but cannot pull them, and I don't understand what's missing :/

The error I'm seeing in the hub is a 403:

2024-02-23T12:05:28Z [Warning] Failed to pull image "us-central1-docker.pkg.dev/two-eye-two-see/binder-staging-registry/opensci-2i2c-2dorg-2drocker-2dwith-2dnbgitpuller-4d2006:287ea05b280937fd00885fee87a9ab003e86a53d": failed to pull and unpack image "us-central1-docker.pkg.dev/two-eye-two-see/binder-staging-registry/opensci-2i2c-2dorg-2drocker-2dwith-2dnbgitpuller-4d2006:287ea05b280937fd00885fee87a9ab003e86a53d": failed to resolve reference "us-central1-docker.pkg.dev/two-eye-two-see/binder-staging-registry/opensci-2i2c-2dorg-2drocker-2dwith-2dnbgitpuller-4d2006:287ea05b280937fd00885fee87a9ab003e86a53d": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://us-central1-docker.pkg.dev/v2/token?scope=repository%3Atwo-eye-two-see%2Fbinder-staging-registry%2Fopensci-2i2c-2dorg-2drocker-2dwith-2dnbgitpuller-4d2006%3Apull&service=us-central1-docker.pkg.dev: 403 Forbidden

And the error in the binderhub pod is a 401:

Failed to get image manifest for us-central1-docker.pkg.dev/two-eye-two-see/binder-staging-registry/opensci-2i2c-2dorg-2drocker-2dwith-2dnbgitpuller-4d2006:287ea05b280937fd00885fee87a9ab003e86a53d
    Traceback (most recent call last):
      File "/usr/local/lib/python3.11/site-packages/binderhub/builder.py", line 407, in get
        image_manifest = await self.registry.get_image_manifest(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/binderhub/registry.py", line 283, in get_image_manifest
        token = await self._get_token(
                ^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.11/site-packages/binderhub/registry.py", line 241, in _get_token
        auth_resp = await client.fetch(auth_req)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    tornado.httpclient.HTTPClientError: HTTP 401: Unauthorized
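
The 403 above is returned for an anonymous token request, meaning the node asked the registry for a pull token without presenting any credentials. As a minimal sketch (the helper name `describe_token_request` is hypothetical and not part of binderhub or containerd), the token URL from that log line can be decoded to see exactly which repository and action were requested:

```python
from urllib.parse import urlsplit, parse_qs

# Token URL copied from the 403 message in the hub events above.
TOKEN_URL = (
    "https://us-central1-docker.pkg.dev/v2/token"
    "?scope=repository%3Atwo-eye-two-see%2Fbinder-staging-registry"
    "%2Fopensci-2i2c-2dorg-2drocker-2dwith-2dnbgitpuller-4d2006%3Apull"
    "&service=us-central1-docker.pkg.dev"
)


def describe_token_request(url: str) -> dict:
    """Split a registry token URL into host, service, repository and actions.

    Hypothetical helper for reading log lines; it performs no network calls.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query)  # parse_qs also undoes the percent-encoding
    # A scope has the form "<resource type>:<repository name>:<actions>".
    resource_type, rest = query["scope"][0].split(":", 1)
    repository, actions = rest.rsplit(":", 1)
    return {
        "host": parts.netloc,
        "service": query["service"][0],
        "resource_type": resource_type,
        "repository": repository,
        "actions": actions.split(","),
    }


if __name__ == "__main__":
    for key, value in describe_token_request(TOKEN_URL).items():
        print(f"{key}: {value}")
```

The decoded scope shows a plain `pull` on the built image, which fits the reading that the image exists in the registry and only the credentials for pulling it are missing.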

GeorgianaElena marked this pull request as ready for review on February 23, 2024, 12:19
GeorgianaElena requested a review from a team as a code owner on February 23, 2024, 12:19

@GeorgianaElena
Member Author

GeorgianaElena commented Feb 23, 2024

Ah, there's also this in the binderhub pod:

[W 240223 11:18:49 registry:112] No docker config at /.docker/config.json
[W 240223 11:18:49 registry:154] No username for docker registry at https://index.docker.io/v1
[W 240223 11:18:49 registry:179] No password for docker registry at https://index.docker.io/v1

Which is similar to #2699 (comment)

Update1: I believe this is the relevant bit ⬆️

Update2: It's clear that the docker client cannot find the k8s secret with the proper creds. But why?

Update3: The k8s secrets with the registry creds get mounted and loaded correctly by the binderhub builder pods.

Update4: Aha, so the issue is with the notebook pod not being able to pull from google artifact registry. This issue is only on AWS nodes 🤔

Update5: I've managed to switch to the quay registry, but pulling still doesn't work because, by default, the images get pushed as private (even though our quay plan doesn't allow it 🙄)

@GeorgianaElena
Member Author

I managed to get a workaround for the situation above. @jmunroe, you should now be able to test and use the hub at https://sciencecore.opensci.2i2c.cloud

So, this hub now builds and pushes images to our quay.io account under the prefix 2i2c/opensci-sciencecore/ (https://quay.io/organization/2i2c?tab=repos). However, I found no way to make the images pushed to quay public, and the push somehow defaults to private images.

So, the notebook pods need credentials to be able to pull these images in order to start the server. A temporary fix for this is to put the robot account's credentials under jupyterhub.imagePullSecret. Note: this robot account's creds are also mounted as a secret into the binderhub-service build pods.
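
For context, Kubernetes keeps registry credentials in a secret of type `kubernetes.io/dockerconfigjson`, and that is what an image pull secret ultimately contains. The sketch below shows the shape of that payload only; the function name and the robot account values are hypothetical placeholders, and this is not how the chart itself generates the secret.

```python
import base64
import json


def build_dockerconfigjson(registry: str, username: str, password: str) -> str:
    """Return the JSON payload stored under `.dockerconfigjson` in a pull secret.

    Hypothetical helper for illustration; all credential values are placeholders.
    """
    # The "auth" field is base64("<username>:<password>").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            registry: {
                "username": username,
                "password": password,
                "auth": auth,
            }
        }
    }
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder robot account name and token, not real credentials.
    print(build_dockerconfigjson("quay.io", "example+robot", "example-token"))
```

Because the same robot account is used by the build pods to push and by the user pods to pull, rotating its token would affect both steps.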

@consideRatio
Contributor

consideRatio commented Feb 27, 2024

@GeorgianaElena nice work! It seems we can now get an image built and then launch it.

I figure we shouldn't use the 2i2c organization on quay.io long term, but that was the workaround you referred to, right?

@GeorgianaElena
Member Author

GeorgianaElena commented Feb 27, 2024

> I figure we shouldn't use the 2i2c organization on quay.io long term, but that was the workaround you referred to, right?

@consideRatio, yes! That, plus the fact that images pushed to quay default to private, so we need imagePullSecret set so that the user pods can pull the image in the launch step.

I couldn't find a way to tell quay to make any new image public by default :( I believe there is a visibility query string that we could set to public when sending the docker push request, but that wasn't straightforward to get working.
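
A different route from the push-time query string mentioned above would be to flip visibility after the push through Quay's REST API. The sketch below only builds the request and does not send it. It assumes Quay's `changevisibility` repository endpoint and bearer-token auth, which should be checked against the Quay API documentation; the repository name, token, and function name are hypothetical.

```python
import json
import urllib.request

# Assumed base URL for the Quay REST API.
QUAY_API = "https://quay.io/api/v1"


def build_make_public_request(repository: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that would mark a Quay repository public.

    Assumes the `POST /repository/{repository}/changevisibility` endpoint and an
    OAuth token with permission to administer the repository.
    """
    body = json.dumps({"visibility": "public"}).encode()
    return urllib.request.Request(
        url=f"{QUAY_API}/repository/{repository}/changevisibility",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical repository and token, for illustration only.
    request = build_make_public_request("2i2c/example-image", "example-token")
    print(request.get_method(), request.full_url)
```

Since every build creates a new repository, this would need to run once per built image, so it is more of a stopgap than a replacement for a registry-side default.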

@jmunroe
Contributor

jmunroe commented Feb 27, 2024

Thank you @GeorgianaElena and team! I'll give this new hub a try today. If it works for me (I can log in, build an image from a repo, and launch that image), I'll call that a win! I'll be asking others within ScienceCore to try as well.

@jmunroe
Contributor

jmunroe commented Feb 29, 2024

People have been added to the hub and hopefully we'll get some 'real world' testing from other ScienceCore teams over the next few weeks.

The only 'issue' encountered is the domain name: it looks like sciencecore.opensci.2i2c.cloud is blocked by the NASA firewall or security settings. The speculation is that *.2i2c.cloud may be the issue.

I'll be working with the NASA folks to establish a preferred domain to use, but we shouldn't be blocked waiting for a decision.

I think this PR should be merged so we can say this sciencecore hub has been deployed! We can create new issues/support issues for future iterations.

Member

yuvipanda left a comment

This is incredible work, @GeorgianaElena. I have a couple of suggestions, but otherwise this is good.

BinderHub:
  base_url: /services/binder
  use_registry: true
  image_prefix: quay.io/2i2c/opensci-sciencecore

Member

Long term we should move to ECR, but in the meantime, I would suggest we create a different quay.io organization (perhaps 2i2c-opensci-sciencecore) for now?

Member Author

I thought ECR is not yet supported by binderhub (jupyterhub/binderhub#705). Ah, you mean that once it's supported, we should use that instead of quay, right?

Member

@GeorgianaElena ah, I had thought it was now supported, given that we have a mybinder.org federation member running on AWS (thanks to Simon's work). I've pinged him in the ticket to ask.

@GeorgianaElena
Member Author

Thanks for the feedback @yuvipanda. I believe I've addressed all the comments 🚀

Member

yuvipanda left a comment

Thanks @GeorgianaElena! This looks good to me!

Contributor

consideRatio left a comment

Thank you @GeorgianaElena for working on this and pulling in some cluster-agnostic improvements!

config/clusters/opensci/cluster.yaml (outdated, resolved)
config/clusters/opensci/common.values.yaml (resolved)

@GeorgianaElena
Member Author

Thank you @yuvipanda and @consideRatio! Merging 🚀

GeorgianaElena merged commit 55476e7 into 2i2c-org:master on Mar 5, 2024
33 of 34 checks passed
GeorgianaElena deleted the opensciclustre branch on March 5, 2024, 10:26

github-actions bot commented Mar 5, 2024

🎉🎉🎉🎉

Monitor the deployment of the hubs here 👉 https://github.com/2i2c-org/infrastructure/actions/runs/8154719390

Successfully merging this pull request may close these issues.

[Request deployment] New Hub: sciencecore.opensci.2i2c.cloud
4 participants