Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS ECR support (round two) #1055

Open
wants to merge 55 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
55 commits
Select commit Hold shift + click to select a range
7c74f8a
add registry_class config value
chicocvenancio Aug 8, 2019
81f92ca
add boto3 requirement
chicocvenancio Aug 10, 2019
7d6f775
add AWSElasticContainerRegistry class
chicocvenancio Aug 27, 2019
b0b438c
rbac.yaml: allow manipulation of secrets
chicocvenancio Aug 27, 2019
6cb5b6a
doc-requirements: add boto3
chicocvenancio Aug 27, 2019
884af25
Merge branch 'master' into aws_ecr
chicocvenancio Nov 8, 2019
e25e597
Fix syntax
chicocvenancio Nov 8, 2019
56dd9cf
Merge commit 'e25e59763ef2e33fb829e2feb36329721b28e015' into aws-ecr-…
ivan-gomes Jan 20, 2020
bd961a8
Revert RBAC modification and remove handling in AWSElasticContainerRe…
ivan-gomes Jan 21, 2020
af56e93
Add password and url fetching for ECR
ivan-gomes Jan 21, 2020
296772b
Add AWS ECR documentation
ivan-gomes Jan 21, 2020
ade3a8b
Fix wording in AWS ECR documentation
ivan-gomes Jan 21, 2020
c1c37dd
Add ECR host to `image_prefix` in documentation
ivan-gomes Jan 21, 2020
b9ecea4
Re-add binder-push-secret patching. Add check for correct auth token.
ivan-gomes Jan 21, 2020
f9fd86c
Remove duplicate GetAuthorizationToken in AWS ECR doc
ivan-gomes Jan 30, 2020
a5edcd3
Lazily initialize kube_client in AWSElasticContainerRegistry to preve…
ivan-gomes Feb 9, 2020
a9aa576
Update description of `registry_class`
ivan-gomes Feb 13, 2020
468bffc
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes May 13, 2020
1047f9e
Rename `registry_class` to `docker_registry_class` for clarity
ivan-gomes May 13, 2020
88e4914
Move boto3 and kubernetes calls to own threadpool
ivan-gomes Jun 6, 2020
8309e97
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes Jun 6, 2020
c6939fe
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes Jul 31, 2020
019a9fe
Restore newline lost during merge
ivan-gomes Jul 31, 2020
00048e3
Merge remote-tracking branch 'upstream/master' into aws-ecr-support
ivan-gomes Sep 17, 2020
a74d814
general word-smithing for clarity
TomasBeuzen Sep 18, 2020
dddecf8
remove stray comma that AWS doesn't like in the policy
TomasBeuzen Sep 18, 2020
a729986
one more small clarification
TomasBeuzen Sep 18, 2020
95a2f96
Merge pull request #2 from TomasBeuzen/aws-ecr-support-TB
ivan-gomes Sep 19, 2020
9a4ad32
add boto3 requirement
chicocvenancio Aug 10, 2019
dfcb921
add AWSElasticContainerRegistry class
chicocvenancio Aug 27, 2019
a2a670a
rbac.yaml: allow manipulation of secrets
chicocvenancio Aug 27, 2019
25faed2
Revert RBAC modification and remove handling in AWSElasticContainerRe…
ivan-gomes Jan 21, 2020
60c8ed4
Add password and url fetching for ECR
ivan-gomes Jan 21, 2020
6e6a61c
Add AWS ECR documentation
ivan-gomes Jan 21, 2020
b9877a7
Fix wording in AWS ECR documentation
ivan-gomes Jan 21, 2020
a5f21e8
Add ECR host to `image_prefix` in documentation
ivan-gomes Jan 21, 2020
54c1b64
Re-add binder-push-secret patching. Add check for correct auth token.
ivan-gomes Jan 21, 2020
6266cdd
Remove duplicate GetAuthorizationToken in AWS ECR doc
ivan-gomes Jan 30, 2020
dc676c4
Lazily initialize kube_client in AWSElasticContainerRegistry to preve…
ivan-gomes Feb 9, 2020
5a4894a
Rename `registry_class` to `docker_registry_class` for clarity
ivan-gomes May 13, 2020
eeb7661
Move boto3 and kubernetes calls to own threadpool
ivan-gomes Jun 6, 2020
ad97b5a
general word-smithing for clarity
TomasBeuzen Sep 18, 2020
e7624d9
remove stray comma that AWS doesn't like in the policy
TomasBeuzen Sep 18, 2020
378f539
one more small clarification
TomasBeuzen Sep 18, 2020
762443e
Fix rebase mishap
thomas-bc Aug 11, 2022
40ddeeb
update registry_class documentation for ECR
thomas-bc Aug 11, 2022
42a28bf
moving optional boto3 import inside class
thomas-bc Aug 11, 2022
7cf26bf
_patch_docker_config_secret docs and robustness
thomas-bc Aug 12, 2022
d9a983f
make RBAC increase conditional
thomas-bc Aug 12, 2022
482cdb9
Merge branch 'jupyterhub:master' into aws-ecr-support
thomas-bc Aug 16, 2022
591be68
rename RBAC increase to rbac.patchSecrets
thomas-bc Aug 16, 2022
c2b3f6e
Merge branch 'aws-ecr-support' of https://github.com/thomas-bc/binder…
thomas-bc Aug 16, 2022
56f9e81
Merge branch 'aws-ecr-support' of https://github.com/ivan-gomes/binde…
thomas-bc Aug 16, 2022
6fbc789
Merge pull request #4 from thomas-bc/aws-ecr-support
ivan-gomes Aug 16, 2022
8a25e45
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 16, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 90 additions & 1 deletion binderhub/registry.py
Original file line number Diff line number Diff line change
@@ -1,14 +1,18 @@
"""
Interaction with the Docker Registry
"""
import asyncio
import base64
import json
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

import kubernetes.client
import kubernetes.config
from tornado import httpclient
from tornado.httputil import url_concat
from traitlets import Dict, Unicode, default
from traitlets import Any, Dict, Integer, Unicode, default
from traitlets.config import LoggingConfigurable

DEFAULT_DOCKER_REGISTRY_URL = "https://registry-1.docker.io"
Expand Down Expand Up @@ -233,6 +237,91 @@ async def get_image_manifest(self, image, tag):
return json.loads(resp.body.decode("utf-8"))


class AWSElasticContainerRegistry(DockerRegistry):
import boto3

aws_region = Unicode(
config=True,
help="""
AWS region for ECR service
""",
)

ecr_client = Any()

@default("ecr_client")
def _get_ecr_client(self):
return boto3.client("ecr", region_name=self.aws_region)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is boto3 async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not officially. There are third-party wrappers like aioboto3 - https://github.com/terrycain/aioboto3 - but we may not want to go there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinderHub is written using tornado so we can't make calls over the network with a library which blocks. If we do then all of the BinderHub process will block while that network request is happening. We either need to use a threadpool to execute the boto3 calls or use a async library that you can await.

Depending on the complexity of the requests you are making another option would be to implement the HTTP call yourself using the AsyncHTTPClient class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. There is really only one request that needs to be made currently and it's relatively simple to do ourselves. Unless you see a value in integrating something like aioboto3 we can implement the request ourselves using AsyncHTTPClient. Drawing inspiration from elsewhere, it seems this is the route that FargateSpawner for JupyterHub went as well when making AWS calls - ref: https://github.com/uktrade/fargatespawner/blob/c614a54ffd80d0fb8886d1ef9e8de2c938de7759/fargatespawner/fargatespawner.py#L322-L346.

If you approve of this path, I will start work on implementing and testing it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After biting into AWS's request signing process - ref: https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html - I realized heading down the request reimplementation may not be the best approach. Additionally, the kubernetes call is also not async, so I switched to using a threadpool as you suggested. It has now been implemented and tested.


username = "AWS"

executor_threads = Integer(
5,
config=True,
help="""The number of threads to use for blocking calls

Should generaly be a small number because we don't
care about high concurrency here, just not blocking the webserver.
This executor is not used for long-running tasks (e.g. builds).
""",
)

executor = Any()

@default("executor")
def _get_executor(self):
return ThreadPoolExecutor(self.executor_threads)

kube_client = Any()

@default("kube_client")
def _get_kube_client(self):
kubernetes.config.load_incluster_config()
return kubernetes.client.CoreV1Api()

async def get_image_manifest(self, image, tag):
image = image.split("/", 1)[1]
await asyncio.wrap_future(
self.executor.submit(self._pre_get_image_manifest, image, tag)
)
return await super().get_image_manifest(image, tag)

def _pre_get_image_manifest(self, image, tag):
self._create_repository(image, tag)
self._refresh_password()

def _create_repository(self, image, tag):
try:
self.ecr_client.create_repository(repositoryName=image)
self.log.info(f"ECR repo {image} created")
except self.ecr_client.exceptions.RepositoryAlreadyExistsException:
self.log.info(f"ECR repo {image} already exists")

# An IAM principal is used to generate an auth token that is valid for 12 hours
# ref: https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html
# TODO: cache auth if not expired - authorizationData[i]["expiresAt"]
def _refresh_password(self):
auths = self.ecr_client.get_authorization_token()["authorizationData"]
auth = next(x for x in auths if x["proxyEndpoint"] == self.url)
self._patch_docker_config_secret(auth)
self.password = (
base64.b64decode(auth["authorizationToken"]).decode("utf-8").split(":")[1]
)

def _patch_docker_config_secret(self, auth):
"""Patch push_secret. Necessary because AWS rotates auth tokens.
ref: https://docs.aws.amazon.com/AmazonECR/latest/userguide/Registries.html"""
secret_data = {"auths": {self.url: {"auth": auth["authorizationToken"]}}}
secret_data = base64.b64encode(json.dumps(secret_data).encode("utf8")).decode(
"utf8"
)
with open("/var/run/secrets/kubernetes.io/serviceaccount/namespace") as f:
namespace = f.read()
self.kube_client.patch_namespaced_secret(
self.parent.push_secret, namespace, {"data": {"config.json": secret_data}}
)


class FakeRegistry(DockerRegistry):
"""
Fake registry that contains no images
Expand Down
50 changes: 50 additions & 0 deletions doc/zero-to-binderhub/setup-binderhub.rst
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,22 @@ where:
* `<harbor-username>` is the Harbor username
* `<harbor-password>` is the Harbor password

If you are using Amazon Elastic Container Registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Update `secret.yaml` to include the following::

registry:
url: https://<ACCOUNT_NUMBER>.dkr.ecr.<REGION>.amazonaws.com

where:

* ``<ACCOUNT_NUMBER>`` is the identifier of your AWS account
* ``<REGION>`` is the AWS region of the ECR registry, e.g. ``us-east-1``

As ECR uses AWS IAM for authorization, specifying ``username`` and ``password``
is not necessary.

Create ``config.yaml``
----------------------

Expand Down Expand Up @@ -205,6 +221,40 @@ As an example, the config should look like the following::
token_url: https://abcde.gra7.container-registry.ovh.net/service/token?service=harbor-registry


If you are using Amazon Elastic Container Registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you want your BinderHub to push and pull images from an Amazon Elastic
Container Registry (ECR), then your `config.yaml` file will look as follows::

rbac:
patchSecrets: true
config:
BinderHub:
use_registry: true
registry_class: binderhub.registry.AWSElasticContainerRegistry
image_prefix: "<ACCOUNT_NUMBER>.dkr.ecr.<REGION>.amazonaws.com/<prefix>-"
AWSElasticContainerRegistry:
aws_region: <REGION>

where:

* ``<ACCOUNT_NUMBER>`` is the identifier of your AWS account
* ``<REGION>`` is the AWS region of the ECR registry, e.g. ``us-east-1``.
* ``<prefix>`` can be any string, and will be prepended to image names. We
recommend something descriptive such as ``binder-dev-`` or ``binder-prod-``
(ending with a `-` is useful).

If you opted to use an IAM User with programmatic access instead of assuming
the role in the previous step you will additionally need to add the following
to your `config.yaml`::

extraEnv:
- name: AWS_ACCESS_KEY_ID
value: "xxx"
- name: AWS_SECRET_ACCESS_KEY
value: "yyy"

If you are using a custom registry
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
93 changes: 93 additions & 0 deletions doc/zero-to-binderhub/setup-registry.rst
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,99 @@ To use the OVH Container Registry, log in to the `OVH Control Panel <https://www

For more information about these steps, check out the `OVH Documentation <https://docs.ovh.com/gb/en/private-registry/creating-a-private-registry>`_

.. _use-ecr:

Set up Amazon Elastic Container Registry
----------------------------------------

To use Amazon Elastic Container Registry (ECR), you'll need to use AWS IAM to
authorize the machine or pod running BinderHub so it can push images. There
are a number of options on how to do this with IAM and Kubernetes, but we
will highlight two: define and assign an IAM role, or assume an IAM user with programmatic access.

For the former, start by creating an IAM policy that grants access to create repositories and
read/write images from them. You can create policies using the AWS console, CLI
or API as detailed in the documentation `Creating IAM policies <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html>`_.
An example IAM permissions policy is provided below. For more information and examples see `Identity and Access Management for Amazon Elastic Container Registry <https://docs.aws.amazon.com/AmazonECR/latest/userguide/security-iam.html>`_.

.. code-block:: json

{
"Statement": [
{
"Action": [
"ecr:ListImages"
],
"Effect": "Allow",
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_NUMBER>:<prefix>-*",
"Sid": "ListImagesInRepository"
},
{
"Action": [
"ecr:GetAuthorizationToken"
],
"Effect": "Allow",
"Resource": "*",
"Sid": "GetAuthorizationToken"
},
{
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:GetRepositoryPolicy",
"ecr:DescribeRepositories",
"ecr:ListImages",
"ecr:DescribeImages",
"ecr:BatchGetImage",
"ecr:InitiateLayerUpload",
"ecr:UploadLayerPart",
"ecr:CompleteLayerUpload",
"ecr:PutImage"
],
"Effect": "Allow",
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_NUMBER>:<prefix>-*",
"Sid": "ManageRepositoryContents"
},
{
"Action": [
"ecr:CreateRepository"
],
"Effect": "Allow",
"Resource": "arn:aws:ecr:<REGION>:<ACCOUNT_NUMBER>:<prefix>-*",
"Sid": "CreateRepository"
}
],
"Version": "2012-10-17"
}

If you used AWS services like EC2 or EKS to set up your Kubernetes cluster you
can add this policy to the IAM Role assumed by the nodes of the cluster, e.g.
``nodes.<somename>.k8s.local`` if you followed `Zero to JupyterHub with Kubernetes <https://zero-to-jupyterhub.readthedocs.io/>`_
and used kops. One way to do this from the AWS Console is to navigate to the IAM service and click "Roles" in the side bar,
then find and select the IAM Role assumed by the nodes of your cluster, click "Permissions" and then "Attach policies" to attach
the IAM Policy we just created. The IAM permissions policy will need to accompanied with an IAM
trust policy to allow it to be assumed. A suitable trust policy may already be defined for your node's IAM Role,
you can view and edit the trust policy by clicking the "Trust relationships" tab is next to the "Permissions" tab in the AWS console.
An example trust policy for EC2 is provided below. For more information see `Granting a User Permissions to Pass a Role to an AWS Service <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_passrole.html>`_.
This is the recommended method if your Kubernetes cluster is provisioned on AWS.

.. code-block:: json

{
"Version": "2012-10-17",
"Statement": {
"Sid": "TrustPolicyStatementThatAllowsEC2ServiceToAssumeTheAttachedRole",
"Effect": "Allow",
"Principal": { "Service": "ec2.amazonaws.com" },
"Action": "sts:AssumeRole"
}
}

Alternatively to the above steps, you can create an IAM user with programmatic access (see
`Creating an IAM User in Your AWS Account <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html>`_)
and specify the ``AWS_ACCESS_KEY_ID`` and ``AWS_SECRET_ACCESS_KEY`` environment
variables in the following step.

Next step
---------

Expand Down
7 changes: 7 additions & 0 deletions helm-chart/binderhub/schema.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,13 @@ properties:
description: |
Decides if RBAC resources are to be created and referenced by the the
Helm chart's workloads.
patchSecrets:
type: boolean
description: |
Allows get and patch Secrets if set to true.
This should only be needed if using the AWSElasticContainerRegistry
registry class, because AWS rotates auth tokens. See [the
documentation](https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html)

nodeSelector: &nodeSelector-spec
type: object
Expand Down
5 changes: 5 additions & 0 deletions helm-chart/binderhub/templates/rbac.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,11 @@ rules:
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get"]
{{- if .Values.rbac.patchSecrets -}}
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "patch"]
{{- end }}
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
Expand Down
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
boto3
# About pycurl:
# - pycurl is an optional dependency which improves performance
# - pycurl requires both `curl-config` and `gcc` to be available when installing
Expand Down