Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support assumeRoleWithWebIdentity for Route53 issuer #6878

Conversation

pwhitehead-splunk
Copy link
Contributor

Pull Request Motivation

Update the Route53 provider to support fetching credentials using AssumeRoleWithWebIdentity.

Kind

/kind feature

Release Note

Update the Route53 provider to support fetching credentials using AssumeRoleWithWebIdentity

@jetstack-bot jetstack-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. labels Mar 29, 2024
@jetstack-bot
Copy link
Contributor

Hi @pwhitehead-splunk. Thanks for your PR.

I'm waiting for a cert-manager member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jetstack-bot jetstack-bot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/acme Indicates a PR directly modifies the ACME Issuer code area/acme/dns01 Indicates a PR modifies ACME DNS01 provider code labels Mar 29, 2024
@jetstack-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign munnerz for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jetstack-bot jetstack-bot added area/api Indicates a PR directly modifies the 'pkg/apis' directory area/deploy Indicates a PR modifies deployment configuration labels Mar 29, 2024
@erikgb
Copy link
Contributor

erikgb commented Apr 24, 2024

/ok-to-test

@cert-manager-prow cert-manager-prow bot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 24, 2024
@ThatsMrTalbot
Copy link
Contributor

ThatsMrTalbot commented Apr 24, 2024

Looking at this implementation, it relies on a token mounted within the cert-manager Pod, meaning cert-manager needs a token with broad permissions mounted.

Would it be better to implement it in a similar way to Vault? Allowing for the token to either be loaded from a secret or to be issued from a service account.

Loading from a secret covers the use case of a non IRSA token, issuing from a service account covers the IRSA case.

Could look like something like this in the issuer config:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    ...
    solvers:
      - selector:
          dnsZones:
          - "example.com"
        dns01:
          route53:
            auth:
              kubernetes:
                serviceAccountRef:
                  name: my-irsa-service-account
                  audiences: ["sts.amazonaws.com"] # This should be the default value, as thats what IRSA tokens use

or

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    ...
    solvers:
      - selector:
          dnsZones:
          - "example.com"
        dns01:
          route53:
            auth:
              kubernetes:
                secretRef:
                  name: my-irsa-secret
                  key: token

@pwhitehead-splunk
Copy link
Contributor Author

@ThatsMrTalbot Thank you for the feedback! The implementation was written with the expectation that the end user would use the token from the projected volume that's provided by IRSA or Workload Identity. The motivation for this PR was to issue certs in AKS when using Route53 solvers without requiring static credentials. I'm not familiar with GCP but I'd expect there's a similar implementation.

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    ...
    solvers:
    - dns01:
        route53:
          region: us-west-2
          role: arn:aws:iam::YYYYYYYYYYYY:role/dns-manager # must trust the OIDC provider, should verify aud and sub
          webIdentityToken: /var/run/secrets/azure/tokens/azure-identity-token # provided by Workload Identity

Assuming the IAM Identity Provider and IAM role's trust is set up properly then the token's use is in line with its expected use case that IRSA/Workload Identity solves; we're exchanging a token for temporary credentials.

The secret use case you mentioned is out of scope for what I hope to achieve, but if I follow your question properly, you're suggesting that we issue a new token with a configurable audience when reconciling a cert? What would the payload look like? IAM trust policies typically have a conditional to trust the subject, as an end user I'd expect that to match the namespace and service account used in the Cert Manager deployment. Issuing a token with a custom audience feels like a reimplementation of IRSA/Workload Identity. That being said, if that's the path you prefer then I can make changes.

@ThatsMrTalbot
Copy link
Contributor

Hiya @pwhitehead-splunk

Solutions like IRSA simply implement a mutating webhook to project a Kubernetes service account token with a different audience inside your pods. In the case of IRSA the audience is set to sts.amazonaws.com (see https://github.com/aws/amazon-eks-pod-identity-webhook).

This implementation is true of both IRSA in AWS (https://github.com/aws/amazon-eks-pod-identity-webhook/blob/master/pkg/handler/handler.go#L332-L347) and Azure Workload Identity in AKS (https://github.com/Azure/azure-workload-identity/blob/9b60b5240fcbee6c2e4660b435a20dc2683ef945/pkg/webhook/webhook.go#L419-L438).

What I was proposing is that instead of using the projected token in the cert-manager pod, we can issue a token ourselves using the TokenRequest API, this is something we already do in other parts the cert-manager codebase -

createTokenFn := func(ns string) internalvault.CreateToken { return v.kclient.CoreV1().ServiceAccounts(ns).CreateToken }

The TokenRequest API is the same API that is used by Kubelet for the projected service account tokens, which in turn are used by IRSA and Azure Workload Identity. So by requesting a token directly its achieving the same goal, without the need for cert-manager service account itself to have IAM permissions. The user can just reference a service account that has the needed IAM permissions and cert-manager can request a token itself. The token itself would look the same as one provided by IRSA or Azure Workload Identity.

So my suggestion is basically instead of doing:

  • User pre-configures cert-manager with IRSA or Azure Workload Identity
  • User specifies in the Issuer config the path that IRSA/Identity token lives at
  • When cert-manager wants to talk to AWS it loads the token from disk, which we then use with the AssumeRoleWithWebIdentity call.

We do:

  • User specifies in the Issuer config that a ServiceAccount should be used when talking to AWS
  • When cert-manager wants to talk to AWS we use the token request API to issue a service account token, which we then use with the AssumeRoleWithWebIdentity call.

This has the added benefit of allowing different issuers to reference different service accounts with different IAM permissions in AWS.

@pwhitehead-splunk
Copy link
Contributor Author

That makes sense, thanks for the clarification! I'll update the PR.

@cert-manager-prow cert-manager-prow bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 30, 2024
@pwhitehead-splunk pwhitehead-splunk force-pushed the support-assume-role-with-web-identity branch from 76f5b53 to d4df2a3 Compare May 1, 2024 20:33
@cert-manager-prow cert-manager-prow bot added dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. and removed dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels May 1, 2024
@pwhitehead-splunk pwhitehead-splunk changed the title support assumeRoleWithWebIdentity for Route53 issuer WIP: support assumeRoleWithWebIdentity for Route53 issuer May 1, 2024
@cert-manager-prow cert-manager-prow bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 1, 2024
@pwhitehead-splunk pwhitehead-splunk force-pushed the support-assume-role-with-web-identity branch from d4df2a3 to 0a42b47 Compare May 3, 2024 19:52
@cert-manager-prow cert-manager-prow bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. and removed dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. labels May 3, 2024
@pwhitehead-splunk pwhitehead-splunk force-pushed the support-assume-role-with-web-identity branch from 47cc21f to ab6d57f Compare May 7, 2024 16:35
@cert-manager-prow cert-manager-prow bot added dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed dco-signoff: no Indicates that at least one commit in this pull request is missing the DCO sign-off message. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 7, 2024
@pwhitehead-splunk pwhitehead-splunk force-pushed the support-assume-role-with-web-identity branch from ab6d57f to 35571e0 Compare May 7, 2024 17:55
@cert-manager-prow cert-manager-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 7, 2024
Signed-off-by: Paul Whitehead <[email protected]>
@pwhitehead-splunk pwhitehead-splunk changed the title WIP: support assumeRoleWithWebIdentity for Route53 issuer support assumeRoleWithWebIdentity for Route53 issuer May 7, 2024
@cert-manager-prow cert-manager-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 7, 2024
@pwhitehead-splunk
Copy link
Contributor Author

@ThatsMrTalbot let me know what you think about the changes

@ThatsMrTalbot
Copy link
Contributor

ThatsMrTalbot commented May 9, 2024

@ThatsMrTalbot let me know what you think about the changes

I really like this implementation, I think this will be a powerful secretless solution for AWS auth. I've had a quick skim and left a comment. Will give it a proper look tomorrow.

Thanks for implementing this, its really cool 😄

Signed-off-by: Paul Whitehead <[email protected]>
@ThatsMrTalbot
Copy link
Contributor

/test pull-cert-manager-master-make-test

Signed-off-by: Paul Whitehead <[email protected]>
@ThatsMrTalbot
Copy link
Contributor

/test pull-cert-manager-master-make-test

@ThatsMrTalbot
Copy link
Contributor

Thanks very much for this!
/lgtm
/approve

@cert-manager-prow cert-manager-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024
@cert-manager-prow
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ThatsMrTalbot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 14, 2024
@cert-manager-prow cert-manager-prow bot merged commit cd2d71f into cert-manager:master May 14, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/acme/dns01 Indicates a PR modifies ACME DNS01 provider code area/acme Indicates a PR directly modifies the ACME Issuer code area/api Indicates a PR directly modifies the 'pkg/apis' directory area/deploy Indicates a PR modifies deployment configuration dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. ok-to-test release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants