renewing of SRE SSL cert #2361

Open
2 of 5 tasks
mattwestby opened this issue Jan 14, 2025 · 7 comments
Assignees
Labels
bug Problem when deploying a Data Safe Haven.

Comments

@mattwestby

✅ Checklist

  • I have searched open and closed issues for duplicates.
  • This is a problem observed when deploying a Data Safe Haven.
  • I can reproduce this with the latest version.
  • I have read through the documentation.
  • This isn't an open-ended question (open a discussion if it is).

💻 System information

  • Operating System:
  • Data Safe Haven version:

📦 Packages

List of packages
Paste list of packages here

🚫 Describe the problem

When the SRE SSL cert is nearing expiry, re-running the SRE deployment doesn't detect this, so no new SRE SSL cert is created.

🌳 Log messages

Relevant log messages
Your log details here

♻️ To reproduce

mattwestby added the "bug" label (Problem when deploying a Data Safe Haven.) Jan 14, 2025
github-project-automation bot moved this to To Be Refined in Data Safe Haven Jan 16, 2025
JimMadge moved this from To Be Refined to Ready to Work in Data Safe Haven Jan 16, 2025
JimMadge moved this from Ready to Work to In progress in Data Safe Haven Jan 23, 2025
JimMadge self-assigned this Jan 23, 2025
@JimMadge
Member

I tried just deleting the "SSLCertificate" resource from the stack. That led to a Snapshot Integrity Error in the Pulumi CLI, possibly because other resources in the stack reference the cert.

@mattwestby
Author

mattwestby commented Jan 24, 2025 via email

@JimMadge
Member

JimMadge commented Jan 24, 2025

The restrictions are:

  • Inbound traffic is filtered by IP address, so HTTP (file) challenges are not possible.
  • DNS challenges are possible but require permissions (managed identity).

Options for a permanent solution:

  • Modern reverse proxies like Traefik and Caddy can automate SSL certs with Let's Encrypt using DNS challenges. We could replace the application gateway with one of these reverse proxies.
  • It looks like these build upon Lego to do this, so we could run Lego ourselves.
  • (Less clean) Modify the SSLCertificateProvider so that the resource will be recreated when the cert is ~60 days old. This would still require manual intervention: someone has to run a Pulumi update.
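
For the last option, the age check itself could be sketched as below. This is only an illustration of the "~60 days old" rule, not how SSLCertificateProvider currently works; the function name, the constant, and the idea of passing in the issue date are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical threshold, taken from the "~60 days old" suggestion above.
MAX_CERT_AGE = timedelta(days=60)


def certificate_needs_renewal(
    issued_at: datetime,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_CERT_AGE,
) -> bool:
    """Return True once the certificate is at least `max_age` old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - issued_at >= max_age
```

A provider could call something like this when comparing old and new state and request a replacement when it returns True. The renewal would still only happen when someone runs the deployment.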

@jemrobinson
Member

jemrobinson commented Jan 24, 2025

We prefer to use the Application Gateway as:

  • it's another resource that we can rely on Azure to manage (rather than configuring our own SSL termination)
  • it's integrated with Azure Key Vault for certificate management
  • it's easy to help users debug if something goes wrong
  • it's cheaper than running a container (~£7 per month rather than £20-£50)

Using Lego (or similar) to automatically update the certificate is still a good idea though. This could run as a container instance or (preferably) an Azure Function. Note that you'd also need a managed identity with appropriate privileges to change DNS records in order to meet the DNS-01 challenge requirements.
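
To make the DNS-01 requirement concrete: the ACME client has to publish a TXT record under the domain being validated, which is why the identity running it needs write access to the DNS zone. A minimal sketch of what that record looks like, following the ACME specification (RFC 8555); the function names are made up for illustration and a client such as Lego does this internally:

```python
import base64
import hashlib


def dns01_record_name(domain: str) -> str:
    """Name of the TXT record the ACME server will query for `domain`."""
    # Wildcard names are validated against the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    return f"_acme-challenge.{base}"


def dns01_record_value(key_authorization: str) -> str:
    """TXT value: unpadded base64url of the SHA-256 of the key authorization."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The key authorization is the challenge token joined to the account key thumbprint, both of which come from the ACME exchange, so they are left as an input here.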

craddm moved this from In progress to In review in Data Safe Haven Jan 28, 2025
@JimMadge
Member

Lego looks like a good route.

On an existing SRE I can use az CLI credentials and the DNS challenge to create/renew certs. To automate this without the CLI we can create and use a managed identity.

It would be possible to run this process in a container, or maybe on serverless compute.
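
As a rough sketch of what the automated process would invoke, the helper below builds a Lego command line for the Azure DNS provider. The helper name, its defaults, and the example values are assumptions; credentials for the DNS provider (az CLI login or a managed identity) are expected to come from the environment and are not shown. Check the flags against the installed Lego version before relying on them.

```python
from typing import List


def lego_command(
    email: str,
    domain: str,
    action: str = "run",
    renew_days: int = 30,
    path: str = ".lego",
) -> List[str]:
    """Build a lego invocation that uses the DNS challenge with Azure DNS."""
    if action not in ("run", "renew"):
        raise ValueError("action must be 'run' or 'renew'")
    cmd = [
        "lego",
        "--accept-tos",
        "--email", email,
        "--dns", "azuredns",  # DNS challenge via the Azure DNS provider
        "--domains", domain,
        "--path", path,  # where lego keeps account data and certificates
        action,
    ]
    if action == "renew":
        # Only renew when fewer than `renew_days` days of validity remain.
        cmd += ["--days", str(renew_days)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Using `renew` with a day threshold means the job can run on a schedule and do nothing until the cert is actually close to expiry.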

@jemrobinson
Member

jemrobinson commented Jan 28, 2025

+1 for serverless (e.g. function app) as this should be cheaper for something that needs to run rarely and for a short time on each invocation.

@JimMadge
Member

Agreed. It should only need to run once a day or once a week, and each run will only take a few minutes.

Projects
Status: In review
Development

No branches or pull requests

3 participants