fix: Race condition in webhook certificate renewal with cert-manager self-signed issuer without a dedicated CA certificate #4359
…self-signed issuer without a dedicated CA certificate kubernetes-sigs#4019 Signed-off-by: Pradeep Lakshmi Narasimha <[email protected]>
Hi @praddy26. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
@praddy26 thanks for the contribution! just to double check, this change only affect those with
@oliviassss
Users with PLMK if you need any other info or changes.
/lgtm cc: @zac-nixon @shraddhabang for awareness.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: oliviassss, pniebylski-zilch, praddy26
Fix cert-manager webhook certificate renewal race condition
Issue
Fixes #4019 - Race condition during cert-manager webhook certificate renewal causing webhook failures
Description
This PR addresses a critical race condition that occurs when cert-manager renews webhook certificates for the AWS Load Balancer Controller. The issue manifests as webhook validation/mutation failures during certificate renewal periods, causing intermittent service disruptions.
Root Cause Analysis
The original cert-manager integration used a single-tier certificate approach where:
Solution Architecture
Implemented a 3-tier certificate hierarchy that eliminates the race condition:
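The hierarchy itself is not reproduced on this page. As a rough sketch only, a three-tier cert-manager setup of this kind usually chains a self-signed bootstrap issuer, a long-lived CA certificate, and a CA issuer that signs the webhook serving certificate. Every resource name, the namespace, the DNS names, and the duration below are hypothetical placeholders, not values taken from this PR's templates:

```yaml
# Tier 1: self-signed issuer, used only to bootstrap the CA.
# All names in this sketch are hypothetical.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-selfsigned-issuer
  namespace: kube-system
spec:
  selfSigned: {}
---
# Tier 2: long-lived CA certificate, signed by the self-signed issuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-webhook-ca
  namespace: kube-system
spec:
  isCA: true
  commonName: example-webhook-ca
  secretName: example-webhook-ca-tls
  duration: 43800h  # assumed lifetime, chosen only for illustration
  issuerRef:
    name: example-selfsigned-issuer
    kind: Issuer
---
# CA issuer backed by the tier-2 secret.
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: example-webhook-ca-issuer
  namespace: kube-system
spec:
  ca:
    secretName: example-webhook-ca-tls
---
# Tier 3: webhook serving certificate, signed by the CA issuer.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-webhook-serving-cert
  namespace: kube-system
spec:
  secretName: example-webhook-tls
  dnsNames:
    - example-webhook-service.kube-system.svc
    - example-webhook-service.kube-system.svc.cluster.local
  issuerRef:
    name: example-webhook-ca-issuer
    kind: Issuer
```

In a layout like this, renewing the serving certificate does not rotate the CA, so a CA bundle already injected into the webhook configuration should stay valid across leaf renewals. That is the property the PR relies on to avoid the race.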
Key Benefits:
Implementation Details
New Resources Created:
templates/cert-manager.yaml - New 3-tier CA certificate hierarchy
templates/webhook.yaml - Updated to use new CA issuer
values.yaml - Added cert-manager configuration options
docs/deploy/cert-manager.md - Comprehensive documentation
Configuration Options:
Backward Compatibility
100% backward compatible with existing deployments:
enableCertManager: true
when ready
certManager.issuerRef
Testing Scenarios Validated
✅ Core Functionality:
✅ Upgrade Scenarios:
✅ Template Quality:
clientConfig)
✅ Production Validation:
✅ Edge Cases:
Before/After Comparison
Before (Race Condition Present):
After (Race Condition Eliminated):
Checklist
README.md, or the docs directory) - Added docs/deploy/cert-manager.md
BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯
Impact: This change eliminates a critical production issue affecting webhook reliability during certificate renewals while maintaining 100% backward compatibility. The new architecture provides a stable, enterprise-ready certificate management solution that scales with organizational needs.