Bug Report: AWS Security Group Leakage when manage-backend-security-group-rules is false #4652
Bug Description
The AWS Load Balancer Controller incorrectly triggers ec2:AuthorizeSecurityGroupIngress on Node/Pod security groups even when the Ingress annotation alb.ingress.kubernetes.io/manage-backend-security-group-rules is set to "false".
I believe this occurs because the TargetGroupBinding (TGB) reconciler unconditionally invokes the NetworkingManager's reconciliation logic, regardless of whether a networking specification is present in the TGB spec. When the annotation is set to "false", the generated TGB has a nil Spec.Networking, but the controller proceeds to track it in its internal security group reconciliation lifecycle. In environments with strict IAM Permissions Boundaries (denying ec2:AuthorizeSecurityGroupIngress), this leads to recurring 403 UnauthorizedOperation errors and prevents successful reconciliation.
Steps to Reproduce
- Deploy an Ingress with the following annotations:
  alb.ingress.kubernetes.io/scheme: internet-facing
  alb.ingress.kubernetes.io/target-type: ip
  alb.ingress.kubernetes.io/security-groups: sg-xxxxxx # Custom SG
  alb.ingress.kubernetes.io/manage-backend-security-group-rules: "false"
- Ensure the controller's IAM role has a Permissions Boundary that denies ec2:AuthorizeSecurityGroupIngress (or that the role simply lacks the permission).
- Observe the controller logs.
- Manifests applied while reproducing the issue:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: repro-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/security-groups: sg-xxxxxx
    alb.ingress.kubernetes.io/manage-backend-security-group-rules: "false"
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: repro-svc
                port:
                  number: 80
- Controller logs/error messages while reproducing the issue:
{
  "level": "error",
  "ts": "2026-03-27T14:56:07Z",
  "msg": "Requesting network requeue due to error from ReconcileForPodEndpoints",
  "tgb": {"name": "k8s-REDACTED", "namespace": "REDACTED"},
  "error": "operation error EC2: AuthorizeSecurityGroupIngress, https response error StatusCode: 403, RequestID: ..."
}
Expected Behavior
When manage-backend-security-group-rules is set to "false", the controller should skip all security group mutations for the associated TargetGroupBinding resources. Specifically, it should not call into the NetworkingManager for resources that have opted out.
Actual Behavior
The controller attempts to reconcile security group rules for the TGB despite the opt-out.
- The bug causes reconciliation failures and 403 errors in logs, though target group registration might still succeed if permissions for that are present.
- This always happens when the annotation is set to "false" but the global flag --enable-backend-security-group is true (default).
Regression
Was the functionality working correctly in a previous version? [No / Unknown]
Verified in v2.17.1.
Current Workarounds
Setting the global flag --enable-backend-security-group=false resolves the issue by disabling backend SG management cluster-wide. However, this is not a viable solution if other Ingresses in the cluster do require managed security groups.
Environment
- AWS Load Balancer controller version: v2.17.1
- Kubernetes version: v1.30+
- Using EKS (yes/no), if so version?: Yes, 1.33
- Using Service or Ingress: Both
- AWS region: us-east-1
- How was the aws-load-balancer-controller installed: Helm
Possible Solution (Optional)
Add a nil guard in pkg/targetgroupbinding/resource_manager.go to check if tgb.Spec.Networking is non-nil before invoking the NetworkingManager.
In reconcileWithIPTargetType around line 209:
if tgb.Spec.Networking != nil {
    if err := m.networkingManager.ReconcileForPodEndpoints(ctx, tgb, endpoints); err != nil {
        // ...
    }
}
And similarly in reconcileWithInstanceTargetType.
Contribution Intention (Optional)
- Yes, I'm willing to submit a PR to fix this issue
Additional Context
Testing with a reproduction unit test confirmed that ReconcileForPodEndpoints is called even when Spec.Networking is nil. Implementing the guard locally fixed the issue and resolved the unauthorized API calls.