Bug Report: AWS Security Group Leakage when manage-backend-security-group-rules is false #4652

@sss-ng

Bug Description
The AWS Load Balancer Controller incorrectly triggers ec2:AuthorizeSecurityGroupIngress on Node/Pod security groups even when the Ingress annotation alb.ingress.kubernetes.io/manage-backend-security-group-rules is set to "false".

I believe this occurs because the TargetGroupBinding (TGB) reconciler unconditionally invokes the NetworkingManager's reconciliation logic, regardless of whether a networking specification is present in the TGB spec. When the annotation is set to "false", the generated TGB has a nil Spec.Networking, but the controller proceeds to track it in its internal security group reconciliation lifecycle. In environments with strict IAM Permissions Boundaries (denying ec2:AuthorizeSecurityGroupIngress), this leads to recurring 403 UnauthorizedOperation errors and prevents successful reconciliation.

Steps to Reproduce

  1. Deploy an Ingress with the following annotations:
       alb.ingress.kubernetes.io/scheme: internet-facing
       alb.ingress.kubernetes.io/target-type: ip
       alb.ingress.kubernetes.io/security-groups: sg-xxxxxx # Custom SG
       alb.ingress.kubernetes.io/manage-backend-security-group-rules: "false"
  2. Ensure the controller's IAM role has a Permissions Boundary that denies ec2:AuthorizeSecurityGroupIngress (or that the role simply lacks the permission).
  3. Observe the controller logs.
  • Manifests applied while reproducing the issue:
   apiVersion: networking.k8s.io/v1
   kind: Ingress
   metadata:
     name: repro-ingress
     annotations:
       alb.ingress.kubernetes.io/scheme: internet-facing
       alb.ingress.kubernetes.io/target-type: ip
       alb.ingress.kubernetes.io/security-groups: sg-xxxxxx
       alb.ingress.kubernetes.io/manage-backend-security-group-rules: "false"
   spec:
     ingressClassName: alb
     rules:
     - http:
         paths:
         - path: /
           pathType: Prefix
           backend:
             service:
               name: repro-svc
               port:
                 number: 80
  • Controller logs/error messages while reproducing the issue:
{"level": "error",
"ts": "2026-03-27T14:56:07Z",
"msg": "Requesting network requeue due to error from ReconcileForPodEndpoints",
"tgb": {"name":"k8s-REDACTED", "namespace": "REDACTED"},
"error": "operation error EC2: AuthorizeSecurityGroupIngress, https response error StatusCode: 403, RequestID: ..."}

Expected Behavior
When manage-backend-security-group-rules is set to "false", the controller should skip all security group mutations for the associated TargetGroupBinding resources. Specifically, it should not call into the NetworkingManager for resources that have opted out.

Actual Behavior
The controller attempts to reconcile security group rules for the TGB despite the opt-out.

  • The bug causes reconciliation failures and 403 errors in the logs, though target registration may still succeed if the role has the permissions that registration needs.
  • This always happens when the annotation is set to "false" while the global flag --enable-backend-security-group is true (the default).

Regression
Was the functionality working correctly in a previous version? [No / Unknown]
Verified in v2.17.1.

Current Workarounds
Setting the global flag --enable-backend-security-group=false resolves the issue by disabling backend SG management cluster-wide. However, this is not a viable solution if other Ingresses in the cluster do require managed security groups.
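
To make the gap between reported and expected behavior explicit, here is a small truth table in Go. It is only a model of what this report describes, not controller code: both function names are hypothetical, and the "reported" column assumes my analysis above is correct (the global flag is the only thing that gates the security group call).

```go
package main

import "fmt"

// reportedAttemptsSGMutation models the behavior described in this issue:
// only the global --enable-backend-security-group flag gates the call, and
// the per-Ingress annotation is effectively ignored. Hypothetical helper.
func reportedAttemptsSGMutation(globalFlag, manageRulesAnnotation bool) bool {
	return globalFlag
}

// expectedAttemptsSGMutation models the behavior requested in this issue:
// the call happens only when the global flag is on AND the Ingress has not
// opted out via manage-backend-security-group-rules. Hypothetical helper.
func expectedAttemptsSGMutation(globalFlag, manageRulesAnnotation bool) bool {
	return globalFlag && manageRulesAnnotation
}

func main() {
	fmt.Println("flag  annotation  reported  expected")
	for _, flag := range []bool{true, false} {
		for _, annotation := range []bool{true, false} {
			fmt.Printf("%-5t %-11t %-9t %-8t\n",
				flag, annotation,
				reportedAttemptsSGMutation(flag, annotation),
				expectedAttemptsSGMutation(flag, annotation))
		}
	}
}
```

The only row where the two columns differ is flag=true, annotation=false, which is exactly the configuration in the reproduction steps.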

Environment

  • AWS Load Balancer controller version: v2.17.1
  • Kubernetes version: v1.30+
  • Using EKS (yes/no), if so version?: Yes, 1.33
  • Using Service or Ingress: Both
  • AWS region: us-east-1
  • How was the aws-load-balancer-controller installed: Helm

Possible Solution (Optional)
Add a nil guard in pkg/targetgroupbinding/resource_manager.go so that the NetworkingManager is invoked only when tgb.Spec.Networking is non-nil.

In reconcileWithIPTargetType around line 209:

	if tgb.Spec.Networking != nil {
		if err := m.networkingManager.ReconcileForPodEndpoints(ctx, tgb, endpoints); err != nil {
			// ...
		}
	}

A similar guard is needed in reconcileWithInstanceTargetType.

Contribution Intention (Optional)

  • Yes, I'm willing to submit a PR to fix this issue

Additional Context
Testing with a reproduction unit test confirmed that ReconcileForPodEndpoints is called even when Spec.Networking is nil. Implementing the guard locally fixed the issue and resolved the unauthorized API calls.

Labels

triage/needs-information: Indicates an issue needs more information in order to work on it.
