Nil pointer panic in service reconciler when reconciling ClusterIP service created by Istio Gateway API (networking.istio.io/service-type: ClusterIP) #4724

@joey-jonko-paypal

Bug Report

Description

When using the Istio Gateway API integration with networking.istio.io/service-type: ClusterIP on a Gateway resource, the aws-load-balancer-controller's Service reconciler panics with a nil pointer dereference. The panic is recovered, so the only effect is log noise (ALBs and NLBs are unaffected), but it recurs on every reconcile of the affected Service.

What happened

Istiod creates a ClusterIP-typed Service (e.g. dev-gateway-external-istio) for a Gateway annotated with networking.istio.io/service-type: ClusterIP. This suppresses NLB provisioning, with the intent that an external ALB is provisioned separately via a Kubernetes Ingress resource targeting the Gateway pods directly (target-type: ip).

The Service reconciler picks up this ClusterIP service and calls buildModel, which correctly returns lb == nil. The if lb == nil cleanup branch runs and completes successfully (no-op since there is no finalizer). However, the reconciler then falls through to reconcileLoadBalancerResources(ctx, svc, stack, lb, backendSGRequired) with lb still nil. Inside that function, lb.DNSName().Resolve(ctx) dereferences the nil pointer and panics.

Observed log

{
  "level": "error",
  "controller": "service",
  "name": "dev-gateway-external-istio",
  "namespace": "istio-ingress",
  "error": "panic: runtime error: invalid memory address or nil pointer dereference [recovered]"
}

Root cause

In controllers/service/service_controller.go, the reconcile() function is missing a return nil after the successful cleanup path when lb == nil:

// controllers/service/service_controller.go
func (r *serviceReconciler) reconcile(ctx context.Context, req reconcile.Request) error {
    ...
    if lb == nil {
        cleanupLoadBalancerFn := func() {
            err = r.cleanupLoadBalancerResources(ctx, svc, stack)
        }
        r.metricsCollector.ObserveControllerReconcileLatency(controllerName, "cleanup_load_balancer", cleanupLoadBalancerFn)
        if err != nil {
            return ctrlerrors.NewErrorWithMetrics(controllerName, "cleanup_load_balancer_error", err, r.metricsCollector)
        }
        // BUG: missing `return nil` here — falls through to reconcileLoadBalancerResources with lb == nil
    }
    return r.reconcileLoadBalancerResources(ctx, svc, stack, lb, backendSGRequired)
}

The nil dereference occurs at this line inside reconcileLoadBalancerResources:

lbDNS, err = lb.DNSName().Resolve(ctx) // panic: lb is nil
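
For readers less familiar with Go's nil semantics, the failure mode can be reproduced in isolation. This is a minimal sketch, not the controller's code: loadBalancer, buildModel and resolveDNS are hypothetical stand-ins, and the only assumption is that DNSName() reads state through its receiver, as the panic suggests.

```go
package main

import "fmt"

// loadBalancer is a hypothetical stand-in for the controller's LB model type.
type loadBalancer struct {
	dnsName string
}

// DNSName reads a field through the receiver, so it panics when lb is nil.
func (lb *loadBalancer) DNSName() string {
	return lb.dnsName
}

// buildModel mimics the reported behaviour: no LB model for non-LoadBalancer Services.
func buildModel(serviceType string) *loadBalancer {
	if serviceType != "LoadBalancer" {
		return nil
	}
	return &loadBalancer{dnsName: "example.elb.amazonaws.com"}
}

// resolveDNS calls DNSName and converts a panic into an error so the
// demo can report the failure instead of crashing.
func resolveDNS(lb *loadBalancer) (name string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return lb.DNSName(), nil
}

func main() {
	// A ClusterIP Service yields a nil model, and the method call panics.
	_, err := resolveDNS(buildModel("ClusterIP"))
	fmt.Println(err)
}
```

Calling a method on a nil pointer is legal in Go; the panic only happens once the method touches the receiver's fields, which is why the nil lb travels all the way into reconcileLoadBalancerResources before failing.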

Proposed fix

Add a return nil after the successful cleanup when lb == nil:

if lb == nil {
    cleanupLoadBalancerFn := func() {
        err = r.cleanupLoadBalancerResources(ctx, svc, stack)
    }
    r.metricsCollector.ObserveControllerReconcileLatency(controllerName, "cleanup_load_balancer", cleanupLoadBalancerFn)
    if err != nil {
        return ctrlerrors.NewErrorWithMetrics(controllerName, "cleanup_load_balancer_error", err, r.metricsCollector)
    }
    return nil // <-- fix: do not fall through to reconcileLoadBalancerResources when there is no LB
}
return r.reconcileLoadBalancerResources(ctx, svc, stack, lb, backendSGRequired)
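
The intended control flow can be checked with a small standalone model. The sketch below is an assumption-laden illustration, not a patch or a test from the repository: the reconciler type, its counters, and the cleanup/reconcileLB helpers are hypothetical names that only mirror the shape of the code above.

```go
package main

import (
	"errors"
	"fmt"
)

// loadBalancer is a hypothetical stand-in for the controller's LB model type.
type loadBalancer struct{ dnsName string }

// reconciler records which steps ran so the control flow can be inspected.
type reconciler struct {
	cleanupCalls   int
	reconcileCalls int
}

// cleanup stands in for cleanupLoadBalancerResources and always succeeds here.
func (r *reconciler) cleanup() error {
	r.cleanupCalls++
	return nil
}

// reconcileLB stands in for reconcileLoadBalancerResources; it reads through
// lb, so it would panic if it were ever reached with a nil pointer.
func (r *reconciler) reconcileLB(lb *loadBalancer) error {
	r.reconcileCalls++
	if lb.dnsName == "" {
		return errors.New("empty DNS name")
	}
	return nil
}

// reconcile mirrors the proposed shape: clean up and return when lb is nil.
func (r *reconciler) reconcile(lb *loadBalancer) error {
	if lb == nil {
		if err := r.cleanup(); err != nil {
			return fmt.Errorf("cleanup_load_balancer_error: %w", err)
		}
		return nil // the guard: never reach reconcileLB without an LB
	}
	return r.reconcileLB(lb)
}

func main() {
	r := &reconciler{}
	err := r.reconcile(nil)
	fmt.Println(err, r.cleanupCalls, r.reconcileCalls) // prints "<nil> 1 0"
}
```

With the guard in place, a nil lb results in exactly one cleanup call and no call into the LB reconcile path, while a non-nil lb still takes the normal path.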

Steps to reproduce

  1. Install the aws-load-balancer-controller (tested on v2.17.1; bug present on current main).
  2. Install Istio with Gateway API support.
  3. Create a Gateway resource with networking.istio.io/service-type: ClusterIP in the annotations:
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: my-external-gateway
      namespace: istio-ingress
      annotations:
        networking.istio.io/service-type: ClusterIP
    spec:
      gatewayClassName: istio
      listeners:
        - name: https
          port: 443
          protocol: HTTPS
          ...
  4. Observe repeated panic logs from the Service reconciler for the <gateway-name>-istio ClusterIP service.

Environment

  • aws-load-balancer-controller version: v2.17.1 (also confirmed unfixed on current main / v3.2.2)
  • Kubernetes version: 1.32
  • Istio version: 1.24.x
  • Gateway API version: v1.2.x

Impact

The panic is recovered by the runtime.HandleReconcileError wrapper, so actual load balancer provisioning is unaffected. ALBs (provisioned via Ingress) and NLBs (provisioned via LoadBalancer-typed Services) continue to function correctly. The issue produces repeated error log noise on every reconcile interval for the affected service.
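
Why a recovered panic stays contained to one Service can be illustrated with a generic recover wrapper. This is a sketch of the general Go pattern only; withRecover, reconcileFunc and faulty are hypothetical names, and the code makes no claim about how the controller or controller-runtime implements recovery.

```go
package main

import (
	"errors"
	"fmt"
)

// loadBalancer is a hypothetical stand-in for the controller's LB model type.
type loadBalancer struct{ dnsName string }

// reconcileFunc is a hypothetical signature for one reconcile attempt.
type reconcileFunc func(name string) error

// withRecover wraps fn so that a panic becomes an ordinary error
// for that single invocation.
func withRecover(fn reconcileFunc) reconcileFunc {
	return func(name string) (err error) {
		defer func() {
			if r := recover(); r != nil {
				err = fmt.Errorf("panic: %v [recovered]", r)
			}
		}()
		return fn(name)
	}
}

// faulty leaves lb nil for the ClusterIP gateway Service only.
func faulty(name string) error {
	var lb *loadBalancer
	if name != "dev-gateway-external-istio" {
		lb = &loadBalancer{dnsName: "example.elb.amazonaws.com"}
	}
	if lb.dnsName == "" { // panics when lb is nil
		return errors.New("empty DNS name")
	}
	return nil
}

func main() {
	reconcile := withRecover(faulty)
	for _, name := range []string{"dev-gateway-external-istio", "some-nlb-service"} {
		fmt.Printf("%s: %v\n", name, reconcile(name))
	}
}
```

Each invocation is isolated: the affected Service returns an error on every attempt, and the other Service reconciles normally, which matches the log-noise-only impact described above.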
