
Submariner Route Agent fails to populate custom PBR tables created by AWS CNI, causing intermittent cross-cluster connectivity loss #3697

@jakkwis

Description


What happened:

I am experiencing intermittent, one-way cross-cluster connectivity failures in an EKS multi-cluster environment. Connections to specific pods fail (time out), while other pods on the same node remain reachable.

The root cause is a Policy-Based Routing (PBR) integration failure between the Submariner Route Agent and the AWS CNI.

  1. The AWS CNI creates dedicated PBR rules and custom routing tables (e.g., table 2) for pods assigned IPs from secondary ENIs (e.g., ens6).
  2. The Submariner Route Agent correctly populates the main routing table with the necessary vx-submariner routes for remote clusters.
  3. However, the Route Agent fails to detect or populate these custom, CNI-created tables (e.g., table 2).
  4. As a result, any egress traffic (including replies to ping or traceroute) from a pod forced to use this custom table is not routed to the vx-submariner tunnel. Instead, it follows the table's default route (the VPC gateway via ens6), causing the packet to be blackholed.
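The blackholing mechanism in steps 1–4 can be sketched as a toy model of the Linux routing decision (the table number, CIDRs, and interface names below are illustrative placeholders, not values from a real node):

```python
# Toy model of policy-based routing on the node. All IPs, CIDRs, table
# numbers, and interface names are illustrative assumptions.

REMOTE_CIDR = "10.1.0.0/16"  # a remote cluster's pod CIDR (assumed)

# Routes per table. The Route Agent installs the vx-submariner route
# only in main; the AWS CNI's custom table (here "2") never gets it.
tables = {
    "main": {REMOTE_CIDR: "vx-submariner", "default": "ens5"},
    "2":    {"default": "ens6"},  # custom table: no tunnel route
}

# PBR rules: pod IPs on the secondary ENI are pinned to the custom table.
rules = {"10.0.2.15": "2"}  # pod IP -> lookup table

def egress_interface(src_ip: str, dst_cidr: str) -> str:
    """Pick the table via the PBR rules, then match the destination
    (longest-prefix match reduced to exact-CIDR-or-default here)."""
    table = tables[rules.get(src_ip, "main")]
    return table.get(dst_cidr, table["default"])

# Pod on the primary ENI: reply to the remote cluster uses the tunnel.
assert egress_interface("10.0.1.10", REMOTE_CIDR) == "vx-submariner"
# Pod pinned to table 2: reply leaks out the VPC-facing ENI, blackholed.
assert egress_interface("10.0.2.15", REMOTE_CIDR) == "ens6"
```

The second assertion is the bug: the reply packet's source address selects table 2, which has no route for the remote CIDR, so the table's default route wins.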

Recreating the failing pod temporarily resolves the issue because the PBR rule is torn down, and the new pod instance (by chance) often uses the main table.

What you expected to happen:

The Submariner Route Agent should detect all active routing tables used by routable pods, including custom PBR tables dynamically created by the AWS CNI.

It should ensure that all necessary vx-submariner routes (for remote cluster/service CIDRs) are replicated to all relevant tables (i.e., table main and any custom tables like table 2) to guarantee consistent cross-cluster egress routing for all pods, regardless of which ENI or routing table they use.
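The expected behavior amounts to a route-replication pass. This is a hypothetical sketch of that logic, not Submariner's actual code: copy every remote-CIDR route from the main table into each custom table, without disturbing routes the CNI already installed.

```python
# Hypothetical sketch of the requested reconciliation, using the same
# toy table model as above. Names and values are illustrative.

def replicate_remote_routes(tables: dict, remote_cidrs: list) -> dict:
    """Ensure every non-main table carries the same remote-cluster
    routes as the main table."""
    for name, table in tables.items():
        if name == "main":
            continue
        for cidr in remote_cidrs:
            # setdefault leaves any route the CNI already installed alone
            table.setdefault(cidr, tables["main"][cidr])
    return tables

tables = {
    "main": {"10.1.0.0/16": "vx-submariner", "default": "ens5"},
    "2":    {"default": "ens6"},  # AWS CNI custom table, missing tunnel route
}
replicate_remote_routes(tables, ["10.1.0.0/16"])

assert tables["2"]["10.1.0.0/16"] == "vx-submariner"  # remote CIDR now reachable
assert tables["2"]["default"] == "ens6"               # CNI's default route preserved
```

In iproute2 terms, the real agent would enumerate tables referenced by `ip rule show` and issue the equivalent of `ip route add <remote_cidr> dev vx-submariner table <N>` for each missing entry.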

How to reproduce it (as minimally and precisely as possible):

  1. Deploy Submariner in an EKS cluster using AWS CNI in its default PBR-enabled mode.
  2. Create enough pods on a single node to force the AWS CNI to allocate IPs from a secondary ENI (e.g., ens6).
  3. Identify a pod that has been assigned an IP on this secondary ENI.
  4. Confirm this by logging into the node and finding a specific PBR rule for the pod's IP (e.g., ip rule show | grep <pod_ip>).
  5. Verify that the custom table (e.g., ip route show table 2) is missing the vx-submariner routes, while ip route show table main contains them.
  6. Attempt to ping or traceroute this specific pod from a remote (Submariner-connected) cluster.
  7. Observe the traceroute failing (timing out) after reaching the destination cluster's gateway.

Anything else we need to know?:

Environment:

  • Subctl version: 0.20
  • Cloud provider or hardware configuration: AWS EKS 1.32

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working), help wanted (Looking for someone to work on this)
Status: Backlog
Milestone: none