Commit 4fdca65

Add ECMP symmetric replies.
When traffic arrives over an ECMP route, there is no guarantee that the reply traffic will egress over the same route. Sometimes, the nature of the traffic (or the intervening equipment) means that it is important for reply traffic to go out the same route it came in.

This commit introduces optional ECMP symmetric reply behavior. If configured, then traffic to or from the ECMP route will be sent to conntrack. New incoming traffic over the route will have the source MAC address and incoming port saved in the ct_label. Reply traffic then uses this saved information to send the packet back out the same way it came in.

To facilitate this, a new table was added to the ingress logical router pipeline. The ECMP_STATEFUL table is responsible for committing to conntrack and setting the ct_label when it detects new incoming traffic from the route.

Since ingress pipeline logic on the logical router depends on the ct state of a particular hypervisor, this feature is only usable on gateway routers.

Signed-off-by: Mark Michelson <[email protected]>
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1849683
Acked-by: Numan Siddique <[email protected]>
1 parent 6cfb44a commit 4fdca65

10 files changed, +496 -54 lines changed

lib/logical-fields.c

Lines changed: 4 additions & 0 deletions
@@ -130,6 +130,10 @@ ovn_init_symtab(struct shash *symtab)
                                     WR_CT_COMMIT);
     expr_symtab_add_subfield_scoped(symtab, "ct_label.blocked", NULL,
                                     "ct_label[0]", WR_CT_COMMIT);
+    expr_symtab_add_subfield_scoped(symtab, "ct_label.ecmp_reply_eth", NULL,
+                                    "ct_label[32..79]", WR_CT_COMMIT);
+    expr_symtab_add_subfield_scoped(symtab, "ct_label.ecmp_reply_port", NULL,
+                                    "ct_label[80..95]", WR_CT_COMMIT);
 
     expr_symtab_add_field(symtab, "ct_state", MFF_CT_STATE, NULL, false);
 
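The two new symbol-table entries reserve fixed bit ranges in the 128-bit ct_label:

- ct_label.ecmp_reply_eth is ct_label[32..79], 48 bits holding one MAC address.
- ct_label.ecmp_reply_port is ct_label[80..95], 16 bits holding a port tunnel key.

The sketch below shows that layout in plain C, under the assumption that the label is modeled as two 64-bit halves. struct ct_label128 and the ecmp_label_* helpers are hypothetical and are not OVN or OVS API. In OVN the bits are written by the ct_commit action.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 128-bit ct_label as two 64-bit halves:
 * lo holds label bits 0..63 and hi holds label bits 64..127. */
struct ct_label128 {
    uint64_t lo;
    uint64_t hi;
};

#define MAC48_MASK ((UINT64_C(1) << 48) - 1)

/* Writes a 48-bit MAC into label bits 32..79 (ct_label.ecmp_reply_eth) and
 * a 16-bit port tunnel key into bits 80..95 (ct_label.ecmp_reply_port),
 * leaving every other bit, such as ct_label.blocked (bit 0), untouched. */
static void
ecmp_label_set(struct ct_label128 *label, uint64_t mac, uint16_t port_key)
{
    assert((mac & ~MAC48_MASK) == 0);   /* The MAC must fit in 48 bits. */

    /* Low 32 bits of the MAC land in lo[32..63] (label bits 32..63). */
    label->lo = (label->lo & UINT64_C(0xffffffff))
                | ((mac & UINT64_C(0xffffffff)) << 32);

    /* High 16 bits of the MAC land in hi[0..15] (label bits 64..79) and
     * the port key in hi[16..31] (label bits 80..95). */
    label->hi = (label->hi & ~UINT64_C(0xffffffff))
                | (mac >> 32)
                | ((uint64_t) port_key << 16);
}

/* Reads back ct_label.ecmp_reply_eth. */
static uint64_t
ecmp_label_get_eth(const struct ct_label128 *label)
{
    return (label->lo >> 32) | ((label->hi & UINT64_C(0xffff)) << 32);
}

/* Reads back ct_label.ecmp_reply_port. */
static uint16_t
ecmp_label_get_port(const struct ct_label128 *label)
{
    return (uint16_t) (label->hi >> 16);
}
```

Because the new fields start at bit 32, they do not overlap ct_label.blocked (ct_label[0]), which the sketch preserves.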

northd/ovn-northd.8.xml

Lines changed: 39 additions & 10 deletions
@@ -2120,15 +2120,31 @@ icmp6 {
     <p>
       This is to send packets to connection tracker for tracking and
       defragmentation. It contains a priority-0 flow that simply moves traffic
-      to the next table. If load balancing rules with virtual IP addresses
-      (and ports) are configured in <code>OVN_Northbound</code> database for a
-      Gateway router, a priority-100 flow is added for each configured virtual
-      IP address <var>VIP</var>. For IPv4 <var>VIPs</var> the flow matches
-      <code>ip &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6
-      <var>VIPs</var>, the flow matches <code>ip &amp;&amp; ip6.dst ==
-      <var>VIP</var></code>. The flow uses the action <code>ct_next;</code>
-      to send IP packets to the connection tracker for packet de-fragmentation
-      and tracking before sending it to the next table.
+      to the next table.
+    </p>
+
+    <p>
+      If load balancing rules with virtual IP addresses (and ports) are
+      configured in <code>OVN_Northbound</code> database for a Gateway router,
+      a priority-100 flow is added for each configured virtual IP address
+      <var>VIP</var>. For IPv4 <var>VIPs</var> the flow matches <code>ip
+      &amp;&amp; ip4.dst == <var>VIP</var></code>. For IPv6 <var>VIPs</var>,
+      the flow matches <code>ip &amp;&amp; ip6.dst == <var>VIP</var></code>.
+      The flow uses the action <code>ct_next;</code> to send IP packets to the
+      connection tracker for packet de-fragmentation and tracking before
+      sending it to the next table.
+    </p>
+
+    <p>
+      If ECMP routes with symmetric reply are configured in the
+      <code>OVN_Northbound</code> database for a gateway router, a priority-100
+      flow is added for each router port on which symmetric replies are
+      configured. The matching logic for these ports essentially reverses the
+      configured logic of the ECMP route. So for instance, a route with a
+      destination routing policy will instead match if the source IP address
+      matches the static route's prefix. The flow uses the action
+      <code>ct_next</code> to send IP packets to the connection tracker for
+      packet de-fragmentation and tracking before sending it to the next table.
     </p>
 
     <h3>Ingress Table 5: UNSNAT</h3>
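The priority-100 lr_in_defrag flow for symmetric replies matches the opposite address field from the route itself. Below is a minimal sketch of how that match string is assembled. It mirrors the format string in add_ecmp_symmetric_reply_flows(), `"inport == %s && ip%s.%s == %s"`. The following are all hypothetical, chosen only for illustration:

- build_reverse_match(), the helper name.
- The quoted port name used in the usage.
- The example prefixes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that renders the lr_in_defrag match for traffic
 * arriving on an ECMP route's output port, using the same format string
 * as add_ecmp_symmetric_reply_flows(). Returns false if `buf` was too
 * small to hold the whole match. */
static bool
build_reverse_match(char *buf, size_t size, const char *port_json_key,
                    bool is_ipv4, bool is_src_route, const char *cidr)
{
    /* A destination-based route (is_src_route == false) sends packets whose
     * ip.dst is in the prefix, so traffic coming back in from that side has
     * its ip.src in the prefix; a source-based route flips the other way. */
    int n = snprintf(buf, size, "inport == %s && ip%s.%s == %s",
                     port_json_key, is_ipv4 ? "4" : "6",
                     is_src_route ? "dst" : "src", cidr);
    assert(n >= 0);
    return (size_t) n < size;
}
```

Take a destination route to 10.0.0.0/24 leaving on a port whose JSON key is `"lr0-ext1"`. It yields `inport == "lr0-ext1" && ip4.src == 10.0.0.0/24`.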
@@ -2489,7 +2505,15 @@ output;
       table. This table, instead, is responsible for determine the ECMP
       group id and select a member id within the group based on 5-tuple
       hashing. It stores group id in <code>reg8[0..15]</code> and member id in
-      <code>reg8[16..31]</code>.
+      <code>reg8[16..31]</code>. This step is skipped if the traffic going
+      out the ECMP route is reply traffic, and the ECMP route was configured
+      to use symmetric replies. Instead, the stored <code>ct_label</code> value
+      is used to choose the destination. The least significant 48 bits of the
+      <code>ct_label</code> tell the destination MAC address to which the
+      packet should be sent. The next 16 bits tell the logical router port on
+      which the packet should be sent. These values in the
+      <code>ct_label</code> are set when the initial ingress traffic is
+      received over the ECMP route.
     </p>
 
     <p>
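The bypass can be pictured as a check made before the 5-tuple hash. One caveat when reading the documentation text in this hunk: the symbol table added in lib/logical-fields.c does not put the MAC in the least significant 48 bits. It maps ct_label.ecmp_reply_eth to ct_label[32..79] and ct_label.ecmp_reply_port to ct_label[80..95].

The sketch below is a hedged model of the routing decision only:

- The structs and choose_ecmp_port() are hypothetical.
- The modulo is a stand-in for OVN's select() action, which OVN implements differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical member of an ECMP group. */
struct ecmp_member {
    uint16_t port_key;  /* Tunnel key of the member's output port. */
    bool symmetric;     /* Route sets options:ecmp_symmetric_reply=true. */
};

/* Hypothetical summary of the conntrack state seen in lr_in_ip_routing. */
struct ct_view {
    bool rpl;             /* ct.rpl */
    uint16_t reply_port;  /* ct_label.ecmp_reply_port */
};

/* Picks the output port for a packet routed over an ECMP group. Reply
 * traffic whose ct_label names a symmetric member goes back out that port,
 * like the priority-100 "ct.rpl && ct_label.ecmp_reply_port == K" flow;
 * everything else is spread by hash. The modulo only stands in for the
 * select() action and is not how OVS actually picks a bucket. */
static uint16_t
choose_ecmp_port(const struct ct_view *ct, uint32_t flow_hash,
                 const struct ecmp_member *members, size_t n)
{
    assert(n > 0);
    if (ct->rpl) {
        for (size_t i = 0; i < n; i++) {
            if (members[i].symmetric
                && members[i].port_key == ct->reply_port) {
                return members[i].port_key;   /* Symmetric reply. */
            }
        }
    }
    return members[flow_hash % n].port_key;   /* Normal ECMP spread. */
}
```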
@@ -2639,6 +2663,11 @@ select(reg8[16..31], <var>MID1</var>, <var>MID2</var>, ...);
       address and <code>reg1</code> as the source protocol address).
     </p>
 
+    <p>
+      This processing is skipped for reply traffic being sent out of an ECMP
+      route if the route was configured to use symmetric replies.
+    </p>
+
     <p>
       This table contains the following logical flows:
     </p>
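For symmetric-reply traffic, a priority-200 lr_in_arp_resolve flow copies ct_label.ecmp_reply_eth into eth.dst, so the usual next-hop MAC lookup is never consulted. A hypothetical sketch of that precedence follows. The tiny neighbor table and resolve_eth_dst() are illustrative stand-ins, not OVN's MAC binding machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a learned IP-to-MAC binding. */
struct neighbor {
    const char *ip;
    uint64_t mac;
};

/* Chooses eth.dst for a packet leaving on port `port_key` toward
 * `next_hop_ip`. Returns false when no MAC is known, roughly where OVN
 * would fall through to an ARP or ND request instead. */
static bool
resolve_eth_dst(bool ct_rpl, uint16_t label_port, uint64_t label_eth,
                uint16_t port_key, const char *next_hop_ip,
                const struct neighbor *table, size_t n, uint64_t *eth_dst)
{
    assert(eth_dst != NULL);

    /* Priority-200 symmetric-reply flow: reuse the MAC committed to
     * ct_label.ecmp_reply_eth when the connection was first seen. */
    if (ct_rpl && label_port == port_key) {
        *eth_dst = label_eth;
        return true;
    }

    /* Otherwise resolve the next hop the usual way. */
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(table[i].ip, next_hop_ip)) {
            *eth_dst = table[i].mac;
            return true;
        }
    }
    return false;
}
```

Reusing the saved MAC matters when several next hops share one router port. Only the MAC recorded at commit time identifies which of them the connection came from.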

northd/ovn-northd.c

Lines changed: 108 additions & 15 deletions
@@ -172,16 +172,17 @@ enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN, DEFRAG, 4, "lr_in_defrag") \
     PIPELINE_STAGE(ROUTER, IN, UNSNAT, 5, "lr_in_unsnat") \
     PIPELINE_STAGE(ROUTER, IN, DNAT, 6, "lr_in_dnat") \
-    PIPELINE_STAGE(ROUTER, IN, ND_RA_OPTIONS, 7, "lr_in_nd_ra_options") \
-    PIPELINE_STAGE(ROUTER, IN, ND_RA_RESPONSE, 8, "lr_in_nd_ra_response") \
-    PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 9, "lr_in_ip_routing") \
-    PIPELINE_STAGE(ROUTER, IN, IP_ROUTING_ECMP, 10, "lr_in_ip_routing_ecmp") \
-    PIPELINE_STAGE(ROUTER, IN, POLICY, 11, "lr_in_policy") \
-    PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 12, "lr_in_arp_resolve") \
-    PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 13, "lr_in_chk_pkt_len") \
-    PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 14,"lr_in_larger_pkts") \
-    PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 15, "lr_in_gw_redirect") \
-    PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 16, "lr_in_arp_request") \
+    PIPELINE_STAGE(ROUTER, IN, ECMP_STATEFUL, 7, "lr_in_ecmp_stateful") \
+    PIPELINE_STAGE(ROUTER, IN, ND_RA_OPTIONS, 8, "lr_in_nd_ra_options") \
+    PIPELINE_STAGE(ROUTER, IN, ND_RA_RESPONSE, 9, "lr_in_nd_ra_response") \
+    PIPELINE_STAGE(ROUTER, IN, IP_ROUTING, 10, "lr_in_ip_routing") \
+    PIPELINE_STAGE(ROUTER, IN, IP_ROUTING_ECMP, 11, "lr_in_ip_routing_ecmp") \
+    PIPELINE_STAGE(ROUTER, IN, POLICY, 12, "lr_in_policy") \
+    PIPELINE_STAGE(ROUTER, IN, ARP_RESOLVE, 13, "lr_in_arp_resolve") \
+    PIPELINE_STAGE(ROUTER, IN, CHK_PKT_LEN , 14, "lr_in_chk_pkt_len") \
+    PIPELINE_STAGE(ROUTER, IN, LARGER_PKTS, 15,"lr_in_larger_pkts") \
+    PIPELINE_STAGE(ROUTER, IN, GW_REDIRECT, 16, "lr_in_gw_redirect") \
+    PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 17, "lr_in_arp_request") \
     \
     /* Logical router egress stages. */ \
     PIPELINE_STAGE(ROUTER, OUT, UNDNAT, 0, "lr_out_undnat") \
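Inserting lr_in_ecmp_stateful as ingress table 7 shifts every later ingress table up by one. That is why ten PIPELINE_STAGE lines change for one new stage. The list-driven (X-macro) style lets a single list of entries generate both the enum values and their names.

Below is a hedged, trimmed-down sketch of that pattern using a subset of the new numbering. LR_IN_STAGES, STAGE and lr_in_stage_name() are hypothetical names, not the macros that ovn-northd.c actually defines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down stage list: one (NAME, TABLE, STRING) entry
 * per ingress table. Only a subset of the real stages is shown. */
#define LR_IN_STAGES                                   \
    STAGE(DNAT,           6, "lr_in_dnat")             \
    STAGE(ECMP_STATEFUL,  7, "lr_in_ecmp_stateful")    \
    STAGE(ND_RA_OPTIONS,  8, "lr_in_nd_ra_options")    \
    STAGE(IP_ROUTING,    10, "lr_in_ip_routing")       \
    STAGE(ARP_REQUEST,   17, "lr_in_arp_request")

/* First expansion: enum constants whose values are the table numbers. */
enum lr_in_stage {
#define STAGE(NAME, TABLE, STR) S_LR_IN_##NAME = TABLE,
    LR_IN_STAGES
#undef STAGE
};

/* Second expansion: map each constant back to its logical flow name. */
static const char *
lr_in_stage_name(enum lr_in_stage stage)
{
    switch (stage) {
#define STAGE(NAME, TABLE, STR) case S_LR_IN_##NAME: return STR;
    LR_IN_STAGES
#undef STAGE
    }
    return "unknown";
}
```

Keeping number and name in one entry means the renumbering touches only the list. Anything that refers to these tables by number, rather than by name, still sees them move.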
@@ -7418,6 +7419,7 @@ struct parsed_route {
     bool is_src_route;
     uint32_t hash;
     const struct nbrec_logical_router_static_route *route;
+    bool ecmp_symmetric_reply;
 };
 
 static uint32_t
@@ -7479,6 +7481,8 @@ parsed_routes_add(struct ovs_list *routes,
                                        "src-ip"));
     pr->hash = route_hash(pr);
     pr->route = route;
+    pr->ecmp_symmetric_reply = smap_get_bool(&route->options,
+                                             "ecmp_symmetric_reply", false);
     ovs_list_insert(routes, &pr->list_node);
     return pr;
 }
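parsed_routes_add() reads the new flag with smap_get_bool() and a default of false, so the option stays off unless it is explicitly set. Below is a sketch of that lookup over a plain array. It assumes two things about the OVS helper:

- A missing key is treated as an empty string.
- With a false default, only a case-insensitive "true" enables the option.

struct kv and options_get_bool() are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair standing in for one entry of an OVSDB
 * string-to-string map such as Logical_Router_Static_Route:options. */
struct kv {
    const char *key;
    const char *value;
};

/* ASCII case-insensitive string equality. */
static bool
equal_nocase(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b)) {
            return false;
        }
    }
    return *a == *b;
}

/* Models smap_get_bool(&route->options, "ecmp_symmetric_reply", false),
 * assuming the OVS helper's semantics: a missing key reads as "", with
 * def == false only "true" enables the flag, and with def == true only
 * "false" disables it. */
static bool
options_get_bool(const struct kv *opts, size_t n, const char *key, bool def)
{
    const char *value = "";

    assert(key != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(opts[i].key, key)) {
            value = opts[i].value;
            break;
        }
    }
    return def ? !equal_nocase(value, "false") : equal_nocase(value, "true");
}
```

Under that assumption, values such as "yes" or "1" leave symmetric replies disabled.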
@@ -7727,26 +7731,102 @@ find_static_route_outport(struct ovn_datapath *od, struct hmap *ports,
     return true;
 }
 
+static void
+add_ecmp_symmetric_reply_flows(struct hmap *lflows,
+                               struct ovn_datapath *od,
+                               const char *port_ip,
+                               struct ovn_port *out_port,
+                               const struct parsed_route *route,
+                               struct ds *route_match)
+{
+    const struct nbrec_logical_router_static_route *st_route = route->route;
+    struct ds match = DS_EMPTY_INITIALIZER;
+    struct ds actions = DS_EMPTY_INITIALIZER;
+    struct ds ecmp_reply = DS_EMPTY_INITIALIZER;
+    char *cidr = normalize_v46_prefix(&route->prefix, route->plen);
+
+    /* If symmetric ECMP replies are enabled, then packets that arrive over
+     * an ECMP route need to go through conntrack.
+     */
+    ds_put_format(&match, "inport == %s && ip%s.%s == %s",
+                  out_port->json_key,
+                  route->prefix.family == AF_INET ? "4" : "6",
+                  route->is_src_route ? "dst" : "src",
+                  cidr);
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
+                            ds_cstr(&match), "ct_next;",
+                            &st_route->header_);
+
+    /* And packets that go out over an ECMP route need conntrack */
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, 100,
+                            ds_cstr(route_match), "ct_next;",
+                            &st_route->header_);
+
+    /* Save src eth and inport in ct_label for packets that arrive over
+     * an ECMP route.
+     *
+     * NOTE: we purposely are not clearing match before this
+     * ds_put_cstr() call. The previous contents are needed.
+     */
+    ds_put_cstr(&match, " && (ct.new && !ct.est)");
+
+    ds_put_format(&actions, "ct_commit { ct_label.ecmp_reply_eth = eth.src;"
+                  " ct_label.ecmp_reply_port = %" PRId64 ";}; next;",
+                  out_port->sb->tunnel_key);
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 100,
+                            ds_cstr(&match), ds_cstr(&actions),
+                            &st_route->header_);
+
+    /* Bypass ECMP selection if we already have ct_label information
+     * for where to route the packet.
+     */
+    ds_put_format(&ecmp_reply, "ct.rpl && ct_label.ecmp_reply_port == %"
+                  PRId64, out_port->sb->tunnel_key);
+    ds_clear(&match);
+    ds_put_format(&match, "%s && %s", ds_cstr(&ecmp_reply),
+                  ds_cstr(route_match));
+    ds_clear(&actions);
+    ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; "
+                  "eth.src = %s; %sreg1 = %s; outport = %s; next;",
+                  out_port->lrp_networks.ea_s,
+                  route->prefix.family == AF_INET ? "" : "xx",
+                  port_ip, out_port->json_key);
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 100,
+                            ds_cstr(&match), ds_cstr(&actions),
+                            &st_route->header_);
+
+    /* Egress reply traffic for symmetric ECMP routes skips router policies. */
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_POLICY, 65535,
+                            ds_cstr(&ecmp_reply), "next;",
+                            &st_route->header_);
+
+    ds_clear(&actions);
+    ds_put_cstr(&actions, "eth.dst = ct_label.ecmp_reply_eth; next;");
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_ARP_RESOLVE,
+                            200, ds_cstr(&ecmp_reply),
+                            ds_cstr(&actions), &st_route->header_);
+}
+
 static void
 build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
                       struct hmap *ports, struct ecmp_groups_node *eg)
 
 {
     bool is_ipv4 = (eg->prefix.family == AF_INET);
-    struct ds match = DS_EMPTY_INITIALIZER;
     uint16_t priority;
+    struct ecmp_route_list_node *er;
+    struct ds route_match = DS_EMPTY_INITIALIZER;
 
     char *prefix_s = build_route_prefix_s(&eg->prefix, eg->plen);
     build_route_match(NULL, prefix_s, eg->plen, eg->is_src_route, is_ipv4,
-                      &match, &priority);
+                      &route_match, &priority);
     free(prefix_s);
 
     struct ds actions = DS_EMPTY_INITIALIZER;
     ds_put_format(&actions, "ip.ttl--; flags.loopback = 1; %s = %"PRIu16
                   "; %s = select(", REG_ECMP_GROUP_ID, eg->id,
                   REG_ECMP_MEMBER_ID);
 
-    struct ecmp_route_list_node *er;
     bool is_first = true;
     LIST_FOR_EACH (er, list_node, &eg->route_list) {
         if (is_first) {
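The sketch below renders, as text, the six logical flows that add_ecmp_symmetric_reply_flows() installs per output port, reusing the patch's format strings. All of the following are hypothetical scaffolding:

- struct lflow_text, set_flow() and render_symmetric_flows().
- The fixed-size buffers.
- The example port name, MAC and addresses.

The real function builds struct ds strings and passes them to ovn_lflow_add_with_hint(). As quoted, it does not call ds_destroy() on match, actions or ecmp_reply, nor free() on cidr. The sketch avoids heap allocation by using stack buffers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: one logical flow rendered as text. */
struct lflow_text {
    const char *stage;
    int priority;
    char match[256];
    char actions[256];
};

/* Hypothetical helper: fills in stage/priority and a fixed actions string. */
static void
set_flow(struct lflow_text *f, const char *stage, int priority,
         const char *actions)
{
    f->stage = stage;
    f->priority = priority;
    snprintf(f->actions, sizeof f->actions, "%s", actions);
}

/* Renders the six flows that add_ecmp_symmetric_reply_flows() adds for one
 * output port, reusing its format strings. Returns the number written. */
static size_t
render_symmetric_flows(struct lflow_text f[6], const char *port_json_key,
                       int64_t tunnel_key, const char *port_mac,
                       const char *port_ip, bool is_ipv4, bool is_src_route,
                       const char *cidr, const char *route_match)
{
    char reply[128];
    snprintf(reply, sizeof reply,
             "ct.rpl && ct_label.ecmp_reply_port == %" PRId64, tunnel_key);

    /* lr_in_defrag: traffic arriving over the route goes to conntrack. */
    set_flow(&f[0], "lr_in_defrag", 100, "ct_next;");
    snprintf(f[0].match, sizeof f[0].match, "inport == %s && ip%s.%s == %s",
             port_json_key, is_ipv4 ? "4" : "6",
             is_src_route ? "dst" : "src", cidr);

    /* lr_in_defrag: traffic leaving over the route goes to conntrack. */
    set_flow(&f[1], "lr_in_defrag", 100, "ct_next;");
    snprintf(f[1].match, sizeof f[1].match, "%s", route_match);

    /* lr_in_ecmp_stateful: new inbound connections save eth.src and the
     * input port's tunnel key in ct_label. */
    f[2].stage = "lr_in_ecmp_stateful";
    f[2].priority = 100;
    snprintf(f[2].match, sizeof f[2].match, "%s && (ct.new && !ct.est)",
             f[0].match);
    snprintf(f[2].actions, sizeof f[2].actions,
             "ct_commit { ct_label.ecmp_reply_eth = eth.src;"
             " ct_label.ecmp_reply_port = %" PRId64 ";}; next;", tunnel_key);

    /* lr_in_ip_routing: replies skip ECMP selection and use this port. */
    f[3].stage = "lr_in_ip_routing";
    f[3].priority = 100;
    snprintf(f[3].match, sizeof f[3].match, "%s && %s", reply, route_match);
    snprintf(f[3].actions, sizeof f[3].actions,
             "ip.ttl--; flags.loopback = 1; "
             "eth.src = %s; %sreg1 = %s; outport = %s; next;",
             port_mac, is_ipv4 ? "" : "xx", port_ip, port_json_key);

    /* lr_in_policy: replies bypass router policies. */
    set_flow(&f[4], "lr_in_policy", 65535, "next;");
    snprintf(f[4].match, sizeof f[4].match, "%s", reply);

    /* lr_in_arp_resolve: eth.dst comes from ct_label, not a MAC lookup. */
    set_flow(&f[5], "lr_in_arp_resolve", 200,
             "eth.dst = ct_label.ecmp_reply_eth; next;");
    snprintf(f[5].match, sizeof f[5].match, "%s", reply);
    return 6;
}
```

Take a destination route to 10.0.0.0/24 on a port with tunnel key 3, assuming route_match is `ip4.dst == 10.0.0.0/24`:

- f[2] commits with `ct_label.ecmp_reply_port = 3`.
- f[3] matches `ct.rpl && ct_label.ecmp_reply_port == 3 && ip4.dst == 10.0.0.0/24`.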
@@ -7760,11 +7840,12 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
     ds_put_cstr(&actions, ");");
 
     ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, priority,
-                  ds_cstr(&match), ds_cstr(&actions));
+                  ds_cstr(&route_match), ds_cstr(&actions));
 
     /* Add per member flow */
+    struct ds match = DS_EMPTY_INITIALIZER;
+    struct sset visited_ports = SSET_INITIALIZER(&visited_ports);
     LIST_FOR_EACH (er, list_node, &eg->route_list) {
-
         const struct parsed_route *route_ = er->route;
         const struct nbrec_logical_router_static_route *route = route_->route;
         /* Find the outgoing port. */
@@ -7774,6 +7855,15 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
                                        &out_port)) {
             continue;
         }
+        /* Symmetric ECMP reply is only usable on gateway routers.
+         * It is NOT usable on distributed routers with a gateway port.
+         */
+        if (smap_get(&od->nbr->options, "chassis") &&
+            route_->ecmp_symmetric_reply && sset_add(&visited_ports,
+                                                     out_port->key)) {
+            add_ecmp_symmetric_reply_flows(lflows, od, lrp_addr_s, out_port,
+                                           route_, &route_match);
+        }
         ds_clear(&match);
         ds_put_format(&match, REG_ECMP_GROUP_ID" == %"PRIu16" && "
                       REG_ECMP_MEMBER_ID" == %"PRIu16,
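The guard in build_ecmp_route_flow() adds the symmetric-reply flows only when all of these hold:

- The router has options:chassis set.
- The route enables the option.
- No flows have been added yet for that output port, since several ECMP next hops may share one router port.

Below is a hedged sketch of that filtering. A linear scan stands in for the sset that ovn-northd uses. struct route_member and select_symmetric_ports() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down view of one ECMP group member. */
struct route_member {
    const char *out_port;        /* Name of the resolved output port. */
    bool ecmp_symmetric_reply;   /* options:ecmp_symmetric_reply */
};

/* Returns how many output ports would get symmetric-reply flows and
 * records their names in `seen`. As with the && chain in ovn-northd, a
 * port is only recorded once the router and route checks have passed. */
static size_t
select_symmetric_ports(const struct route_member *routes, size_t n,
                       bool has_chassis, const char **seen, size_t max_seen)
{
    size_t n_seen = 0;

    if (!has_chassis) {
        return 0;   /* Not a gateway router: feature unavailable. */
    }
    for (size_t i = 0; i < n; i++) {
        if (!routes[i].ecmp_symmetric_reply) {
            continue;
        }
        bool dup = false;
        for (size_t j = 0; j < n_seen; j++) {
            if (!strcmp(seen[j], routes[i].out_port)) {
                dup = true;
                break;
            }
        }
        if (!dup) {
            assert(n_seen < max_seen);
            seen[n_seen++] = routes[i].out_port;
        }
    }
    return n_seen;
}
```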
@@ -7794,7 +7884,9 @@ build_ecmp_route_flow(struct hmap *lflows, struct ovn_datapath *od,
                                 ds_cstr(&match), ds_cstr(&actions),
                                 &route->header_);
     }
+    sset_destroy(&visited_ports);
     ds_destroy(&match);
+    ds_destroy(&route_match);
     ds_destroy(&actions);
 }
 
@@ -9078,6 +9170,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_IN_ECMP_STATEFUL, 0, "1", "next;");
 
         /* Send the IPv6 NS packets to next table. When ovn-controller
          * generates IPv6 NS (for the action - nd_ns{}), the injected
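Every router datapath also gets a priority-0 flow in lr_in_ecmp_stateful that matches "1" and only runs next;. Only new inbound connections on symmetric ports reach the priority-100 ct_commit flow. Below is a small, hypothetical model of the highest-priority-wins lookup. The match predicates are simplified and omit the IP prefix check that the real priority-100 match includes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet facts that matter to lr_in_ecmp_stateful. */
struct pkt {
    unsigned inport_key;  /* Tunnel key of the ingress router port. */
    bool ct_new;          /* ct.new */
    bool ct_est;          /* ct.est */
};

/* Hypothetical logical flow: priority, a match predicate with one
 * argument, and the actions string it would execute. */
struct flow {
    int priority;
    bool (*match)(const struct pkt *, unsigned arg);
    unsigned arg;
    const char *actions;
};

/* Match "1": every packet. */
static bool
match_all(const struct pkt *p, unsigned arg)
{
    (void) p;
    (void) arg;
    return true;
}

/* Simplified "inport == K && (ct.new && !ct.est)". */
static bool
match_new_on_port(const struct pkt *p, unsigned port_key)
{
    return p->inport_key == port_key && p->ct_new && !p->ct_est;
}

/* Returns the actions of the highest-priority matching flow, or NULL. */
static const char *
table_lookup(const struct flow *flows, size_t n, const struct pkt *p)
{
    const struct flow *best = NULL;

    assert(flows != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (flows[i].match(p, flows[i].arg)
            && (!best || flows[i].priority > best->priority)) {
            best = &flows[i];
        }
    }
    return best ? best->actions : NULL;
}
```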

ovn-architecture.7.xml

Lines changed: 4 additions & 3 deletions
@@ -1210,11 +1210,12 @@
   <dd>
     Fields that denote the connection tracking zones for routers. These
     values only have local significance and are not meaningful between
-    chassis. OVN stores the zone information for DNATting in Open vSwitch
+    chassis. OVN stores the zone information for north to south traffic
+    (for DNATting or ECMP symmetric replies) in Open vSwitch
     <!-- Keep the following in sync with MFF_LOG_DNAT_ZONE and
     MFF_LOG_SNAT_ZONE in ovn/lib/logical-fields.h. -->
-    extension register number 11 and zone information for SNATing in
-    Open vSwitch extension register number 12.
+    extension register number 11 and zone information for south to north
+    traffic (for SNATing) in Open vSwitch extension register number 12.
   </dd>
 
   <dt>logical flow flags</dt>

ovn-nb.ovsschema

Lines changed: 5 additions & 2 deletions
@@ -1,7 +1,7 @@
 {
     "name": "OVN_Northbound",
-    "version": "5.24.0",
-    "cksum": "1092394564 25961",
+    "version": "5.25.0",
+    "cksum": "1354137211 26116",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -365,6 +365,9 @@
                                      "min": 0, "max": 1}},
                 "nexthop": {"type": "string"},
                 "output_port": {"type": {"key": "string", "min": 0, "max": 1}},
+                "options": {
+                    "type": {"key": "string", "value": "string",
+                             "min": 0, "max": "unlimited"}},
                 "external_ids": {
                     "type": {"key": "string", "value": "string",
                              "min": 0, "max": "unlimited"}}},

ovn-nb.xml

Lines changed: 16 additions & 0 deletions
@@ -2481,6 +2481,22 @@
       </column>
     </group>
 
+    <group title="Common options">
+      <column name="options">
+        This column provides general key/value settings. The supported
+        options are described individually below.
+      </column>
+
+      <column name="options" key="ecmp_symmetric_reply">
+        If true, then new traffic that arrives over this route will have
+        its reply traffic bypass ECMP route selection and will be sent out
+        this route instead. Note that this option overrides any rules set
+        in the <ref table="Logical_Router_Policy" /> table. This option
+        only works on gateway routers (routers that have
+        <ref column="options" key="chassis" table="Logical_Router" /> set).
+      </column>
+    </group>
+
   </table>
 
   <table name="Logical_Router_Policy" title="Logical router policies">
