Commit 7c35270

Merge pull request redpanda-data#3744 from jcsp/kip-308-rfc

docs/rfcs: create 20220207_connections_per_ip.md (1 file changed, +212 lines)

- Feature Name: Client connection limits
- Status: implemented
- Start Date: 2022-02-07
- Authors: John Spray
- Issue:

# Executive Summary

Add per-client state to track and limit the number of open connections.

Similar to KIP-308, adapted to Redpanda's thread-per-core model.

## What is being proposed

New configuration properties:
- kafka_connections_max (optional integer, unset by default)
- kafka_connections_max_per_ip (optional integer, unset by default)
- kafka_connections_max_overrides (list of strings, default empty)

New prometheus metric:
- vectorized_kafka_rpc_connections_rejected

When a connection is accepted which exceeds one of the configured
bounds, it is dropped before reading or writing anything to the socket.
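
A minimal sketch of that accept-time decision, to make the bounds check concrete (the `limits` struct and `admit` function are illustrative assumptions, not Redpanda's actual accept path):

```c++
#include <cstdint>
#include <optional>

// Hypothetical mirror of the proposed properties; unset means unlimited.
struct limits {
    std::optional<uint32_t> max_total;   // kafka_connections_max
    std::optional<uint32_t> max_per_ip;  // kafka_connections_max_per_ip
};

// Returns true if a newly accepted connection may proceed. A false result
// means the connection is dropped before any bytes are read or written,
// and vectorized_kafka_rpc_connections_rejected is incremented by the caller.
bool admit(const limits& l, uint32_t open_total, uint32_t open_for_ip) {
    if (l.max_total && open_total >= *l.max_total) {
        return false;
    }
    if (l.max_per_ip && open_for_ip >= *l.max_per_ip) {
        return false;
    }
    return true;
}
```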

## Motivation

* Avoid hitting system resource limits (e.g. open file handles)
  related to connection counts
* Prevent rogue clients from consuming unbounded memory (we allocate
  some userspace buffer space for each connection, and the kernel
  allocates some buffers too).
* Help users notice if a client is acting strangely: an application
  author opening large numbers of connections will tend to notice it
  themselves when they hit the limit, rather than the operator
  of the cluster noticing when cluster performance is impacted.
* Provide a client allow-listing/deny-listing mechanism (e.g.
  kafka_connections_max_per_ip is set to zero and
  kafka_connections_max_overrides specifies permitted clients)
* Continuity of experience for Kafka users accustomed to
  max.connections.per.ip

## How

### Accepting a connection

If both configuration properties are unset, then no action is taken: this
preserves the existing unlimited behaviour.

A sharded service stores per-client tokens for open connections; clients are
mapped to shards by a hash of their IP.
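
As a sketch of that mapping (Seastar's `seastar::smp::count` is real; hashing the IP's string form is a simplifying assumption):

```c++
#include <seastar/core/smp.hh>

#include <functional>
#include <string>

// Every shard computes the same hash, so all shards agree on which
// core owns the quota state for a given client IP.
seastar::shard_id owner_shard(const std::string& client_ip) {
    return std::hash<std::string>{}(client_ip) % seastar::smp::count;
}
```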

In kafka::protocol::apply, acquire a token at the start and relinquish it at
the end, probably using a RAII structure, although this function already
provides a convenient localized place to take and release the resource
explicitly.
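
A minimal sketch of such a RAII structure (the `conn_quota_service` stub and its method are assumptions for illustration):

```c++
#include <string>
#include <utility>

// Stand-in for the sharded quota service described above.
struct conn_quota_service {
    void release(const std::string& ip) { /* return token to owner shard */ }
};

// Holds a connection token for the lifetime of a scope, so the token is
// relinquished however the enclosing function exits (return or exception).
class conn_token {
public:
    conn_token(conn_quota_service& svc, std::string ip)
      : _svc(svc)
      , _ip(std::move(ip)) {}
    conn_token(const conn_token&) = delete;
    conn_token& operator=(const conn_token&) = delete;
    ~conn_token() { _svc.release(_ip); }

private:
    conn_quota_service& _svc;
    std::string _ip;
};
```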

On first connection from a client, the shard servicing the connection makes a
cross-shard call to the core that holds state for this client IP hash. This
initializes the client's state:

```c++
struct client_conn_quota {
    // The max connections for this client. This is set by applying
    // the combination of per_ip and per_ip_override settings.
    uint32_t total;

    // How many connections are currently open. This may be at
    // most `total`, although it may exceed it transiently if
    // configuration properties are decreased during operation.
    uint32_t in_use;
};
```

The connection quota state is stored in an absl::btree_map: the overall
number of unique clients may be high (so a flat store is risky), but the
entries are small (so a std::map would generate a lot of tiny allocations).
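
For illustration, the per-shard store might look like this (keying by the IP's string form is a simplification, and `client_conn_quota` repeats the struct above):

```c++
#include <absl/container/btree_map.h>

#include <cstdint>
#include <string>

struct client_conn_quota {
    uint32_t total;  // max connections for this client
    uint32_t in_use; // connections currently open
};

// Each shard holds only the entries whose client IP hashes to it; btree
// nodes amortize allocation across many small entries.
absl::btree_map<std::string, client_conn_quota> quotas;
```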

### Optimization: caching tokens

In the case where the client is not close to exceeding its limits, we can avoid
the overhead of a cross-core call by caching some per-client tokens on each
shard.

This cache can be pre-primed on the first client connection, if the client's
total token count is greater than the core count: that way, subsequent
connections from the same client will usually be accepted without the need for
a cross-core call.
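
A sketch of the resulting fast path (the cache layout and names here are assumptions):

```c++
#include <absl/container/flat_hash_map.h>

#include <cstdint>
#include <string>

// Tokens this shard may hand out without contacting the owner shard.
absl::flat_hash_map<std::string, uint32_t> cached_tokens;

// Fast path: consume a locally cached token if one is available. Returning
// false sends the caller down the slow path: a cross-core call to the
// shard that owns this client's quota state.
bool try_acquire_local(const std::string& client_ip) {
    auto it = cached_tokens.find(client_ip);
    if (it == cached_tokens.end() || it->second == 0) {
        return false;
    }
    --it->second;
    return true;
}
```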

### Handling a configuration change

Changes to the configuration properties affect all outstanding
client_conn_quota objects: the `total` attribute must be recalculated.

If `total` is decreased, then it is also necessary to call out to all other
shards and revoke any cached tokens that are held in excess of the new limit.
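
One way this fan-out could be expressed with Seastar's sharded services (`seastar::sharded::invoke_on_all` is a real Seastar facility; the `quota_service` type and its method are assumptions):

```c++
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>

#include <cstdint>

struct quota_service {
    // Drop any locally cached tokens held beyond the new limit.
    void revoke_excess_tokens(uint32_t new_total) { /* ... */ }
};

// After `total` is recalculated, push the new limit to every shard.
seastar::future<> on_limit_decreased(seastar::sharded<quota_service>& svc,
                                     uint32_t new_total) {
    return svc.invoke_on_all([new_total](quota_service& local) {
        local.revoke_excess_tokens(new_total);
    });
}
```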

## Impact

When this feature is enabled, a per-connection latency overhead is added, as
the shard handling the connection must dispatch a call to the shard that owns
the client IP state. A cross-core call can take up to a few microseconds,
although the impact on P99 latency can be worse if the target core is very
busy.

On a system where the connection count limit is significantly larger than the
core count, this overhead will be avoided in most cases, as each core will
store some locally cached tokens allowing it to accept connections.

# Guide-level explanation

## How do we teach this?

* Presume the reader knows what a TCP connection is
* Explain how Kafka clients map to TCP connections: a producer or consumer
  usually opens a connection and uses it over a long period of time, but
  a program that constructs many Kafka clients in parallel might open more
  connections. The relationship between clients and connections is not 1:1,
  as a client may e.g. be both a producer and a consumer, and may open
  additional connections for reading metadata.
* Explain why it's bad to have an unbounded number of connections from a
  client:
  * Each connection has some memory overhead that might exhaust redpanda memory
  * Each connection counts against OS-level limits (`ulimit`)
  * It's probably not what the user intended for efficient operation, as the
    Kafka protocol generally re-uses a smaller number of connections.
* Introduce the new configuration properties, including the syntax of the
  `overrides` property
* Mention the caveat that decreasing the limits does not terminate any
  currently open connections.

## Interaction with other features

* Dependency on centralized config for live updates

## Drawbacks

### Latency overhead on first message in a new connection

Mitigating factors:
* Sensible workloads generally operate on established connections
* No overhead if the feature is not enabled.

## Alternatives Considered

### Do nothing

An application may exhaust server resources by opening a very large number of
connections (e.g. by instantiating many Kafka clients). This can be mitigated
in environments where the application design is tightly controlled and the
server cluster is sized to match, but in real-life environments this is
unlikely to be the case, and it is relatively easy for a naively-written
application to open an unbounded number of Kafka client connections. For
example, consider a web service that opens a Kafka connection for each request
it serves, and does not have a limit on its own concurrent request count.

### Total connection count, rather than per-client-IP

To protect redpanda from resource exhaustion, it is not necessary to
discriminate between clients: we could just have a single overall connection
count.

This prevents redpanda from having its resources exhausted, but does not help
preserve availability of the system: one misbehaving application can crowd out
other clients.

### Shard-local limits

We could avoid some complexity and latency by setting a per-core connection
limit, rather than a total connection limit across all cores. This would
enable all cores to apply their limits using local state only.

This is reasonable at small core counts, but tends to fall down at larger core
counts: consider a 128-core system, where a per-core limit high enough to
tolerate reasonable applications (e.g. 10 connections) would result in a total
limit of 1280, much higher than the user wanted.

### Per-user limits

In many environments, the client IP is not a meaningful discriminator between
workloads -- workloads are more likely to be identifiable by their service
account username.

Strictly speaking, it is not possible to limit connections by username, because
we have to accept a connection and receive some bytes to learn the username, but
it could still be useful to drop connections immediately after authentication
if a user has too many connections already.

Adding per-user limits in future is not ruled out, but the scope of this RFC is
limited to the more basic IP-based limits, to get parity with KIP-308.

### Kernel-space (eBPF/BPF/iptables) enforcement

Where the connection count for a particular source IP has been exhausted, and/or
was set to zero, we could more efficiently drop connections by configuring the
kernel to drop them for us, rather than doing it in userspace. This kind of
efficiency is important in filtering systems that guard against DDoS situations.

The connection count limits in this RFC are primarily meant to protect against
misbehaving (but benign) applications rather than active attack, so the overhead
of enforcing in userspace is acceptable. Avoiding kernel mechanisms also helps
with portability (including to older enterprise Linux distros) and avoids
potential permissions issues in locked-down environments that might forbid
userspace processes from using some kernel-space mechanisms.

All that said, extending our userspace enforcement with lower-level mechanisms
in future could be a worthwhile optimization. In Kubernetes environments,
we would need to consider whether to implement this type of enforcement within
Redpanda pods, or within the Kubernetes network stack where external IPs are
exposed.
