Commit 04e5bff

Add Q&A for knobs
1 parent 3d06dcd commit 04e5bff

1 file changed

Lines changed: 33 additions & 0 deletions

source/client-backpressure/client-backpressure.md

@@ -391,6 +391,39 @@ Additionally, both `retryReads` and `retryWrites` are enabled by default, so for
retried. This approach also prevents accidentally retrying a read command when only `retryWrites` is enabled, or
retrying a write command when only `retryReads` is enabled.

### Why make `maxAdaptiveRetries` configurable?

Modelling and the underlying theory of backpressure show that the n-retries approach (retry up to N times on
overload errors without a token bucket) can introduce retry storms as overload increases. However, the specifics of the
workload and of the cluster serving it significantly affect the threshold at which retry volume becomes an additional
burden rather than a throughput improvement. Some applications and clusters may tolerate many additional retries, while
others may want to break out of the retry loop much earlier.

The default of 2 is intended to be a sensible choice for most users, one that on average helps rather than hurts during
overload. However, savvy users, who are expected to be the most affected by overload and to have the most insight into
the specifics of their workload and cluster, will likely find that tuning this value on a per-workload basis produces
better results. Additionally, there are situations where disabling overload retries entirely is optimal. Without a
knob, users in those situations would either have a strictly worse experience with a new driver or be forced to
downgrade to an older driver to avoid the issue. These are two strong motivations for adding a knob for
`maxAdaptiveRetries`.
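
As a rough illustration of the n-retries approach described above, the following Python sketch retries an operation up
to `max_adaptive_retries` times on overload. The names `OverloadError`, `run_with_overload_retries`, and `make_flaky`
are hypothetical and not part of any driver API. The sketch assumes that a value of 0 disables overload retries, and it
leaves out any backoff between attempts.

```python
class OverloadError(Exception):
    """Hypothetical stand-in for a server overload error."""


def run_with_overload_retries(operation, max_adaptive_retries=2):
    """Run `operation`, retrying up to `max_adaptive_retries` times on overload.

    Returns a tuple of (result, total attempts made).
    Assumption for this sketch: 0 disables overload retries entirely.
    Backoff between attempts is deliberately omitted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return operation(), attempts
        except OverloadError:
            # `attempts - 1` retries have been used so far; stop once the budget is spent.
            if attempts - 1 >= max_adaptive_retries:
                raise


def make_flaky(failures):
    """Build an operation that raises OverloadError `failures` times, then succeeds."""
    state = {"remaining": failures}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise OverloadError("server overloaded")
        return "ok"

    return operation
```

In the worst case each operation produces `1 + max_adaptive_retries` attempts, so with the default of 2 a persistently
overloaded server can see up to three times the original request volume. That multiplier is why the best value depends
on the workload and cluster.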
410+
### Why make `enableOverloadRetargeting` configurable?
411+
412+
The current contract we've made with users utilizing `primaryPreferred` is that reads will only go to a secondary if the
413+
primary is unavailable. The documentation does not explicitly define unavailable, but in practice that means the primary
414+
is unselectable. Overload retargeting makes the primary unselectable for a retry operation if it returned an overload
415+
error on a previous attempt. This materially changes how often secondary reads occur. Since secondary reads can result
416+
in stale data, enabling overload retargeting increases the chance that users of `primaryPreferred` will get stale data
417+
when they did not previously. This is a semantic change, and so retargeting is disabled by default, with a knob to
418+
enable it.
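
The effect on `primaryPreferred` can be sketched in Python as follows. This is a simplified model, not the server
selection algorithm itself: `Server`, `select_primary_preferred`, and the `overloaded` set of addresses are
hypothetical names introduced for illustration, and the sketch simply returns `None` when no server remains
selectable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    address: str
    is_primary: bool


def select_primary_preferred(servers, overloaded, enable_overload_retargeting=False):
    """Pick a server for a `primaryPreferred` read.

    `overloaded` holds addresses that returned an overload error on an
    earlier attempt of this operation (hypothetical bookkeeping).
    """
    candidates = list(servers)
    if enable_overload_retargeting:
        # Treat previously overloaded servers as unselectable for the retry.
        candidates = [s for s in candidates if s.address not in overloaded]
    primaries = [s for s in candidates if s.is_primary]
    if primaries:
        return primaries[0]
    # The primary is unselectable, so fall back to a secondary if one exists.
    secondaries = [s for s in candidates if not s.is_primary]
    return secondaries[0] if secondaries else None
```

With retargeting disabled, a retry after an overload error from the primary still selects the primary. With it
enabled, the same retry goes to a secondary, which is the case where stale reads become possible.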
420+
Overload retargeting significantly increases availability during overload, but it does increase the risk of getting
421+
stale data when used with `primaryPreferred`. Users of `primaryPreferred` may widely end up preferring that behavior. If
422+
that is the case, overload retargeting may be enabled by default in the future.
423+
424+
`secondaryPreferred` does not have this same staleness issue, but it still materially changes what the preference means
425+
from "almost always secondary" to "sometimes primary".
426+
394427
## Changelog

- 2026-03-30: Introduce phase 1 support without token buckets.
