db/config.cc: increment components_memory_reclaim_threshold config default #18611

lkshminarayanan · 2024-05-10T09:04:01Z

Incremented the components_memory_reclaim_threshold config's default value to 0.8 as the previous value was too strict and caused unnecessary eviction in otherwise healthy clusters.

Fixes #18607

…fault

Incremented the components_memory_reclaim_threshold config's default value to 0.8 as the previous value was too strict and caused unnecessary eviction in otherwise healthy clusters.

Fixes scylladb#18607

Signed-off-by: Lakshmi Narayanan Sreethar <[email protected]>

scylladb-promoter · 2024-05-10T11:43:14Z

🟢 CI State: SUCCESS

✅ - Build
✅ - Unit Tests Custom
The following new/updated tests ran 100 times for each mode:
🔹 boost/sstable_datafile_test
✅ - Container Test
✅ - dtest
✅ - dtest with topology changes
✅ - Unit Tests

Build Details:

Duration: 2 hr 38 min
Builder: spider3.cloudius-systems.com

michoecho

@denesb No, what are you doing. 0.8 effectively disables the entire mechanism.

The system gradually adjusts memtable flushing speed to keep memtable memory usage under 0.5 (and stops accepting writes at the extreme), and doesn't even start flushing until it reaches ~~0.25~~ 0.15.

Bloom filter memory usage isn't going to reach 0.8 because the system is all but guaranteed to OOM at ~~0.75~~ 0.85, and that's even without counting all the non-cache, non-memtable memory usage.

If you want to prevent OOM crashes with this feature, the threshold must be way under ~~0.75~~ 0.85 (with enough margin to cover all non-LSA, non-bloom memory usage). If you want to guarantee that, then it must be somewhere under (0.5 - margin).

For example, take the last case of bloom explosion from a few days ago (https://github.com/scylladb/scylla-enterprise/issues/4156). The affected node started choking on bad_allocs at bloom fraction of 0.6, and crashed (due to bad_alloc-induced flush failure) at 0.65.

Edit: I misremembered the defaults — we start flushing at 0.15, not at 0.25. Which means that 0.8 might be enough to prevent some OOMs, but that's cutting it very close.
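The arithmetic behind the figures in this comment can be sketched as follows. This is a simplified model, not ScyllaDB code: it assumes memory is split only between bloom filters, memtables, and a `margin` for everything else, and it takes the 0.15 and 0.5 memtable figures from the comment. The function names and the three labels are hypothetical, chosen for illustration.

```python
from fractions import Fraction

# Figures quoted in the comment above, as fractions of total memory.
MEMTABLE_FLUSH_START = Fraction("0.15")  # flushing does not start below this
MEMTABLE_SOFT_CAP = Fraction("0.5")      # flushing is paced to stay under this


def max_bloom_fraction(memtable_fraction, margin):
    """Bloom-filter fraction at which memory is exhausted, given the
    memtable fraction and a margin for all other memory usage."""
    return 1 - memtable_fraction - margin


def classify(threshold, margin):
    """Hypothetical labels for a reclaim threshold under this model."""
    if threshold < max_bloom_fraction(MEMTABLE_SOFT_CAP, margin):
        return "guaranteed"    # safe even with memtables at their cap
    if threshold < max_bloom_fraction(MEMTABLE_FLUSH_START, margin):
        return "best-effort"   # safe only while memtables stay small
    return "ineffective"       # memory runs out before the threshold is hit


if __name__ == "__main__":
    margin = Fraction("0.1")  # assumed value for non-LSA, non-bloom usage
    for candidate in ("0.1", "0.2", "0.6", "0.8"):
        print(candidate, classify(Fraction(candidate), margin))
```

With a zero margin the model puts the exhaustion point at 0.85, so 0.8 lands in the "best-effort" band; with any margin of 0.05 or more it becomes "ineffective".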

denesb · 2024-05-13T05:38:35Z

> If you want to prevent OOM crashes with this feature, the threshold must be way under 0.75 (with enough margin to cover all non-LSA, non-bloom memory usage). If you want to guarantee that, then it must be somewhere under (0.5 - margin).
>
> For example, take the last case of bloom explosion from a few days ago (scylladb/scylla-enterprise#4156). The affected node started choking on bad_allocs at bloom fraction of 0.6, and crashed (due to bad_alloc-induced flush failure) at 0.65.

0.6 is way too strict. In another customer case, the cluster is healthy and steady at a 0.6 BF memory usage ratio, and forcing it below that makes IO and latencies jump and results in an escalation. So we have to use the max() of all use-cases as the default and lower it on deployments where the default is already problematic.

michoecho · 2024-05-13T05:48:17Z

> 0.6 is way too strict. In another customer case, the cluster is healthy and steady at 0.6 BF memory usage ratio

Steady at 0.6? Which customer case was that?

denesb · 2024-05-13T05:53:49Z

> > 0.6 is way too strict. In another customer case, the cluster is healthy and steady at 0.6 BF memory usage ratio
>
> Steady at 0.6? Which customer case was that?

https://github.com/scylladb/scylla-enterprise/issues/4183
And let's stop discussing customer issues here before we accidentally name-drop one of them. These cases are very much relevant, but customer issues should not be discussed in the OSS repository.

mykaul · 2024-05-21T10:51:41Z

@avikivity - per our discussion on Sunday - please review.

avikivity · 2024-05-21T11:39:55Z

The goal of the feature was to prevent OOM. If we use the max() of all clusters, it will never prevent OOM since different clusters have other non-LSA components (prepared statements, cached queriers, running ops, background writes, repair with its large buffers, group 0 holding all the tablets metadata).

We can increase it to 0.2, but not beyond.
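The max()-versus-OOM argument above can be illustrated with a small sketch. The cluster names and non-LSA fractions below are hypothetical values made up for illustration, not measurements from any deployment, and the 0.5 memtable cap is the figure quoted earlier in this thread. The model is deliberately simplistic: headroom for bloom filters is whatever remains after memtables and non-LSA components.

```python
from fractions import Fraction

MEMTABLE_SOFT_CAP = Fraction("0.5")  # figure quoted earlier in the thread

# Hypothetical non-LSA memory fractions per cluster (illustrative only).
NON_LSA_BY_CLUSTER = {
    "cluster-a": Fraction("0.05"),
    "cluster-b": Fraction("0.15"),
    "cluster-c": Fraction("0.25"),
}


def headroom(non_lsa):
    """Memory fraction left for bloom filters with memtables at their cap."""
    return 1 - MEMTABLE_SOFT_CAP - non_lsa


def protects_all(threshold):
    """True if the threshold triggers before headroom runs out everywhere."""
    return all(threshold < headroom(n) for n in NON_LSA_BY_CLUSTER.values())


per_cluster = {name: headroom(n) for name, n in NON_LSA_BY_CLUSTER.items()}
max_based = max(per_cluster.values())  # default sized for the roomiest cluster
min_based = min(per_cluster.values())  # bound set by the tightest cluster

if __name__ == "__main__":
    print("max-based default protects all:", protects_all(max_based))
    print("tightest-cluster bound:", float(min_based))
```

Under these assumptions, a default derived from the roomiest cluster leaves the tighter clusters unprotected, while any value strictly below the tightest cluster's headroom protects all of them.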

denesb · 2024-05-21T11:42:19Z

> The goal of the feature was to prevent OOM. If we use the max() of all clusters, it will never prevent OOM since different clusters have other non-LSA components (prepared statements, cached queriers, running ops, background writes, repair with its large buffers, group 0 holding all the tablets metadata).
>
> We can increase it to 0.2, but not beyond.

If we make the default too conservative, field is just going to disable it on all clusters and then we also aren't preventing OOMs.
I have sent an email to rnd-int to discuss this in its wider context, let's continue the discussion there.

mykaul · 2024-05-23T11:12:42Z

@avikivity - please help move this forward.

mykaul · 2024-05-26T11:19:17Z

@avikivity - please look at this.

avikivity · 2024-05-26T11:20:54Z

0.8 makes the protection meaningless.

denesb · 2024-05-27T07:57:43Z

Alright, so let's make it 0.2.
The current default is 0.1 and I know of a single cluster where this caused problems. That said, I think field disabled it on every cluster they manage, after the aforementioned incident.

mykaul · 2024-05-27T10:56:20Z

> Alright, so let's make it 0.2. The current default is 0.1 and I know of a single cluster where this caused problems. That said, I think field disabled it on every cluster they manage, after the aforementioned incident.

Do you need to remove your approval until the value is correctly set?

lkshminarayanan requested a review from denesb May 10, 2024 09:04

lkshminarayanan self-assigned this May 10, 2024

lkshminarayanan added the backport/5.2 and backport/5.4 labels May 10, 2024

github-actions bot added the status/regression, area/memory footprint, and backport/6.0 labels May 10, 2024

denesb approved these changes May 10, 2024

michoecho suggested changes May 10, 2024

mykaul added this to the 5.4.7 milestone May 13, 2024

denesb self-requested a review May 27, 2024 10:59
