bloom-filter: default value for components_memory_reclaim_threshold is too strict #18607

Open
Tracked by #17972
denesb opened this issue May 10, 2024 · 1 comment · May be fixed by #18611
Labels: area/memory footprint, backport/5.2, backport/5.4, backport/6.0, status/regression

Comments

denesb (Contributor) commented May 10, 2024

components_memory_reclaim_threshold controls when we start evicting bloom filters from memory. It defaults to 0.1, i.e. 10% of the shard's memory. This turns out to be too strict and causes unnecessary bloom filter eviction in otherwise healthy clusters; one cluster impacted by this had a BF/memory ratio of 0.6. The eviction mechanism was introduced to avert OOM disasters, not to enforce lean bloom filters, so the default should be adjusted accordingly and increased to 0.8 or maybe even 0.9. The idea is that by default it should only kick in when OOM is imminent; it should not intervene on clusters which have high BF memory consumption but are otherwise fine.
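
For anyone hitting this before the default is changed, the threshold can be raised per node. A minimal example, assuming the option is set in scylla.yaml like other config options (the 0.9 value here is illustrative):

```yaml
# scylla.yaml -- illustrative override; assumes the option is exposed here
# like other config options. With 0.9, bloom filter reclaim only starts once
# the filters exceed 90% of the shard's memory.
components_memory_reclaim_threshold: 0.9
```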

denesb added the status/regression, area/memory footprint, backport/5.2, backport/5.4, backport/6.0 labels May 10, 2024
lkshminarayanan added a commit to lkshminarayanan/scylladb that referenced this issue May 10, 2024
…fault

Incremented the components_memory_reclaim_threshold config's default
value to 0.8 as the previous value was too strict and caused unnecessary
eviction in otherwise healthy clusters.

Fixes scylladb#18607

Signed-off-by: Lakshmi Narayanan Sreethar <[email protected]>
mykaul added this to the 5.4.7 milestone May 10, 2024
vladzcloudius (Contributor) commented May 13, 2024

According to what @avikivity wrote before, OOM is "imminent" when the ratio reaches 0.5.
However, we know for a fact this is not universally true: we have seen nodes survive 0.6 and more just fine.
OOM happens not because BF occupies this or that amount of RAM, but because an allocation fails.

The BF amount can contribute, but it is not the root cause; it is just one of many possible triggers for the OOM.

IMO we should not enable this feature by default, just as we don't enable compaction I/O bandwidth throttling by default.

Instead, we should invest in actually fixing the problem:

  1. Make sure the BF size is always correct: we know exactly how many partitions an sstable has right before we seal it. Using estimations like we do today is a performance optimization, and it should always be discarded when it gets in the way of correctness. Instead we keep "fighting" for that heuristic, piling up spaghetti code for no good reason whatsoever (see the sizing sketch after this list).
  2. If we want to restrict the BF memory size we should implement proper caching (we have a few classes in the code that can make it pretty straightforward, e.g. loading_cache.hh).
    2.1) If we do that, we should add a configuration option to control the maximum amount of RAM consumed by the BF cache.
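
For point 1, a minimal sketch of the sizing step (illustrative, not ScyllaDB code): once the exact partition count is known at seal time, the filter can be sized with the standard bloom-filter formulas instead of an up-front estimate.

```cpp
// Illustrative sketch, not ScyllaDB code: size a bloom filter from the exact
// element count n and a target false-positive rate p, using the standard
// formulas m = -n*ln(p)/(ln 2)^2 bits and k = (m/n)*ln 2 hash functions.
#include <algorithm>
#include <cmath>
#include <cstdint>

struct bloom_params {
    uint64_t bits;    // total filter size in bits
    unsigned hashes;  // number of hash functions
};

bloom_params size_bloom_filter(uint64_t n, double p) {
    if (n == 0) {
        return {1, 1};  // degenerate case: empty sstable
    }
    static const double ln2 = std::log(2.0);
    const double m = -static_cast<double>(n) * std::log(p) / (ln2 * ln2);
    const long k = std::max<long>(1, std::lround(m / static_cast<double>(n) * ln2));
    return {static_cast<uint64_t>(std::ceil(m)), static_cast<unsigned>(k)};
}
```

Sizing from the real count rather than an estimate avoids both undersized filters (higher false-positive rate) and oversized ones (wasted memory).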

fyi @denesb

denesb modified the milestones: 5.4.7, 6.1 May 23, 2024