
Netty executor queue size is infinite resulting in GC pressure / OOM #1012

Open
nukemberg opened this issue Mar 11, 2022 · 4 comments

@nukemberg
Contributor

Describe the bug
In cases of overload with many clients, or clients which mishandle backpressure, the Netty executor queue can grow without bounds, consuming massive amounts of memory. This leads to the server experiencing GC pressure, which further aggravates the problem and ultimately ends in OOM.

To Reproduce
Run Riemann with many clients, or clients that have a very large number of outstanding requests, and slow the streams down so that a backlog builds up.

Expected behavior
The Netty executor queue size should be limited and excess messages should be dropped; in that case a special "overload" event should be injected into the streams. Alternatively (perhaps configurable?), TCP backpressure should be applied as a last resort.
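For illustration, the first option roughly maps onto Netty's own bounded-executor constructor. A minimal sketch, assuming one would replace the default executor group; the thread-factory name and the 10,000 bound are made up, and turning a rejection into an "overload" stream event is not shown:

```java
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.DefaultThreadFactory;
import io.netty.util.concurrent.EventExecutorGroup;
import io.netty.util.concurrent.RejectedExecutionHandlers;

public class BoundedExecutorSketch {
    public static EventExecutorGroup boundedStreamExecutors() {
        int threads = Runtime.getRuntime().availableProcessors();
        int maxPendingTasks = 10_000; // illustrative bound, not a recommendation

        // When an executor's queue is full, further tasks are rejected
        // (RejectedExecutionException) instead of accumulating in memory;
        // Riemann would still have to translate that rejection into an event.
        return new DefaultEventExecutorGroup(
                threads,
                new DefaultThreadFactory("riemann-stream"),
                maxPendingTasks,
                RejectedExecutionHandlers.reject());
    }
}
```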

@sanel
Contributor

sanel commented May 12, 2022

Try setting the property -Dio.netty.eventLoopThreads=N, where N > 1, at Riemann startup. See [1].

[1] https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/MultithreadEventLoopGroup.java#L41
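For example, the property is passed as a JVM flag at launch, e.g. java -Dio.netty.eventLoopThreads=8 -jar riemann.jar riemann.config (the jar and config file names here are placeholders; adjust to however you start Riemann).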

@sanel
Contributor

sanel commented May 12, 2022

I misread your message, so my comment above might not solve your issue. Try toying with io.netty.recycler.maxCapacity.default as well. Netty has a built-in recycler to reduce GC pressure, and the value should be > 256 (the default is 262144).

Also, this will not solve your problems if you have badly designed streams that aggregate values indefinitely. A small reproducible example and a full stack trace would be helpful.

@Avishai-Ishshalom-Forter

Avishai-Ishshalom-Forter commented May 25, 2022

I should have been clearer: the executor in question is the so-called Riemann netty event-executor, which is where streams are handled, not the Netty I/O threads; it is defined here. The executor is an io.netty.util.concurrent.SingleThreadEventExecutor whose DEFAULT_MAX_PENDING_EXECUTOR_TASKS defaults to Integer.MAX_VALUE (see here); this can be changed via the io.netty.eventexecutor.maxPendingTasks system property. However, I believe this is a bad default for a system like Riemann, and it should also be settable via the Riemann config rather than through an obscure Netty system property unknown to most users.
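For context, the default can be checked outside Netty; a minimal sketch, using Integer.getInteger rather than Netty's internal SystemPropertyUtil, so it only approximates how SingleThreadEventExecutor resolves the value:

```java
public class MaxPendingTasksCheck {
    public static void main(String[] args) {
        // Approximates how Netty's SingleThreadEventExecutor derives its default queue
        // bound: the io.netty.eventexecutor.maxPendingTasks system property, floored at
        // 16, falling back to Integer.MAX_VALUE (effectively unbounded) when unset.
        int maxPending = Math.max(16,
                Integer.getInteger("io.netty.eventexecutor.maxPendingTasks", Integer.MAX_VALUE));
        System.out.println("Effective max pending executor tasks: " + maxPending);
    }
}
```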

@nukemberg
Contributor Author

nukemberg commented Jun 10, 2022

So, it turns out that Riemann is using io.netty.util.concurrent.DefaultEventExecutorGroup, which extends io.netty.util.concurrent.MultithreadEventExecutorGroup; every executor in the group has its own queue of size io.netty.eventexecutor.maxPendingTasks, and Riemann uses (.. Runtime getRuntime availableProcessors) threads, so the total queue size is `cpus * maxPendingTasks`.
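As a sketch of the shape described above (not Riemann's actual source, just an illustrative construction of the same group type):

```java
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class ExecutorGroupSketch {
    public static void main(String[] args) {
        // One executor, and therefore one task queue, per available processor.
        int threads = Runtime.getRuntime().availableProcessors();
        EventExecutorGroup group = new DefaultEventExecutorGroup(threads);

        // With the default io.netty.eventexecutor.maxPendingTasks (Integer.MAX_VALUE),
        // aggregate capacity is threads * Integer.MAX_VALUE, i.e. effectively unbounded.
        System.out.println("Executors (and queues) in the group: " + threads);
        group.shutdownGracefully();
    }
}
```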

Note that the event executor group assigns executors in a round-robin fashion and each channel/socket is bound to one executor. This creates another performance problem, since spreading work across multiple independent queues tends to give higher latency and lower throughput than a single shared queue (a known result in queueing theory: queue lengths vary, so "evenly distributed" load still lands work on already-overloaded queues). I assume this is done to preserve event ordering, since events from the same client will be enqueued and handled in order, but this guarantee is not documented or promised anywhere in Riemann.
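For illustration, the per-channel binding comes from adding a handler to the pipeline with an EventExecutorGroup: the group hands out its next executor (round-robin) and that channel's handler events stay on it. A minimal sketch, with the handler class and names made up:

```java
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class ChannelBindingSketch extends ChannelInitializer<SocketChannel> {
    // Shared group; each channel gets pinned to one of its executors.
    private final EventExecutorGroup streamExecutors =
            new DefaultEventExecutorGroup(Runtime.getRuntime().availableProcessors());

    @Override
    protected void initChannel(SocketChannel ch) {
        // addLast with an EventExecutorGroup asks the group for an executor and runs
        // this handler's events for this channel on it, so events from one client are
        // processed in order on a single executor queue.
        ch.pipeline().addLast(streamExecutors, "stream-handler", new StreamHandler());
    }

    // Hypothetical placeholder for the handler that feeds events into the streams.
    private static final class StreamHandler extends ChannelInboundHandlerAdapter { }
}
```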
