-
Kudos on providing all the info; this is probably the first time I've seen almost all the data needed to help with the issue :) A few comments:
I will provide more hints once you are able to fix these issues.
-
I am sorry, I won't be able to debug Grafana metrics, but they seem off: it's unlikely to have latencies reaching seconds.
-
I moved to the same configuration on production yesterday (the only difference is 4 GB maxmemory and 12 threads), and there are similar results now. The main concern is script latencies; other components are working so far. Now I'll provide info from the production server.
The highest value I saw was 420. I haven't looked for it on purpose, though.
Here is INFO ALL
And here is SCRIPT LATENCY
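A minimal sketch of how this output can be collected with redis-cli (the hostname and port below are placeholders; SCRIPT LATENCY is the Dragonfly subcommand referenced above, which plain Redis does not provide):

```bash
# Capture the full server info and the per-script latency report from Dragonfly.
redis-cli -h dragonfly.example.internal -p 6379 INFO ALL > info_all.txt
redis-cli -h dragonfly.example.internal -p 6379 SCRIPT LATENCY > script_latency.txt
```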
-
I created a graph for this log; the maximum for the whole time is 1530.
-
OK, latencies are really high. You can further increase
Seems that your Lua scripts touch multiple hashtags that are spread across multiple threads, but I do not know why or whether this can be fixed.
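For illustration, only the part of a key inside {...} is hashed, so keys that share a hashtag always map to the same slot and can be kept on a single shard/thread. A minimal sketch using redis-cli (the key names are hypothetical; CLUSTER KEYSLOT just computes the slot for a key, and works against Redis or a Dragonfly instance that accepts CLUSTER commands):

```bash
# These two keys share the hashtag {sidekiq}, so they hash to the same slot:
redis-cli CLUSTER KEYSLOT '{sidekiq}:queue:default'
redis-cli CLUSTER KEYSLOT '{sidekiq}:queue:default:lock'
# A key with a different hashtag will generally land on another slot:
redis-cli CLUSTER KEYSLOT '{other}:queue:default'
```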
-
As I can see from the metrics, there were only 3 spikes where it was heavily increasing, and now it's much more stable (without doing anything). I will try increasing
As I said, the only hashtag (keys with
OK, thank you for your help. The only frustrating moment is that the most specific recommendation given in your article and in the Sidekiq wiki (
-
Firstly, thanks for the great project!
We're now trying to move to Dragonfly from Redis (KeyDB, to be more precise). We use Redis for 4 different purposes, one of which is Sidekiq. We moved to Dragonfly for all of them, and everything looked great; the improvement in resource usage was significant (12 CPUs on KeyDB vs 3 CPUs on Dragonfly). But then we faced some problems with Sidekiq: every once in a while Dragonfly started to drain memory (showing OOMs in Lua scripts, then stopping sending jobs to Sidekiq, then rejecting connections altogether), and only a restart could fix that. The dashboard for this situation looks like this:
INFO in this situation at 17:43
I should note that we are using sidekiq-unique-jobs to ensure job uniqueness and sidekiq-alive for liveness probes. I started to investigate this situation, and it looks like the root cause is our number of queues. We have about 20 queues, which are loaded unevenly (about 5 queues carry more than 80% of the load), plus we have more than 190 sidekiq-alive queues (one for each Sidekiq instance) that are each only used to process 1 job from time to time.
So, when the problem begins, I see continuous growth in scheduled and enqueued jobs, which are mostly SidekiqAlive jobs, as I can see from our Sidekiq dashboard:
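One way to cross-check this outside the dashboard is to query Sidekiq's conventional keys directly; the key names below assume Sidekiq's default, unnamespaced schema, so adjust them if a namespace is in use:

```bash
# Sidekiq keeps queue names in the "queues" set, per-queue jobs in
# "queue:<name>" lists, and scheduled/retry jobs in sorted sets.
redis-cli SMEMBERS queues        # all queue names Sidekiq knows about
redis-cli LLEN queue:default     # enqueued jobs in one queue ("default" is an example)
redis-cli ZCARD schedule         # scheduled jobs
redis-cli ZCARD retry            # jobs waiting to be retried
```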
Then, Dragonfly starts writing these warnings to the log:
As I understand it, somewhere around here Sidekiq workers stop receiving new jobs, and therefore queues and memory consumption continue to grow, leading to OOMs:
first in scripts,
and then on other clients' commands and connections.
My best guess now is that something happens when the number of queues grows (as sidekiq-alive queues can appear and disappear on scaling), and then everything gets messed up.
Our Dragonfly configuration follows the recommendations in your docs (we tried increasing maxmemory and threads, but that only randomly affects this case and ultimately leads to the same thing):
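As a rough, hypothetical illustration of the kind of setup described here (the thread count and memory limit match the numbers mentioned earlier in the thread; flag names and values should be checked against the Dragonfly docs for your version):

```bash
# Hypothetical invocation, not the exact configuration used in this report.
dragonfly --proactor_threads=12 \
          --maxmemory=4gb \
          --default_lua_flags=allow-undeclared-keys  # Sidekiq-related recommendation from the Dragonfly docs (verify)
```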
For now we had to roll back to KeyDB for Sidekiq (as we didn't have these problems there), and I am here to share our situation and to look for advice on how to handle our number of queues and the uneven load on them. Thanks in advance!