Query fewer analytics at once #1228

Open
matthew-white opened this issue Oct 16, 2024 · 0 comments
Labels
refactor Improves code without altering behavior testing Integration tests, unit tests

Comments


matthew-white commented Oct 16, 2024

If usage reporting is enabled, Backend will send a sizable number of metrics in its usage report (almost 100). Some of the database queries behind those metrics touch every row in the largest tables (e.g., the audits, submissions, and client_audits tables), so those queries can be slow. Currently, we query all metrics in parallel. Especially as we continue to add metrics, it seems possible that the number of concurrent queries will lead to performance problems. The usage report is run at 3 a.m. UTC, but in some time zones, user activity is expected at that time.
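
To illustrate the shape of the problem, here's a minimal sketch of the pattern as I understand it (the metric names and queries below are hypothetical, not the actual Backend code): every metric query is started at the same time, so they all compete for database connections at once.

    const { sql } = require('slonik');

    // Hypothetical sketch of the current pattern: all metric queries start at once.
    const metricQueries = {
      num_submissions: (db) => db.oneFirst(sql`SELECT count(*) FROM submissions`),
      num_audits: (db) => db.oneFirst(sql`SELECT count(*) FROM audits`),
      // ... plus dozens more metrics
    };

    const computeAnalytics = async (db) => {
      const names = Object.keys(metricQueries);
      const values = await Promise.all(names.map((name) => metricQueries[name](db)));
      return Object.fromEntries(names.map((name, i) => [name, values[i]]));
    };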

In practice, I don't see much evidence that there have been performance problems. Looking at analytics from the past 31 days, the vast majority of usage reports were received within a minute of 3 a.m. A small number of usage reports were received more than 1 minute past 3 a.m., including some after 5 minutes and one after 40 minutes (!). There are multiple variables that could affect the time of submission receipt, so those numbers don't necessarily speak to the performance of the analytics queries, though slow queries are one possible explanation.

Side note: I'm actually confused about how a usage report could be successful after taking more than 1 minute, much less 5 minutes. I'm pretty sure that the number of analytics queries exceeds the size of the database connection pool. Doesn't Slonik time out fairly quickly if it can't acquire a database connection? Why isn't it timing out these queries? We've seen that transactions use a single database connection even if they make multiple queries, but I don't think that the analytic queries are run in a transaction. Maybe most of the queries are fast enough that it's not a problem.
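
For context, these are the pool settings I have in mind. The numbers below are just Slonik's defaults as I understand them, not necessarily what Backend actually configures:

    const { createPool } = require('slonik');

    // Assumed defaults: a maximum of 10 pooled connections, and a 5-second
    // timeout when waiting to acquire one. If ~100 queries start at once,
    // most of them would be waiting on a free connection.
    const db = createPool(connectionString, {
      maximumPoolSize: 10,
      connectionTimeout: 5000,
    });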

If query timeout were an issue, I'd be concerned that we're not receiving analytics from large servers where the queries could take a long time. I see large servers in our analytics, so I don't have reason to believe that's happening. But like I said, I'm confused about why query timeout doesn't seem to be an issue here.

Even assuming query timeout isn't an issue, I think it makes sense to reduce the number of concurrent queries in order to reduce the risk of performance issues.
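
One simple way to do that (just a sketch of the idea, not a specific proposal for how it should look in Backend): run the metric queries in fixed-size batches rather than all at once.

    // Sketch: run query-returning functions in batches of `concurrency`,
    // awaiting each batch before starting the next.
    const inBatches = async (tasks, concurrency = 10) => {
      const results = [];
      for (let i = 0; i < tasks.length; i += concurrency) {
        const batch = tasks.slice(i, i + concurrency);
        results.push(...await Promise.all(batch.map((task) => task())));
      }
      return results;
    };

A helper like p-limit would accomplish the same thing with a rolling concurrency limit instead of lockstep batches, but either way the point is to cap the number of in-flight queries at something comfortably below the connection pool size.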

There's also an issue in testing that I think is related to the number of concurrent queries. When I run tests, I see a lot of MaxListenersExceededWarning messages, many of which seem related to analytics. For example:

  task: analytics
    ✔ should not compute analytics if not enabled
    ✔ should not compute analytics if explicitly disabled
    ✔ should not compute analytics if analytics sent recently
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should send analytics if enabled and time to send (111ms)
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should resend analytics if last attempt failed (92ms)
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should log event and full report if analytics sent successfully (87ms)
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should log request errors (85ms)

We've encountered this warning before and connected it to a Slonik issue. That issue was resolved in Slonik 23.6.3, but we're on 23.6.2 (so close!).

In at least one place in testing, we set defaultMaxListeners to more than 10 (see below). Instead of setting it in more tests, I think it'd probably be better to address the underlying cause of the warning, but I wanted to mention it as one workaround we've tried:

require('events').EventEmitter.defaultMaxListeners = 30;

That said, even if we bumped Slonik and addressed the warning, I think we'd still want to think about reducing the number of concurrent queries.

matthew-white added the testing (Integration tests, unit tests) and refactor (Improves code without altering behavior) labels on Oct 16, 2024