Query fewer analytics at once

If usage reporting is enabled, Backend will send a sizable number of metrics in its usage report (almost 100). Some of the database queries behind those metrics touch every row in the largest tables (e.g., the `audits`, `submissions`, and `client_audits` tables), so those queries could take longer. Currently, we query all metrics in parallel. Especially as we continue to add metrics, it seems possible that the number of concurrent queries will lead to performance problems. The usage report is run at 3 a.m. UTC, but in some time zones, user activity is expected at that time.

In practice, I don't see much evidence that there have been performance problems. Looking at analytics from the past 31 days, the vast majority of usage reports were received within a minute of 3 a.m. A small number of usage reports were received after 1 minute past 3 a.m., including some after 5 minutes and one after 40 minutes (!). There are multiple variables that could affect the time of submission receipt, so those numbers don't necessarily speak to the performance of the analytics queries, though that's a possibility.

Side note: I'm actually confused about how a usage report could be successful after taking more than 1 minute, much less 5 minutes. I'm pretty sure that the number of analytics queries exceeds the size of the database connection pool. Doesn't Slonik time out fairly quickly if it can't acquire a database connection? Why isn't it timing out these queries? We've seen that transactions use a single database connection even if they make multiple queries, but I don't think that the analytic queries are run in a transaction. Maybe most of the queries are fast enough that it's not a problem.

If query timeout were an issue, I'd be concerned that we're not receiving analytics from large servers where the queries could take a long time. I see large servers in our analytics, so I don't have reason to believe that's happening. But like I said, I'm confused about why query timeout doesn't seem to be an issue here.

Even assuming query timeout isn't an issue, I think it makes sense to reduce the number of concurrent queries in order to reduce the risk of performance issues.

There's also an issue in testing that I think is related to the number of concurrent queries. When I run tests, I see a lot of `MaxListenersExceededWarning`. Many of those seem related to analytics. For example:

```
  task: analytics
    ✔ should not compute analytics if not enabled
    ✔ should not compute analytics if explicitly disabled
    ✔ should not compute analytics if analytics sent recently
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should send analytics if enabled and time to send (111ms)
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should resend analytics if last attempt failed (92ms)
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should log event and full report if analytics sent successfully (87ms)
(node:80089) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 notice listeners added to [Client]. MaxListeners is 10. Use emitter.setMaxListeners() to increase limit
    ✔ should log request errors (85ms)
```

We've encountered this warning [before](https://github.com/getodk/central-backend/pull/451#issuecomment-1034194040) and connected it to a Slonik issue. That issue was resolved in Slonik 23.6.3, but we're on 23.6.2 (so close!).

In at least one place in testing, we set `defaultMaxListeners` to more than 10. Instead of setting it in more tests, I think it'd probably be better to address the underlying cause behind the warning. However, I wanted to mention it as one workaround we've tried.

https://github.com/getodk/central-backend/blob/63ca7881f6e6eb0b5c9051bf64448d802720f100/test/integration/other/analytics-queries.js#L2148-L2148

That said, even if we bumped Slonik and addressed the warning, I think we'd still want to think about reducing the number of concurrent queries.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Query fewer analytics at once #1228

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Query fewer analytics at once #1228

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions