Skip to content

Collector count query needs optimizing #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
ross-spencer opened this issue Feb 11, 2025 · 3 comments
Open

Collector count query needs optimizing #5

ross-spencer opened this issue Feb 11, 2025 · 3 comments

Comments

@ross-spencer
Copy link
Member

ross-spencer commented Feb 11, 2025

The following:

SELECT address, COUNT(*) AS total_count, SUM(CASE WHEN datetime(date_time) >= datetime('now', '-1 day') THEN 1 ELSE 0 END) AS count_24hr FROM data_points GROUP BY address ORDER BY total_count DESC;

Was introduced to include daily counts. It is taking a lot longer than the previous call, and for every connection to the database this is adding up.

We can:

  1. either optimize the query,
  2. optimize the python.
@ross-spencer
Copy link
Member Author

A complexity here is there does seem to be some caching going on when testing locally.

We can get some timings through (.timer on), e.g. orig:

Run Time: real 0.018 user 0.014927 sys 0.002855

new:

Run Time: real 0.394 user 0.142779 sys 0.250545

Size of local DB:

180356
Run Time: real 0.002 user 0.000993 sys 0.000984

@ross-spencer
Copy link
Member Author

Current server timings:

Run Time: real 248.644 user 9.503630 sys 46.506179

@ross-spencer
Copy link
Member Author

Looks like an index might help, adding:

CREATE INDEX data_points_dt_addr on data_points(address, date_time);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant