Provide task to find/purge broken archives #22450

Open
mneudert opened this issue Jul 29, 2024 · 0 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.


mneudert commented Jul 29, 2024

tl;dr: The archive tables can end up containing broken archives (archives without an associated "done" flag). A maintenance task to find and purge those broken archives should be available.


For bigger instances it is not uncommon to run archiving on multiple servers concurrently.

Due to the way intersecting period archiving is detected (see #22394), higher periods can be archived before all lower periods are finished. For example, a week may be archived before all of its individual days are done.

One way to detect if an instance is affected by that behaviour is to check the archiving logs for the following pattern (example requires --verbose output):

Server 1 - INFO  [2024-07-11 18:00:00] Archived website id 2, period = day, date = 2024-07-10, ... Time elapsed: 21600s
Server 3 - DEBUG [2024-07-11 16:00:00] Starting archiving for ...idSite=2&period=month&date=2024-07-01...
Server 2 - DEBUG [2024-07-11 14:00:00] Starting archiving for ...idSite=2&period=week&date=2024-07-08...
Server 1 - DEBUG [2024-07-11 12:00:00] Starting archiving for ...idSite=2&period=day&date=2024-07-10...

This is an example of a setup with 3 archiving servers. By chance both week and month were started before the day archive was completed.
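The log pattern above can also be checked mechanically. The following is a minimal sketch, not an existing Matomo tool: it assumes the log lines look exactly like the excerpt above, that the `date` of a week entry is the first day of that week, and that a day run started `Time elapsed` seconds before its "Archived website" line. The function and variable names are made up for illustration.

```python
import re
from datetime import date, datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"

# Patterns modelled on the log excerpt; real log lines may differ.
START_RE = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:]+)\] Starting archiving for .*?"
    r"idSite=(?P<site>\d+)&period=(?P<period>\w+)&date=(?P<date>\d{4}-\d{2}-\d{2})"
)
DONE_RE = re.compile(
    r"\[(?P<ts>[\d-]+ [\d:]+)\] Archived website id (?P<site>\d+), "
    r"period = (?P<period>\w+), date = (?P<date>\d{4}-\d{2}-\d{2}),"
    r".*?Time elapsed: (?P<elapsed>\d+(?:\.\d+)?)s"
)


def covers(period, period_date, day):
    """Return True if the higher period contains the given day."""
    if period == "week":  # assumption: period_date is the first day of the week
        return period_date <= day <= period_date + timedelta(days=6)
    if period == "month":
        return (period_date.year, period_date.month) == (day.year, day.month)
    if period == "year":
        return period_date.year == day.year
    return False


def find_premature_starts(lines):
    """List (period, period date, day) where a higher period started mid-day-run."""
    day_runs = []  # (site, day, started, finished)
    starts = []    # (site, period, period date, started)
    for line in lines:
        done = DONE_RE.search(line)
        if done and done["period"] == "day":
            finished = datetime.strptime(done["ts"], TS_FORMAT)
            started = finished - timedelta(seconds=float(done["elapsed"]))
            day_runs.append(
                (done["site"], date.fromisoformat(done["date"]), started, finished)
            )
            continue
        start = START_RE.search(line)
        if start and start["period"] != "day":
            starts.append((
                start["site"],
                start["period"],
                date.fromisoformat(start["date"]),
                datetime.strptime(start["ts"], TS_FORMAT),
            ))
    hits = []
    for site, period, period_date, started in starts:
        for day_site, day, day_started, day_finished in day_runs:
            if (site == day_site and covers(period, period_date, day)
                    and day_started <= started < day_finished):
                hits.append((period, period_date.isoformat(), day.isoformat()))
    return hits


SAMPLE_LOG = [
    "Server 1 - INFO  [2024-07-11 18:00:00] Archived website id 2, period = day, date = 2024-07-10, ... Time elapsed: 21600s",
    "Server 3 - DEBUG [2024-07-11 16:00:00] Starting archiving for ...idSite=2&period=month&date=2024-07-01...",
    "Server 2 - DEBUG [2024-07-11 14:00:00] Starting archiving for ...idSite=2&period=week&date=2024-07-08...",
    "Server 1 - DEBUG [2024-07-11 12:00:00] Starting archiving for ...idSite=2&period=day&date=2024-07-10...",
]

for hit in find_premature_starts(SAMPLE_LOG):
    print(hit)
```

For the excerpt above this reports both the month and the week, since each was started while the day archive for 2024-07-10 was still running.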

Depending on how the timing of the concurrent archiving processes works out, an archive can be detected as "outdated, not to be used anymore" before the archiving process writing to it has actually finished. That process may then write more metrics to the archive even though its "done" flag no longer exists.

To detect if an instance is affected by that problem, the following SQL queries can be used (the examples only check day periods):

mysql> SELECT date1, COUNT(DISTINCT idarchive) FROM archive_numeric_2024_07 WHERE period = 1 AND idarchive NOT IN (SELECT DISTINCT idarchive FROM archive_numeric_2024_07 WHERE period = 1 AND name LIKE 'done%') GROUP BY date1;

+------------+---------------------------+
| date1      | COUNT(DISTINCT idarchive) |
+------------+---------------------------+
| 2024-07-01 |                         5 |
| 2024-07-02 |                         4 |
| 2024-07-03 |                         6 |
| 2024-07-04 |                        10 |
+------------+---------------------------+

mysql> SELECT DISTINCT idarchive, date1 FROM archive_numeric_2024_07 WHERE period = 1 AND idarchive NOT IN (SELECT DISTINCT idarchive FROM archive_numeric_2024_07 WHERE period = 1 AND name LIKE 'done%');

+-----------+------------+
| idarchive | date1      |
+-----------+------------+
|      1234 | 2024-07-01 |
|      2345 | 2024-07-01 |
|      3456 | 2024-07-01 |
+-----------+------------+
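As a rough illustration of what the "find" half of such a task could do, the sketch below runs an equivalent check across all periods at once. It is only a sketch under assumptions: it uses an in-memory SQLite database with a simplified stand-in for the archive table (only the columns used here), `find_broken_archives` is a hypothetical helper rather than part of Matomo, and a real task would presumably also have to skip archives that are still being written.

```python
import sqlite3


def find_broken_archives(conn, table):
    """Return one row per idarchive that has no 'done%' row in `table`."""
    # Table names cannot be bound as parameters, so restrict the characters.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    sql = f"""
        SELECT n.idarchive, n.idsite, n.period, n.date1, n.date2,
               COUNT(*) AS num_rows
        FROM {table} AS n
        WHERE NOT EXISTS (
            SELECT 1 FROM {table} AS d
            WHERE d.idarchive = n.idarchive AND d.name LIKE 'done%'
        )
        GROUP BY n.idarchive, n.idsite, n.period, n.date1, n.date2
        ORDER BY n.idarchive
    """
    return conn.execute(sql).fetchall()


# Toy data: archive 1 is complete, archive 1234 lost its "done" flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive_numeric_2024_07 ("
    "idarchive INTEGER, name TEXT, idsite INTEGER, "
    "date1 TEXT, date2 TEXT, period INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO archive_numeric_2024_07 VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "done", 2, "2024-07-01", "2024-07-01", 1, 1),
        (1, "nb_visits", 2, "2024-07-01", "2024-07-01", 1, 42),
        (1234, "nb_visits", 2, "2024-07-01", "2024-07-01", 1, 17),
        (1234, "nb_actions", 2, "2024-07-01", "2024-07-01", 1, 99),
    ],
)

for row in find_broken_archives(conn, "archive_numeric_2024_07"):
    print(row)
```

`NOT EXISTS` is used here instead of `NOT IN` purely as a stylistic choice; matching on `idarchive` alone assumes archive IDs are unique within one monthly table.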

As these archives have no "done" flag associated with them, their rows will probably never be cleaned up during regular operation. Usually, several larger blob rows (each of which can grow to multiple MB) are also linked to those broken archives.

Depending on the instance size, the plugins installed, and the number of visits, this can lead to a lot of wasted storage.
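To get a feel for how much storage is affected, the blob rows belonging to archives without a "done" flag can be summed up. The sketch below is an assumption-laden illustration: it uses in-memory SQLite with simplified stand-ins for the numeric and blob tables, assumes the blob table is named `archive_blob_2024_07` and shares `idarchive` with the numeric table, and `wasted_blob_bytes` is a hypothetical helper.

```python
import sqlite3


def wasted_blob_bytes(conn, numeric_table, blob_table):
    """Return (row count, total bytes) of blob rows whose archive has no 'done%' flag."""
    # Table names cannot be bound as parameters, so restrict the characters.
    for table in (numeric_table, blob_table):
        if not table.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {table!r}")
    sql = f"""
        SELECT COUNT(*), COALESCE(SUM(LENGTH(b.value)), 0)
        FROM {blob_table} AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM {numeric_table} AS d
            WHERE d.idarchive = b.idarchive AND d.name LIKE 'done%'
        )
    """
    return conn.execute(sql).fetchone()


# Toy data: archive 1 is complete, archive 1234 has blobs but no "done" flag.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive_numeric_2024_07 (idarchive INTEGER, name TEXT, value REAL)"
)
conn.execute(
    "CREATE TABLE archive_blob_2024_07 (idarchive INTEGER, name TEXT, value BLOB)"
)
conn.executemany(
    "INSERT INTO archive_numeric_2024_07 VALUES (?, ?, ?)",
    [(1, "done", 1), (1, "nb_visits", 42), (1234, "nb_visits", 17)],
)
conn.executemany(
    "INSERT INTO archive_blob_2024_07 VALUES (?, ?, ?)",
    [
        (1, "Actions_actions", b"x" * 10),
        (1234, "Actions_actions", b"x" * 100),
        (1234, "Referrers_type", b"x" * 50),
    ],
)

count, size = wasted_blob_bytes(conn, "archive_numeric_2024_07", "archive_blob_2024_07")
print(f"{count} orphaned blob rows, {size} bytes")
```

The byte total only counts the stored blob values, so the real on-disk footprint, including indexes and row overhead, would be somewhat larger.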

mneudert added the Enhancement and To Triage labels on Jul 29, 2024
des-innocraft removed the To Triage label on Aug 8, 2024