TL;DR: The archive tables can contain broken archives (archives without an associated "done" flag). A maintenance task to purge those broken archives should be available.
For bigger instances it is not uncommon to run archiving on multiple servers concurrently.
Due to the way intersecting period archiving is detected (see #22394), higher periods can be archived before all lower periods are finished. For example, a week can be archived before all of its individual days are done.
One way to detect whether an instance is affected by that behaviour is to check the archiving logs for the following pattern (the example requires --verbose output):
Server 1 - INFO [2024-07-11 18:00:00] Archived website id 2, period = day, date = 2024-07-10, ... Time elapsed: 21600s
Server 3 - DEBUG [2024-07-11 16:00:00] Starting archiving for ...idSite=2&period=month&date=2024-07-01...
Server 2 - DEBUG [2024-07-11 14:00:00] Starting archiving for ...idSite=2&period=week&date=2024-07-08...
Server 1 - DEBUG [2024-07-11 12:00:00] Starting archiving for ...idSite=2&period=day&date=2024-07-10...
This example comes from a setup with 3 archiving servers; the log lines are listed newest first. By chance, both the week and the month archive were started before the day archive was completed.
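The log pattern above can also be checked mechanically. The following is a minimal sketch, not part of Matomo: the regular expressions are derived only from the abbreviated example lines (a real log may differ), and the function names `parse_events`, `covers` and `find_overlaps` are hypothetical. It also assumes that a week's `date` is the first day of that week.

```python
import re
from datetime import datetime, timedelta

# Sample lines copied from the abbreviated excerpt above ("..." kept as-is).
LOG_LINES = [
    "Server 1 - INFO [2024-07-11 18:00:00] Archived website id 2, period = day, date = 2024-07-10, ... Time elapsed: 21600s",
    "Server 3 - DEBUG [2024-07-11 16:00:00] Starting archiving for ...idSite=2&period=month&date=2024-07-01...",
    "Server 2 - DEBUG [2024-07-11 14:00:00] Starting archiving for ...idSite=2&period=week&date=2024-07-08...",
    "Server 1 - DEBUG [2024-07-11 12:00:00] Starting archiving for ...idSite=2&period=day&date=2024-07-10...",
]

# Assumption: these patterns match the example lines only.
TS = r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]"
DAY = r"(\d{4}-\d{2}-\d{2})"
START_RE = re.compile(TS + r" Starting archiving for .*?idSite=(\d+)&period=(\w+)&date=" + DAY)
DONE_RE = re.compile(TS + r" Archived website id (\d+), period = (\w+), date = " + DAY)


def parse_events(lines):
    """Split log lines into (timestamp, idsite, period, date) start and finish events."""
    starts, finishes = [], []
    for line in lines:
        for regex, bucket in ((START_RE, starts), (DONE_RE, finishes)):
            m = regex.search(line)
            if m:
                ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
                bucket.append((ts, int(m.group(2)), m.group(3), m.group(4)))
    return starts, finishes


def covers(period, period_date, day):
    """True if the week/month/year identified by period_date contains the given day."""
    start = datetime.strptime(period_date, "%Y-%m-%d").date()
    target = datetime.strptime(day, "%Y-%m-%d").date()
    if period == "week":  # assumption: period_date is the first day of the week
        return start <= target < start + timedelta(days=7)
    if period == "month":
        return (start.year, start.month) == (target.year, target.month)
    if period == "year":
        return start.year == target.year
    return False  # day periods never count as "higher"


def find_overlaps(lines):
    """Return (period, period_date, day) for higher periods started while the day was still archiving."""
    starts, finishes = parse_events(lines)
    overlaps = []
    for done_ts, site, period, day in finishes:
        if period != "day":
            continue
        day_starts = [ts for ts, s, p, d in starts
                      if (s, p, d) == (site, "day", day) and ts <= done_ts]
        if not day_starts:
            continue
        day_start = max(day_starts)  # latest start before this finish
        for ts, s, p, d in starts:
            if s == site and day_start <= ts < done_ts and covers(p, d, day):
                overlaps.append((p, d, day))
    return sorted(overlaps)


if __name__ == "__main__":
    for period, period_date, day in find_overlaps(LOG_LINES):
        print(f"{period} {period_date} started while day {day} was still archiving")
```

An overlap reported by this sketch only shows that the timing described here occurred; it does not prove that an archive was actually left without a "done" flag.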
Depending on the timing of the concurrent archiving processes, an archive can be detected as "outdated, not to be used anymore" before the archiving process writing to it has actually finished. That process may then write more metrics to the archive even though its "done" flag no longer exists.
To detect whether an instance is affected by that problem, the following SQL queries can be used (the example only checks day periods):
mysql> SELECT date1, COUNT(DISTINCT idarchive) FROM archive_numeric_2024_07 WHERE period = 1 AND idarchive NOT IN (SELECT DISTINCT idarchive FROM archive_numeric_2024_07 WHERE period = 1 AND name LIKE 'done%') GROUP BY date1;
+------------+---------------------------+
| date1 | COUNT(DISTINCT idarchive) |
+------------+---------------------------+
| 2024-07-01 | 5 |
| 2024-07-02 | 4 |
| 2024-07-03 | 6 |
| 2024-07-04 | 10 |
+------------+---------------------------+
mysql> SELECT DISTINCT idarchive, date1 FROM archive_numeric_2024_07 WHERE period = 1 AND idarchive NOT IN (SELECT DISTINCT idarchive FROM archive_numeric_2024_07 WHERE period = 1 AND name LIKE 'done%');
+-----------+------------+
| idarchive | date1 |
+-----------+------------+
| 1234 | 2024-07-01 |
| 2345 | 2024-07-01 |
| 3456 | 2024-07-01 |
+-----------+------------+
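The same check can be extended to all periods. Below is a minimal sketch that runs an equivalent anti-join against an in-memory SQLite table so it can be tried without touching a real database. The reduced table schema, the sample rows and the helper names `find_broken_archives` and `build_demo_db` are assumptions made for illustration; only the `name LIKE 'done%'` condition and the column names used in the queries above come from this report.

```python
import sqlite3


def find_broken_archives(conn, table, period=None):
    """Return (idarchive, period, date1) for archives that have no 'done%' row."""
    # Table names cannot be bound as parameters, so accept only simple identifiers.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    sql = f"""
        SELECT DISTINCT a.idarchive, a.period, a.date1
        FROM {table} AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM {table} AS d
            WHERE d.idarchive = a.idarchive AND d.name LIKE 'done%'
        )
    """
    params = []
    if period is not None:
        sql += " AND a.period = ?"
        params.append(period)
    sql += " ORDER BY a.date1, a.idarchive"
    return conn.execute(sql, params).fetchall()


def build_demo_db():
    """Create a reduced, hypothetical copy of an archive_numeric table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archive_numeric_2024_07 "
        "(idarchive INTEGER, name TEXT, period INTEGER, date1 TEXT, value REAL)"
    )
    rows = [
        (1000, "done", 1, "2024-07-01", 1),        # healthy day archive
        (1000, "nb_visits", 1, "2024-07-01", 42),
        (1234, "nb_visits", 1, "2024-07-01", 40),  # broken: no done flag
        (2345, "nb_visits", 1, "2024-07-01", 41),  # broken: no done flag
        (2345, "nb_actions", 1, "2024-07-01", 90),
        (5000, "nb_visits", 2, "2024-07-01", 300), # broken week archive
    ]
    conn.executemany("INSERT INTO archive_numeric_2024_07 VALUES (?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in find_broken_archives(conn, "archive_numeric_2024_07"):
        print(row)
```

This sketch is read-only on purpose. It lists candidates for inspection and does not attempt to purge anything.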
Because these archives have no "done" flag associated with them, their rows will probably never be cleaned up during regular operation. Usually, several larger blob rows (each of which can grow to multiple MB) are also linked to those broken archives.
Depending on the instance size, the plugins installed, and the number of visits, this can lead to a lot of wasted storage.
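To get a rough idea of how much storage is affected, the blob rows belonging to archives without a "done" flag can be counted and their sizes summed. The sketch below again uses an in-memory SQLite database with a reduced, hypothetical schema. The blob table name `archive_blob_2024_07`, the sample record name and the helper names are assumptions for illustration, and `LENGTH(value)` only measures the stored payload, not index or row overhead.

```python
import sqlite3


def orphaned_blob_usage(conn, numeric_table, blob_table):
    """Return (row_count, total_bytes) of blob rows whose archive has no 'done%' row."""
    # Table names cannot be bound as parameters, so accept only simple identifiers.
    for name in (numeric_table, blob_table):
        if not name.replace("_", "").isalnum():
            raise ValueError(f"unexpected table name: {name!r}")
    sql = f"""
        SELECT COUNT(*), COALESCE(SUM(LENGTH(b.value)), 0)
        FROM {blob_table} AS b
        WHERE NOT EXISTS (
            SELECT 1 FROM {numeric_table} AS n
            WHERE n.idarchive = b.idarchive AND n.name LIKE 'done%'
        )
    """
    return conn.execute(sql).fetchone()


def build_demo_db():
    """Create reduced, hypothetical numeric and blob archive tables with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE archive_numeric_2024_07 "
        "(idarchive INTEGER, name TEXT, period INTEGER, date1 TEXT, value REAL)"
    )
    conn.execute(
        "CREATE TABLE archive_blob_2024_07 "
        "(idarchive INTEGER, name TEXT, period INTEGER, date1 TEXT, value BLOB)"
    )
    conn.executemany(
        "INSERT INTO archive_numeric_2024_07 VALUES (?, ?, ?, ?, ?)",
        [
            (1000, "done", 1, "2024-07-01", 1),        # healthy archive
            (1234, "nb_visits", 1, "2024-07-01", 40),  # broken: no done flag
        ],
    )
    conn.executemany(
        "INSERT INTO archive_blob_2024_07 VALUES (?, ?, ?, ?, ?)",
        [
            (1000, "Actions_actions", 1, "2024-07-01", b"x" * 10),    # kept
            (1234, "Actions_actions", 1, "2024-07-01", b"x" * 2048),  # orphaned
            (1234, "Actions_actions_1", 1, "2024-07-01", b"y" * 1024),  # orphaned
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    rows, size = orphaned_blob_usage(conn, "archive_numeric_2024_07", "archive_blob_2024_07")
    print(f"{rows} orphaned blob rows, {size} bytes of payload")
```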
mneudert added the Enhancement and To Triage labels on Jul 29, 2024.