The CLI backup command is not reliable and does not provide a usable UI, for the following reasons:
It lacks the output options that backup scripts commonly rely on: there is only a single output mode, which prints everything. An administrator therefore cannot practically verify that a backup is complete and intact, because that would mean reading the lengthy output of every backup. A "quiet mode" that prints only warnings and errors, for example, could be used in backup scripts and cron jobs so that the admin is notified when something goes wrong.
Frequent backups create a lot of data, because each snapshot is chunked independently and data shared between consecutive snapshots is not reused. On instances with many or large buckets this fills storage quickly and forces admins to keep only a small number of snapshots whose contents are largely redundant. In the extreme case only a single snapshot is kept and removed before the next backup; if that backup goes wrong and the instance fails, all data is lost.
InfluxDB does not enter a backup mode that would lock bucket access during a backup, so it keeps accepting read and write requests while a backup (or a restore) is running. Many databases implement such a mode for a reason: to prevent inconsistent data from being backed up. To get the same guarantee, an admin has to block all access to the instance during backup/restore, which can be quite a task when the instance is normally accessible across a large local network and not all access goes through a single point such as a web server.
On our instance (with a data directory of ~50 GB), the backup is very slow (see logs below).
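Until the CLI offers something like the quiet mode suggested in the first point, a wrapper script can approximate it. The following is a minimal sketch, not part of InfluxDB: `run_quiet` and `report_if_failed` are hypothetical helper names, and the sketch assumes the backup command signals failure through a non-zero exit status, which is not confirmed here. Warnings printed by a run that still exits with status 0 would stay hidden.

```python
import subprocess
import sys


def run_quiet(cmd):
    """Run cmd (a list of arguments) and return (exit_code, combined_output).

    Nothing is printed; stdout and stderr are captured together.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout so no message is lost
        text=True,
    )
    return result.returncode, result.stdout


def report_if_failed(cmd):
    """Run cmd silently; write its output to stderr only if it failed.

    Assumption: a non-zero exit status means the backup went wrong.
    """
    code, output = run_quiet(cmd)
    if code != 0:
        sys.stderr.write(output)
        sys.stderr.write(f"backup command exited with status {code}\n")
    return code
```

Called from a cron job as `report_if_failed([...])` with the actual backup command line as the list, this produces no output on success, so cron would only send mail when the command fails.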
The specific issue that we ran into: after a server reboot, our instance was unresponsive, and the logs gave no clue as to what was wrong. When we tried a test restore to a blank instance (a procedure we had tested earlier this year), we found that about 50% of the weekly snapshots were corrupt: the data chunks in .tar.gz as well as the boltDB and SQLite dumps were present, but the manifest files were missing. The backup output, piped from the server backup script to a log file, showed only an error message that the InfluxDB API could not be reached; we had forgotten to pipe the output in append mode and to rotate the log file. The most recent consistent weekly snapshot was 8 weeks old. As the instance could not be repaired with a reasonable amount of work, we decided to restore and accept the loss of 8 weeks of data.
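A post-backup check along the following lines would have flagged the snapshots with missing manifests and kept a history of results. It is only a sketch: `find_manifests` and `log_backup_check` are hypothetical helpers, the `*.manifest` file pattern is an assumption about how the manifest files are named, and the presence of a manifest does not prove that the rest of the snapshot is intact.

```python
import datetime
import glob
import os


def find_manifests(backup_dir, pattern="*.manifest"):
    """Return the sorted manifest files found directly inside backup_dir.

    The default pattern is an assumption; adjust it to the actual file names.
    """
    return sorted(glob.glob(os.path.join(backup_dir, pattern)))


def log_backup_check(backup_dir, log_path, pattern="*.manifest"):
    """Append one status line for backup_dir to log_path.

    Returns True if at least one manifest file was found.
    """
    ok = bool(find_manifests(backup_dir, pattern))
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    status = "OK" if ok else "MISSING MANIFEST"
    # Mode "a" appends, so results of earlier runs are not overwritten.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} {backup_dir} {status}\n")
    return ok
```

Because the log is opened in append mode, each weekly run adds a line instead of replacing the previous one; rotating the log file still has to be handled separately.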
Environment info:
Linux 5.15.0-126-generic x86_64
Logs:
This is the routine output of a recent backup: