Fix that pgbackrest sometimes stops operating in prod cluster (at least add alerts!) #331

Venryx · 2024-06-25T23:04:49Z

No description provided.

Venryx · 2024-07-25T08:42:35Z

Update

After restoring the database, the pgbackrest backups started working again (first new backup on June 26th). The restore went: open terminal in stuck db pod -> scp contents to other server -> launch same version of postgres with the pgdata directory from the scp transfer -> pg_dump from that temp instance -> clear PVC in prod cluster and import from the dump.

On July 25th though, the database pod's PVC hit 100% storage usage again, causing the same issue. I checked the pgbackrest backups at that point, and the last successful one had been on July 20th.

In summary: the pgbackrest config might actually be fine, but something is causing the backups to start failing at some point, and there is no alerting in place when that happens. (Could detect it by checking the "Conditions" column of the Kubernetes Jobs in the postgres-operator namespace.)
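A rough sketch of what that check could look like, in case it helps as a starting point for the alerting. It takes the parsed output of `kubectl get jobs -n postgres-operator -o json` and reports failed Jobs, plus the case where the newest completed Job is too old. The function name `find_backup_problems` and the 2-day threshold are made up for illustration. It also assumes every Job in that namespace is a backup Job and that finished Jobs are still around when the check runs, neither of which has been verified here.

```python
from datetime import datetime, timedelta, timezone


def _parse_k8s_time(value):
    # Kubernetes timestamps look like "2024-07-20T01:00:00Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_backup_problems(job_list, now, max_age=timedelta(days=2)):
    """Return human-readable problems found in a Kubernetes JobList dict.

    Hypothetical helper: max_age is an assumed threshold, not a value
    taken from the cluster's backup schedule.
    """
    problems = []
    last_success = None
    for job in job_list.get("items", []):
        name = job["metadata"]["name"]
        status = job.get("status", {})
        # A Job that gave up has a condition with type "Failed", status "True".
        for cond in status.get("conditions", []):
            if cond.get("type") == "Failed" and cond.get("status") == "True":
                reason = cond.get("reason", "unknown reason")
                problems.append(f"job {name} failed: {reason}")
        # completionTime is only set on Jobs that finished successfully.
        done = status.get("completionTime")
        if done:
            finished = _parse_k8s_time(done)
            if last_success is None or finished > last_success:
                last_success = finished
    if last_success is None:
        problems.append("no completed backup job found")
    elif now - last_success > max_age:
        problems.append(
            f"last completed backup job finished {last_success.isoformat()}, "
            f"older than {max_age}"
        )
    return problems


# Possible usage (not run here):
#   kubectl get jobs -n postgres-operator -o json > jobs.json
#   import json
#   problems = find_backup_problems(json.load(open("jobs.json")),
#                                   datetime.now(timezone.utc))
```

Whatever runs this (a CronJob, an external monitor) would then forward any non-empty result to the alerting channel. The staleness check matters as much as the failure check, since in the July incident the last successful backup was already five days old by the time the PVC filled up.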

Venryx mentioned this issue Jul 25, 2024

Fix whatever is causing the pg_wal folder to grow so large in prod cluster! #338

Open

Venryx changed the title ~~Fix that pgbackrest appears to not be operating in the prod cluster atm~~ Fix that pgbackrest sometimes stops operating in prod cluster (at least add alerts!) Jul 25, 2024
