-
Notifications
You must be signed in to change notification settings - Fork 450
Description
What happened
Today we had a production outage of a 11 minutes, suspectedly because:
- Stolon does not correctly recover from the master's disk being full. It stops serving
pg_basebackuprequests from follower nodes.
The downtime lasted only 11 minutes because I manually restarted the master stolon-keeper that stopped serving pg_basebackup requests.
I observed the following in our 3-node Stolon cluster composed of nodes node-4, node-5, node-6:
node-5temporarily ran out of disk space while being stolon master.- The other 2 nodes tried to sync up with the
node-5master, by runningpg_basebackup. - Disk space was freed on
node-5. - Stolon did not recover from the master having a full-disk event:
- The follower DB nodes were blocked in
pg_basebackup, and restarting them did not help.- Last logs of followers:
pg_basebackup: initiating base backup, waiting for checkpoint to complete
- Last logs of followers:
- Apparently the Stolon master stopped serving
pg_basebackupresponses after the full-disk event, even though disk space was freed up. - This looks like a Stolon bug to me.
- Maybe similar to a Stolon bug we hit earlier, with the common Go bug of not checking the return value of
Write...()functions:
- Maybe similar to a Stolon bug we hit earlier, with the common Go bug of not checking the return value of
- The follower DB nodes were blocked in
What you expected to happen
Stolon should serve pg_basebackup responses when the disk is no longer full.
How to reproduce it (as minimally and precisely as possible)
I have not yet tried to reproduce it exactly, as I am not sure how to trigger the other 2 nodes trying to do pg_basebackup at the correct time.
Logs
Follower logs
Follower logs hanging in pg_basebackup: initiating base backup, waiting for checkpoint to complete:
Click to expand logs
Aug 15 10:25:08 node-4 keeper[3426683]: 2023-08-15T10:25:08.700Z INFO cmd/keeper.go:1576 already standby
Aug 15 10:25:08 node-4 keeper[3426683]: 2023-08-15T10:25:08.717Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:25:08 node-4 keeper[3426683]: 2023-08-15T10:25:08.717Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.734Z INFO cmd/keeper.go:1141 current db UID different than cluster data db UID {"db": "c6cb2f35", "cdDB": "6f2488cf"}
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.734Z INFO cmd/keeper.go:1296 resyncing the database cluster
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.736Z INFO postgresql/postgresql.go:384 stopping database
Aug 15 10:25:10 node-4 keeper[1624016]: waiting for server to shut down.... done
Aug 15 10:25:10 node-4 keeper[1624016]: server stopped
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.848Z INFO cmd/keeper.go:899 syncing using pg_rewind {"masterDB": "140f41c1", "keeper": "node_5"}
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.848Z INFO postgresql/postgresql.go:921 running pg_rewind
Aug 15 10:25:10 node-4 keeper[1624020]: pg_rewind: fetched file "global/pg_control", length 8192
Aug 15 10:25:10 node-4 keeper[1624020]: pg_rewind: source and target cluster are on the same timeline
Aug 15 10:25:10 node-4 keeper[1624020]: pg_rewind: no rewind required
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.865Z INFO postgresql/postgresql.go:844 writing standby signal file
Aug 15 10:25:10 node-4 keeper[3426683]: 2023-08-15T10:25:10.865Z INFO postgresql/postgresql.go:319 starting database
Aug 15 10:25:10 node-4 keeper[1624022]: 2023-08-15 10:25:10.874 UTC [1624022] LOG: redirecting log output to logging collector process
Aug 15 10:25:10 node-4 keeper[1624022]: 2023-08-15 10:25:10.874 UTC [1624022] HINT: Future log output will appear in directory "/var/log/postgres".
Aug 15 10:25:11 node-4 keeper[3426683]: 2023-08-15T10:25:11.103Z INFO cmd/keeper.go:968 followed instance timeline forked before our current state {"followedTimeline": 33, "followedXlogpos": 454746433152, "timeline": 32, "xlogpos": 454827696368}
Aug 15 10:25:11 node-4 keeper[3426683]: 2023-08-15T10:25:11.103Z INFO postgresql/postgresql.go:384 stopping database
Aug 15 10:25:11 node-4 keeper[1624036]: waiting for server to shut down.... done
Aug 15 10:25:11 node-4 keeper[1624036]: server stopped
Aug 15 10:25:11 node-4 keeper[3426683]: 2023-08-15T10:25:11.389Z INFO cmd/keeper.go:925 syncing from followed db {"followedDB": "140f41c1", "keeper": "node_5"}
Aug 15 10:25:11 node-4 keeper[3426683]: 2023-08-15T10:25:11.389Z INFO postgresql/postgresql.go:964 running pg_basebackup
Aug 15 10:25:11 node-4 keeper[3426683]: pg_basebackup: initiating base backup, waiting for checkpoint to complete
Master logs
Logs of the master while it was not serving the other nodes' pg_basebackup requests:
Click to expand logs
Aug 15 10:26:49 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:49.899Z INFO cmd/keeper.go:1543 already master
Aug 15 10:26:49 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:49.916Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:26:49 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:49.916Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:26:51 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:51.946Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:26:51 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:51.947Z INFO cmd/keeper.go:1543 already master
Aug 15 10:26:51 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:51.963Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:26:51 benaco-node-5 keeper[2851117]: 2023-08-15T10:26:51.964Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Logs of the master keeper during the entirety of the event, from the initial No space left on device to recovery by my manual restart of the keeper:
Click to expand logs
Aug 15 09:54:07 node-5 keeper[2851117]: 2023-08-15T09:54:07.306Z INFO cmd/keeper.go:1543 already master
Aug 15 09:54:07 node-5 keeper[2851117]: 2023-08-15T09:54:07.320Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 09:54:07 node-5 keeper[2851117]: 2023-08-15T09:54:07.320Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 09:54:07 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:07 node-5 keeper[2851286]: could not write to log file: No space left on device
...
Aug 15 09:54:09 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:09 node-5 keeper[2851117]: 2023-08-15T09:54:09.351Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:54:09 node-5 keeper[2851117]: 2023-08-15T09:54:09.352Z INFO cmd/keeper.go:1543 already master
Aug 15 09:54:09 node-5 keeper[2851117]: 2023-08-15T09:54:09.366Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 09:54:09 node-5 keeper[2851117]: 2023-08-15T09:54:09.366Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 09:54:09 node-5 keeper[2851117]: 2023-08-15T09:54:09.377Z ERROR cmd/keeper.go:1742 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate848970997: no space left on device"}
Aug 15 09:54:09 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:10 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:10 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:10 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:11 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:54:11 node-5 keeper[2851117]: 2023-08-15T09:54:11.403Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:54:11 node-5 keeper[2851117]: 2023-08-15T09:54:11.403Z INFO cmd/keeper.go:1543 already master
Aug 15 09:54:11 node-5 keeper[2851117]: 2023-08-15T09:54:11.419Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 09:54:11 node-5 keeper[2851117]: 2023-08-15T09:54:11.419Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 09:54:11 node-5 keeper[2851117]: 2023-08-15T09:54:11.430Z ERROR cmd/keeper.go:1742 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate696989648: no space left on device"}
Aug 15 09:54:12 node-5 keeper[2851286]: could not write to log file: No space left on device
...
Aug 15 09:58:51 node-5 keeper[2851117]: 2023-08-15T09:58:51.980Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:58:51 node-5 keeper[2851117]: 2023-08-15T09:58:51.981Z INFO cmd/keeper.go:1543 already master
Aug 15 09:58:51 node-5 keeper[2851117]: 2023-08-15T09:58:51.994Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 09:58:51 node-5 keeper[2851117]: 2023-08-15T09:58:51.995Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 09:58:52 node-5 keeper[2851117]: 2023-08-15T09:58:52.004Z ERROR cmd/keeper.go:1742 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate453193463: no space left on device"}
Aug 15 09:58:53 node-5 keeper[2851286]: could not write to log file: No space left on device
Aug 15 09:58:54 node-5 keeper[2851117]: 2023-08-15T09:58:54.024Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:58:54 node-5 keeper[2851117]: 2023-08-15T09:58:54.024Z INFO cmd/keeper.go:1543 already master
Aug 15 09:58:54 node-5 keeper[2851117]: 2023-08-15T09:58:54.041Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 09:58:54 node-5 keeper[2851117]: 2023-08-15T09:58:54.042Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 09:58:54 node-5 keeper[2851117]: 2023-08-15T09:58:54.054Z ERROR cmd/keeper.go:1742 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate195438570: no space left on device"}
Aug 15 09:58:56 node-5 keeper[2851117]: 2023-08-15T09:58:56.071Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:58:56 node-5 keeper[2851117]: 2023-08-15T09:58:56.073Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:58:56 node-5 keeper[2851117]: 2023-08-15T09:58:56.073Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 09:58:56 node-5 keeper[2851117]: 2023-08-15T09:58:56.076Z ERROR cmd/keeper.go:1527 failed to start postgres {"error": "error writing postgresql.conf file: write /var/lib/postgres/node_5/postgres/postgresql.conf900267073: no space left on device"}
Aug 15 09:58:57 node-5 keeper[2851117]: 2023-08-15T09:58:57.072Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:58:58 node-5 keeper[2851117]: 2023-08-15T09:58:58.073Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:58:58 node-5 keeper[2851117]: 2023-08-15T09:58:58.092Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:58:58 node-5 keeper[2851117]: 2023-08-15T09:58:58.093Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 09:58:58 node-5 keeper[2851117]: 2023-08-15T09:58:58.096Z ERROR cmd/keeper.go:1527 failed to start postgres {"error": "error writing postgresql.conf file: write /var/lib/postgres/node_5/postgres/postgresql.conf974015148: no space left on device"}
Aug 15 09:58:59 node-5 keeper[2851117]: 2023-08-15T09:58:59.074Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:00 node-5 keeper[2851117]: 2023-08-15T09:59:00.075Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:00 node-5 keeper[2851117]: 2023-08-15T09:59:00.118Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:59:00 node-5 keeper[2851117]: 2023-08-15T09:59:00.118Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 09:59:00 node-5 keeper[2851117]: 2023-08-15T09:59:00.124Z ERROR cmd/keeper.go:1527 failed to start postgres {"error": "error writing postgresql.conf file: write /var/lib/postgres/node_5/postgres/postgresql.conf941834779: no space left on device"}
Aug 15 09:59:01 node-5 keeper[2851117]: 2023-08-15T09:59:01.076Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:02 node-5 keeper[2851117]: 2023-08-15T09:59:02.077Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:02 node-5 keeper[2851117]: 2023-08-15T09:59:02.148Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:59:02 node-5 keeper[2851117]: 2023-08-15T09:59:02.149Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 09:59:02 node-5 keeper[2851117]: 2023-08-15T09:59:02.152Z ERROR cmd/keeper.go:1527 failed to start postgres {"error": "error writing postgresql.conf file: write /var/lib/postgres/node_5/postgres/postgresql.conf076239550: no space left on device"}
Aug 15 09:59:03 node-5 keeper[2851117]: 2023-08-15T09:59:03.078Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:04 node-5 keeper[2851117]: 2023-08-15T09:59:04.079Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:04 node-5 keeper[2851117]: 2023-08-15T09:59:04.171Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:59:04 node-5 keeper[2851117]: 2023-08-15T09:59:04.171Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 09:59:04 node-5 keeper[2851117]: 2023-08-15T09:59:04.175Z ERROR cmd/keeper.go:1527 failed to start postgres {"error": "error writing postgresql.conf file: write /var/lib/postgres/node_5/postgres/postgresql.conf252173061: no space left on device"}
Aug 15 09:59:05 node-5 keeper[2851117]: 2023-08-15T09:59:05.080Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:06 node-5 keeper[2851117]: 2023-08-15T09:59:06.080Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:06 node-5 keeper[2851117]: 2023-08-15T09:59:06.197Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 09:59:06 node-5 keeper[2851117]: 2023-08-15T09:59:06.198Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 09:59:06 node-5 keeper[2851117]: 2023-08-15T09:59:06.203Z ERROR cmd/keeper.go:1527 failed to start postgres {"error": "error writing postgresql.conf file: write /var/lib/postgres/node_5/postgres/postgresql.conf224427168: no space left on device"}
Aug 15 09:59:07 node-5 keeper[2851117]: 2023-08-15T09:59:07.081Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:08 node-5 keeper[2851117]: 2023-08-15T09:59:08.082Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:08 node-5 keeper[2851117]: 2023-08-15T09:59:08.207Z INFO cmd/keeper.go:1141 current db UID different than cluster data db UID {"db": "3397fb23", "cdDB": "140f41c1"}
Aug 15 09:59:08 node-5 keeper[2851117]: 2023-08-15T09:59:08.207Z INFO cmd/keeper.go:1296 resyncing the database cluster
Aug 15 09:59:08 node-5 keeper[2851117]: 2023-08-15T09:59:08.208Z ERROR cmd/keeper.go:1305 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate649249151: no space left on device"}
Aug 15 09:59:09 node-5 keeper[2851117]: 2023-08-15T09:59:09.083Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:10 node-5 keeper[2851117]: 2023-08-15T09:59:10.084Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:10 node-5 keeper[2851117]: 2023-08-15T09:59:10.212Z INFO cmd/keeper.go:1141 current db UID different than cluster data db UID {"db": "3397fb23", "cdDB": "140f41c1"}
Aug 15 09:59:10 node-5 keeper[2851117]: 2023-08-15T09:59:10.212Z INFO cmd/keeper.go:1296 resyncing the database cluster
Aug 15 09:59:10 node-5 keeper[2851117]: 2023-08-15T09:59:10.212Z ERROR cmd/keeper.go:1305 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate210388178: no space left on device"}
Aug 15 09:59:11 node-5 keeper[2851117]: 2023-08-15T09:59:11.085Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:12 node-5 keeper[2851117]: 2023-08-15T09:59:12.086Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 09:59:12 node-5 keeper[2851117]: 2023-08-15T09:59:12.216Z INFO cmd/keeper.go:1141 current db UID different than cluster data db UID {"db": "3397fb23", "cdDB": "140f41c1"}
Aug 15 09:59:12 node-5 keeper[2851117]: 2023-08-15T09:59:12.216Z INFO cmd/keeper.go:1296 resyncing the database cluster
...
Aug 15 10:06:07 node-5 keeper[2851117]: 2023-08-15T10:06:07.270Z INFO cmd/keeper.go:1296 resyncing the database cluster
Aug 15 10:06:07 node-5 keeper[2851117]: 2023-08-15T10:06:07.270Z ERROR cmd/keeper.go:1305 failed to save db local state {"error": "write /var/lib/postgres/node_5/dbstate789537122: no space left on device"}
Aug 15 10:06:07 node-5 keeper[2851117]: 2023-08-15T10:06:07.440Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:06:08 node-5 keeper[2851117]: 2023-08-15T10:06:08.441Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.274Z INFO cmd/keeper.go:1141 current db UID different than cluster data db UID {"db": "3397fb23", "cdDB": "140f41c1"}
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.274Z INFO cmd/keeper.go:1296 resyncing the database cluster
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.288Z INFO cmd/keeper.go:899 syncing using pg_rewind {"masterDB": "32e06e90", "keeper": "node_6"}
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.288Z INFO postgresql/postgresql.go:921 running pg_rewind
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: executing "/nix/store/swc4d2nvp800dibijq09ksbw26nwg6yp-postgresql-14.1/bin/postgres" for target server to complete crash recovery
Aug 15 10:06:09 node-5 keeper[1081939]: 2023-08-15 10:06:09.314 GMT [1081939] LOG: skipping missing configuration file "/var/lib/postgres/node_5/postgres/postgresql.auto.conf"
Aug 15 10:06:09 node-5 keeper[1081939]: 2023-08-15 10:06:09.318 UTC [1081939] LOG: database system shutdown was interrupted; last known up at 2023-08-15 09:58:55 UTC
Aug 15 10:06:09 node-5 keeper[1081939]: 2023-08-15 10:06:09.319 UTC [1081939] LOG: database system was not properly shut down; automatic recovery in progress
Aug 15 10:06:09 node-5 keeper[1081939]: 2023-08-15 10:06:09.319 UTC [1081939] LOG: redo starts at 69/C835E230
Aug 15 10:06:09 node-5 keeper[1081939]: 2023-08-15 10:06:09.359 UTC [1081939] LOG: invalid record length at 69/CAC1E670: wanted 24, got 0
Aug 15 10:06:09 node-5 keeper[1081939]: 2023-08-15 10:06:09.359 UTC [1081939] LOG: redo done at 69/CAC1E648 system usage: CPU: user: 0.02 s, system: 0.01 s, elapsed: 0.04 s
Aug 15 10:06:09 node-5 keeper[1081939]: PostgreSQL stand-alone backend 14.1
Aug 15 10:06:09 node-5 keeper[1081939]: backend>
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: fetched file "global/pg_control", length 8192
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: fetched file "pg_wal/00000020.history", length 1379
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: Source timeline history:
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: Target timeline history:
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: 1: 0/0 - 5/B8C7E6D0
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: 2: 5/B8C7E6D0 - 8/F66E8E98
...
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: received chunk for file "pg_wal/0000002000000069000000CC", offset 4194304, size 1048576
Aug 15 10:06:09 node-5 keeper[1081936]: pg_rewind: fatal: could not write file "/var/lib/postgres/node_5/postgres/pg_wal/0000002000000069000000CC": No space left on device
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.541Z ERROR cmd/keeper.go:902 error syncing with pg_rewind {"error": "error: exit status 1"}
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.779Z INFO cmd/keeper.go:925 syncing from followed db {"followedDB": "32e06e90", "keeper": "node_6"}
Aug 15 10:06:09 node-5 keeper[2851117]: 2023-08-15T10:06:09.780Z INFO postgresql/postgresql.go:964 running pg_basebackup
Aug 15 10:06:09 node-5 keeper[2851117]: pg_basebackup: initiating base backup, waiting for checkpoint to complete
Aug 15 10:07:41 node-5 keeper[2851117]: WARNING: skipping special file "./postgresql.auto.conf"
Aug 15 10:07:41 node-5 keeper[2851117]: pg_basebackup: checkpoint completed
Aug 15 10:07:41 node-5 keeper[2851117]: pg_basebackup: write-ahead log start point: 69/D4000098 on timeline 32
Aug 15 10:07:41 node-5 keeper[2851117]: pg_basebackup: starting background WAL receiver
Aug 15 10:07:41 node-5 keeper[2851117]: 0/3622590 kB (0%), 0/1 tablespace (...aco_node_5/postgres/backup_label)
Aug 15 10:07:41 node-5 keeper[2851117]: WARNING: skipping special file "./postgresql.auto.conf"
Aug 15 10:07:42 node-5 keeper[2851117]: 131258/3622590 kB (3%), 0/1 tablespace (...de_5/postgres/base/14756/16490.1)
Aug 15 10:07:43 node-5 keeper[2851117]: 419610/3622590 kB (11%), 0/1 tablespace (...de_5/postgres/base/14756/16490.1)
Aug 15 10:07:44 node-5 keeper[2851117]: 713658/3622590 kB (19%), 0/1 tablespace (...de_5/postgres/base/14756/16490.1)
Aug 15 10:07:45 node-5 keeper[2851117]: 979354/3622590 kB (27%), 0/1 tablespace (...de_5/postgres/base/14756/16490.1)
Aug 15 10:07:46 node-5 keeper[2851117]: 1275102/3622590 kB (35%), 0/1 tablespace (...node_5/postgres/base/14756/16490)
Aug 15 10:07:47 node-5 keeper[2851117]: 1567422/3622590 kB (43%), 0/1 tablespace (...node_5/postgres/base/14756/16490)
Aug 15 10:07:48 node-5 keeper[2851117]: 1914206/3622590 kB (52%), 0/1 tablespace (...node_5/postgres/base/14756/16490)
Aug 15 10:07:49 node-5 keeper[2851117]: 2207486/3622590 kB (60%), 0/1 tablespace (...node_5/postgres/base/14756/16490)
Aug 15 10:07:50 node-5 keeper[2851117]: 2527241/3622590 kB (69%), 0/1 tablespace (...node_5/postgres/base/14756/26274)
Aug 15 10:07:51 node-5 keeper[2851117]: 2817748/3622590 kB (77%), 0/1 tablespace (...node_5/postgres/base/14756/26160)
Aug 15 10:07:52 node-5 keeper[2851117]: 3082790/3622590 kB (85%), 0/1 tablespace (...node_5/postgres/base/14756/16493)
Aug 15 10:07:53 node-5 keeper[2851117]: 3311206/3622590 kB (91%), 0/1 tablespace (...node_5/postgres/base/14756/16493)
Aug 15 10:07:54 node-5 keeper[2851117]: 3604134/3622590 kB (99%), 0/1 tablespace (...node_5/postgres/base/14756/16493)
Aug 15 10:07:54 node-5 keeper[2851117]: 3625760/3625760 kB (100%), 0/1 tablespace (...ode_5/postgres/global/pg_control)
Aug 15 10:07:54 node-5 keeper[2851117]: 3625760/3625760 kB (100%), 1/1 tablespace
Aug 15 10:07:54 node-5 keeper[2851117]: pg_basebackup: write-ahead log end point: 69/D5DFAC98
Aug 15 10:07:54 node-5 keeper[2851117]: pg_basebackup: waiting for background process to finish streaming ...
Aug 15 10:07:54 node-5 keeper[2851117]: pg_basebackup: syncing data to disk ...
Aug 15 10:07:54 node-5 keeper[2851117]: 2023-08-15T10:07:54.509Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:07:55 node-5 keeper[2851117]: 2023-08-15T10:07:55.509Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:07:56 node-5 keeper[2851117]: 2023-08-15T10:07:56.510Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:07:57 node-5 keeper[2851117]: 2023-08-15T10:07:57.511Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:07:58 node-5 keeper[2851117]: 2023-08-15T10:07:58.511Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:07:59 node-5 keeper[2851117]: 2023-08-15T10:07:59.512Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:08:00 node-5 keeper[2851117]: 2023-08-15T10:08:00.513Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.514Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:08:01 node-5 keeper[2851117]: pg_basebackup: renaming backup_manifest.tmp to backup_manifest
Aug 15 10:08:01 node-5 keeper[2851117]: pg_basebackup: base backup completed
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.546Z INFO cmd/keeper.go:931 sync succeeded
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.552Z INFO postgresql/postgresql.go:844 writing standby signal file
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.552Z INFO postgresql/postgresql.go:319 starting database
Aug 15 10:08:01 node-5 keeper[1082418]: 2023-08-15 10:08:01.563 UTC [1082418] LOG: redirecting log output to logging collector process
Aug 15 10:08:01 node-5 keeper[1082418]: 2023-08-15 10:08:01.563 UTC [1082418] HINT: Future log output will appear in directory "/var/log/postgres".
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.801Z INFO cmd/keeper.go:1557 our db requested role is standby {"followedDB": "32e06e90"}
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.801Z INFO cmd/keeper.go:1576 already standby
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.815Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:08:01 node-5 keeper[2851117]: 2023-08-15T10:08:01.815Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:08:03 node-5 keeper[2851117]: 2023-08-15T10:08:03.843Z INFO cmd/keeper.go:1557 our db requested role is standby {"followedDB": "32e06e90"}
Aug 15 10:08:03 node-5 keeper[2851117]: 2023-08-15T10:08:03.844Z INFO cmd/keeper.go:1576 already standby
...
Aug 15 10:19:53 node-5 keeper[2851117]: 2023-08-15T10:19:53.774Z INFO cmd/keeper.go:1576 already standby
Aug 15 10:19:53 node-5 keeper[2851117]: 2023-08-15T10:19:53.791Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:19:53 node-5 keeper[2851117]: 2023-08-15T10:19:53.791Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:19:55 node-5 keeper[2851117]: 2023-08-15T10:19:55.821Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:19:55 node-5 keeper[2851117]: 2023-08-15T10:19:55.822Z INFO cmd/keeper.go:1537 promoting to master
Aug 15 10:19:55 node-5 keeper[2851117]: 2023-08-15T10:19:55.822Z INFO postgresql/postgresql.go:532 promoting database
Aug 15 10:19:56 node-5 keeper[1090879]: waiting for server to promote....
Aug 15 10:19:56 node-5 keeper[2851117]: 2023-08-15T10:19:56.234Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:19:57 node-5 keeper[1090879]: .
Aug 15 10:19:57 node-5 keeper[2851117]: 2023-08-15T10:19:57.235Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:19:58 node-5 keeper[1090879]: .
...
Aug 15 10:20:55 node-5 keeper[2851117]: 2023-08-15T10:20:55.282Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:20:55 node-5 keeper[1090879]: stopped waiting
Aug 15 10:20:55 node-5 keeper[1090879]: pg_ctl: server did not promote in time
Aug 15 10:20:55 node-5 keeper[2851117]: 2023-08-15T10:20:55.898Z ERROR cmd/keeper.go:1539 failed to promote instance {"error": "error: exit status 1"}
Aug 15 10:20:56 node-5 keeper[2851117]: 2023-08-15T10:20:56.283Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:20:57 node-5 keeper[2851117]: 2023-08-15T10:20:57.285Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:20:57 node-5 keeper[2851117]: 2023-08-15T10:20:57.913Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:20:57 node-5 keeper[2851117]: 2023-08-15T10:20:57.913Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 10:20:57 node-5 keeper[2851117]: 2023-08-15T10:20:57.919Z INFO postgresql/postgresql.go:319 starting database
Aug 15 10:20:57 node-5 keeper[1091196]: 2023-08-15 10:20:57.931 UTC [1091196] LOG: redirecting log output to logging collector process
Aug 15 10:20:57 node-5 keeper[1091196]: 2023-08-15 10:20:57.931 UTC [1091196] HINT: Future log output will appear in directory "/var/log/postgres".
Aug 15 10:20:58 node-5 keeper[2851117]: 2023-08-15T10:20:58.129Z INFO cmd/keeper.go:1537 promoting to master
Aug 15 10:20:58 node-5 keeper[2851117]: 2023-08-15T10:20:58.129Z INFO postgresql/postgresql.go:532 promoting database
Aug 15 10:20:58 node-5 keeper[1091203]: waiting for server to promote....
Aug 15 10:20:58 node-5 keeper[2851117]: 2023-08-15T10:20:58.286Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:20:59 node-5 keeper[1091203]: .
Aug 15 10:20:59 node-5 keeper[2851117]: 2023-08-15T10:20:59.287Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:21:00 node-5 keeper[1091203]: .
...
Aug 15 10:21:57 node-5 keeper[2851117]: 2023-08-15T10:21:57.336Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:21:58 node-5 keeper[1091203]: stopped waiting
Aug 15 10:21:58 node-5 keeper[1091203]: pg_ctl: server did not promote in time
Aug 15 10:21:58 node-5 keeper[2851117]: 2023-08-15T10:21:58.205Z ERROR cmd/keeper.go:1539 failed to promote instance {"error": "error: exit status 1"}
Aug 15 10:21:58 node-5 keeper[2851117]: 2023-08-15T10:21:58.337Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:21:59 node-5 keeper[2851117]: 2023-08-15T10:21:59.338Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:22:00 node-5 keeper[2851117]: 2023-08-15T10:22:00.228Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:22:00 node-5 keeper[2851117]: 2023-08-15T10:22:00.229Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 10:22:00 node-5 keeper[2851117]: 2023-08-15T10:22:00.234Z INFO postgresql/postgresql.go:319 starting database
Aug 15 10:22:00 node-5 keeper[1091541]: 2023-08-15 10:22:00.242 UTC [1091541] LOG: redirecting log output to logging collector process
Aug 15 10:22:00 node-5 keeper[1091541]: 2023-08-15 10:22:00.242 UTC [1091541] HINT: Future log output will appear in directory "/var/log/postgres".
Aug 15 10:22:00 node-5 keeper[2851117]: 2023-08-15T10:22:00.339Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "pq: the database system is not yet accepting connections"}
Aug 15 10:22:00 node-5 keeper[2851117]: 2023-08-15T10:22:00.444Z INFO cmd/keeper.go:1537 promoting to master
Aug 15 10:22:00 node-5 keeper[2851117]: 2023-08-15T10:22:00.444Z INFO postgresql/postgresql.go:532 promoting database
Aug 15 10:22:01 node-5 keeper[1091549]: waiting for server to promote....
Aug 15 10:22:01 node-5 keeper[2851117]: 2023-08-15T10:22:01.340Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:22:02 node-5 keeper[1091549]: .
...
Aug 15 10:25:04 node-5 keeper[2851117]: 2023-08-15T10:25:04.483Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:25:05 node-5 keeper[1092235]: stopped waiting
Aug 15 10:25:05 node-5 keeper[1092235]: pg_ctl: server did not promote in time
Aug 15 10:25:05 node-5 keeper[2851117]: 2023-08-15T10:25:05.145Z ERROR cmd/keeper.go:1539 failed to promote instance {"error": "error: exit status 1"}
Aug 15 10:25:05 node-5 keeper[2851117]: 2023-08-15T10:25:05.483Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:25:06 node-5 keeper[2851117]: 2023-08-15T10:25:06.484Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.165Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.166Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.172Z INFO postgresql/postgresql.go:319 starting database
Aug 15 10:25:07 node-5 keeper[1093152]: 2023-08-15 10:25:07.179 UTC [1093152] LOG: redirecting log output to logging collector process
Aug 15 10:25:07 node-5 keeper[1093152]: 2023-08-15 10:25:07.179 UTC [1093152] HINT: Future log output will appear in directory "/var/log/postgres".
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.380Z INFO cmd/keeper.go:1537 promoting to master
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.380Z INFO postgresql/postgresql.go:532 promoting database
Aug 15 10:25:07 node-5 keeper[1093160]: waiting for server to promote.... done
Aug 15 10:25:07 node-5 keeper[1093160]: server promoted
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.600Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.607Z INFO cmd/keeper.go:1689 not allowing connection as normal users since synchronous replication is enabled, instance was down and not all sync standbys are synced
Aug 15 10:25:07 node-5 keeper[2851117]: 2023-08-15T10:25:07.607Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:25:09 node-5 keeper[2851117]: 2023-08-15T10:25:09.634Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:25:09 node-5 keeper[2851117]: 2023-08-15T10:25:09.634Z INFO cmd/keeper.go:1543 already master
Aug 15 10:25:09 node-5 keeper[2851117]: 2023-08-15T10:25:09.651Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:25:09 node-5 keeper[2851117]: 2023-08-15T10:25:09.657Z INFO cmd/keeper.go:1689 not allowing connection as normal users since synchronous replication is enabled, instance was down and not all sync standbys are synced
Aug 15 10:25:09 node-5 keeper[2851117]: 2023-08-15T10:25:09.657Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.691Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.692Z INFO cmd/keeper.go:1543 already master
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.697Z INFO cmd/keeper.go:1008 creating replication slot {"slot": "stolon_6f2488cf"}
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.702Z INFO cmd/keeper.go:1008 creating replication slot {"slot": "stolon_e1e06025"}
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.719Z INFO cmd/keeper.go:1659 needed synchronous_standby_names changed {"prevSyncStandbyNames": "2 (stolon_32e06e90,stolon_c6cb2f35)", "syncStandbyNames": "stolonfakestandby"}
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.720Z INFO cmd/keeper.go:1671 postgres parameters changed, reloading postgres instance
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.726Z INFO cmd/keeper.go:1698 postgres hba entries changed, reloading postgres instance
Aug 15 10:25:11 node-5 keeper[2851117]: 2023-08-15T10:25:11.726Z INFO postgresql/postgresql.go:421 reloading database configuration
Aug 15 10:25:11 node-5 keeper[1093240]: server signaled
Aug 15 10:25:13 node-5 keeper[2851117]: 2023-08-15T10:25:13.765Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:25:13 node-5 keeper[2851117]: 2023-08-15T10:25:13.765Z INFO cmd/keeper.go:1543 already master
Aug 15 10:25:13 node-5 keeper[2851117]: 2023-08-15T10:25:13.782Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:25:13 node-5 keeper[2851117]: 2023-08-15T10:25:13.782Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:25:15 node-5 keeper[2851117]: 2023-08-15T10:25:15.810Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:25:15 node-5 keeper[2851117]: 2023-08-15T10:25:15.811Z INFO cmd/keeper.go:1543 already master
...
Aug 15 10:30:04 node-5 keeper[2851117]: 2023-08-15T10:30:04.855Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:04 node-5 keeper[2851117]: 2023-08-15T10:30:04.855Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:06 node-5 keeper[2851117]: 2023-08-15T10:30:06.885Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:06 node-5 keeper[2851117]: 2023-08-15T10:30:06.886Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:06 node-5 keeper[2851117]: 2023-08-15T10:30:06.901Z INFO cmd/keeper.go:1659 needed synchronous_standby_names changed {"prevSyncStandbyNames": "stolonfakestandby", "syncStandbyNames": "stolon_e1e06025"}
Aug 15 10:30:06 node-5 keeper[2851117]: 2023-08-15T10:30:06.901Z INFO cmd/keeper.go:1671 postgres parameters changed, reloading postgres instance
Aug 15 10:30:06 node-5 keeper[2851117]: 2023-08-15T10:30:06.901Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:06 node-5 keeper[2851117]: 2023-08-15T10:30:06.901Z INFO postgresql/postgresql.go:421 reloading database configuration
Aug 15 10:30:06 node-5 keeper[1096714]: server signaled
Aug 15 10:30:08 node-5 keeper[2851117]: 2023-08-15T10:30:08.936Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:08 node-5 keeper[2851117]: 2023-08-15T10:30:08.937Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:08 node-5 keeper[2851117]: 2023-08-15T10:30:08.951Z INFO cmd/keeper.go:1659 needed synchronous_standby_names changed {"prevSyncStandbyNames": "stolon_e1e06025", "syncStandbyNames": "2 (stolon_6f2488cf,stolon_e1e06025)"}
Aug 15 10:30:08 node-5 keeper[2851117]: 2023-08-15T10:30:08.951Z INFO cmd/keeper.go:1671 postgres parameters changed, reloading postgres instance
Aug 15 10:30:08 node-5 keeper[2851117]: 2023-08-15T10:30:08.951Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:08 node-5 keeper[2851117]: 2023-08-15T10:30:08.951Z INFO postgresql/postgresql.go:421 reloading database configuration
Aug 15 10:30:08 node-5 keeper[1096820]: server signaled
Aug 15 10:30:10 node-5 keeper[2851117]: 2023-08-15T10:30:10.988Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:10 node-5 keeper[2851117]: 2023-08-15T10:30:10.989Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:11 node-5 keeper[2851117]: 2023-08-15T10:30:11.005Z INFO cmd/keeper.go:1659 needed synchronous_standby_names changed {"prevSyncStandbyNames": "2 (stolon_6f2488cf,stolon_e1e06025)", "syncStandbyNames": "stolon_e1e06025"}
Aug 15 10:30:11 node-5 keeper[2851117]: 2023-08-15T10:30:11.005Z INFO cmd/keeper.go:1671 postgres parameters changed, reloading postgres instance
Aug 15 10:30:11 node-5 keeper[2851117]: 2023-08-15T10:30:11.005Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:11 node-5 keeper[2851117]: 2023-08-15T10:30:11.005Z INFO postgresql/postgresql.go:421 reloading database configuration
Aug 15 10:30:11 node-5 keeper[1096842]: server signaled
Aug 15 10:30:13 node-5 keeper[2851117]: 2023-08-15T10:30:13.044Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:13 node-5 keeper[2851117]: 2023-08-15T10:30:13.045Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:13 node-5 keeper[2851117]: 2023-08-15T10:30:13.061Z INFO cmd/keeper.go:1659 needed synchronous_standby_names changed {"prevSyncStandbyNames": "stolon_e1e06025", "syncStandbyNames": "2 (stolon_6f2488cf,stolon_e1e06025)"}
Aug 15 10:30:13 node-5 keeper[2851117]: 2023-08-15T10:30:13.061Z INFO cmd/keeper.go:1671 postgres parameters changed, reloading postgres instance
Aug 15 10:30:13 node-5 keeper[2851117]: 2023-08-15T10:30:13.061Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:13 node-5 keeper[2851117]: 2023-08-15T10:30:13.061Z INFO postgresql/postgresql.go:421 reloading database configuration
Aug 15 10:30:13 node-5 keeper[1096863]: server signaled
Aug 15 10:30:15 node-5 keeper[2851117]: 2023-08-15T10:30:15.097Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:15 node-5 keeper[2851117]: 2023-08-15T10:30:15.098Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:15 node-5 keeper[2851117]: 2023-08-15T10:30:15.111Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:15 node-5 keeper[2851117]: 2023-08-15T10:30:15.111Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:17 node-5 keeper[2851117]: 2023-08-15T10:30:17.143Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:17 node-5 keeper[2851117]: 2023-08-15T10:30:17.143Z INFO cmd/keeper.go:1543 already master
...
Aug 15 10:30:31 node-5 keeper[2851117]: 2023-08-15T10:30:31.457Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:31 node-5 keeper[2851117]: 2023-08-15T10:30:31.469Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:31 node-5 keeper[2851117]: 2023-08-15T10:30:31.469Z INFO cmd/keeper.go:1703 postgres hba entries not changed
[This is where I issued `systemctl restart stolon-keeper.service`]
Aug 15 10:30:33 node-5 systemd[1]: Stopping stolon-keeper.service...
Aug 15 10:30:33 node-5 keeper[2851117]: 2023-08-15T10:30:33.368Z INFO postgresql/postgresql.go:384 stopping database
Aug 15 10:30:33 node-5 keeper[1097088]: waiting for server to shut down.... done
Aug 15 10:30:33 node-5 keeper[1097088]: server stopped
Aug 15 10:30:33 node-5 systemd[1]: stolon-keeper.service: Deactivated successfully.
Aug 15 10:30:33 node-5 systemd[1]: Stopped stolon-keeper.service.
Aug 15 10:30:33 node-5 systemd[1]: stolon-keeper.service: Consumed 1d 22h 12min 41.405s CPU time, received 270.5G IP traffic, sent 4.4T IP traffic.
Aug 15 10:30:33 node-5 systemd[1]: Started stolon-keeper.service.
Aug 15 10:30:33 node-5 keeper[1097096]: 2023-08-15T10:30:33.663Z WARN cmd/keeper.go:2058 superuser name and replication user name are the same. Different users are suggested.
Aug 15 10:30:33 node-5 keeper[1097096]: 2023-08-15T10:30:33.664Z INFO cmd/keeper.go:2091 exclusive lock on data dir taken
Aug 15 10:30:33 node-5 keeper[1097096]: 2023-08-15T10:30:33.664Z INFO cmd/keeper.go:569 keeper uid {"uid": "node_5"}
Aug 15 10:30:33 node-5 keeper[1097096]: 2023-08-15T10:30:33.669Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:30:33 node-5 keeper[1097096]: 2023-08-15T10:30:33.672Z INFO cmd/keeper.go:1094 our db boot UID is different than the cluster data one, waiting for it to be updated {"bootUUID": "f0a92617-3e63-41f5-bf04-cb1313408fb6", "clusterBootUUID": "def68dec-a5fe-4b09-8561-e952f9bf347d"}
Aug 15 10:30:34 node-5 keeper[1097096]: 2023-08-15T10:30:34.669Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.669Z ERROR cmd/keeper.go:720 cannot get configured pg parameters {"error": "dial unix /tmp/.s.PGSQL.5432: connect: no such file or directory"}
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.692Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.693Z INFO cmd/keeper.go:1522 not allowing connection as normal users since synchronous replication is enabled and instance was down
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.700Z INFO postgresql/postgresql.go:319 starting database
Aug 15 10:30:35 node-5 keeper[1097135]: 2023-08-15 10:30:35.707 UTC [1097135] LOG: redirecting log output to logging collector process
Aug 15 10:30:35 node-5 keeper[1097135]: 2023-08-15 10:30:35.707 UTC [1097135] HINT: Future log output will appear in directory "/var/log/postgres".
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.908Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.922Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.928Z INFO cmd/keeper.go:1689 not allowing connection as normal users since synchronous replication is enabled, instance was down and not all sync standbys are synced
Aug 15 10:30:35 node-5 keeper[1097096]: 2023-08-15T10:30:35.928Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:37 node-5 keeper[1097096]: 2023-08-15T10:30:37.968Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:37 node-5 keeper[1097096]: 2023-08-15T10:30:37.969Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:37 node-5 keeper[1097096]: 2023-08-15T10:30:37.984Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:37 node-5 keeper[1097096]: 2023-08-15T10:30:37.990Z INFO cmd/keeper.go:1689 not allowing connection as normal users since synchronous replication is enabled, instance was down and not all sync standbys are synced
Aug 15 10:30:37 node-5 keeper[1097096]: 2023-08-15T10:30:37.990Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:40 node-5 keeper[1097096]: 2023-08-15T10:30:40.021Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:40 node-5 keeper[1097096]: 2023-08-15T10:30:40.022Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:40 node-5 keeper[1097096]: 2023-08-15T10:30:40.036Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:40 node-5 keeper[1097096]: 2023-08-15T10:30:40.043Z INFO cmd/keeper.go:1698 postgres hba entries changed, reloading postgres instance
Aug 15 10:30:40 node-5 keeper[1097096]: 2023-08-15T10:30:40.043Z INFO postgresql/postgresql.go:421 reloading database configuration
Aug 15 10:30:40 node-5 keeper[1097497]: server signaled
Aug 15 10:30:42 node-5 keeper[1097096]: 2023-08-15T10:30:42.079Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:42 node-5 keeper[1097096]: 2023-08-15T10:30:42.079Z INFO cmd/keeper.go:1543 already master
Aug 15 10:30:42 node-5 keeper[1097096]: 2023-08-15T10:30:42.095Z INFO cmd/keeper.go:1676 postgres parameters not changed
Aug 15 10:30:42 node-5 keeper[1097096]: 2023-08-15T10:30:42.096Z INFO cmd/keeper.go:1703 postgres hba entries not changed
Aug 15 10:30:44 node-5 keeper[1097096]: 2023-08-15T10:30:44.124Z INFO cmd/keeper.go:1505 our db requested role is master
Aug 15 10:30:44 node-5 keeper[1097096]: 2023-08-15T10:30:44.125Z INFO cmd/keeper.go:1543 already master
Environment
- Stolon version: Stolon
mastercommit 4bb4107