Questions trying to implement #226
Almost a year ago, I was looking at implementing automatic failover in a PostgreSQL cluster. I got repmgr automatic failover up and running first, so we went with that instead of PAF, but now I have a window to look at switching back to PAF if it gives us more features. Here is what I have so far (I deploy this via Ansible playbooks):
We are stuck using PostgreSQL 9.6 for this project. In my test lab, I currently deploy 2 clusters of 2 nodes each (via Ansible).
Currently (with repmgr), if the master fails, we get:
We also run a 'sanity_check' script every 5 minutes that verifies everything is running where it should be and sends SNMP traps if it isn't. We also wrote some SNMP extensions to gather Pacemaker/PostgreSQL metrics at every SNMP polling session.

I have this working, but only one way: if the master fails, everything toggles properly to the standby server, but it does not fail back to the master if the standby then fails. So we get alerted immediately after a failover and can manually put the master back online (as a new standby). Would switching over to PAF give me a more robust failover mechanism?

Currently with PAF, I have this. pcs status gives me:
But switching to the postgres user and getting cluster info gives me:
What does this suggest I have set up wrong? I should also note that we perform streaming replication on a second, dedicated network interface whose DNS name is the main node's name with -LL appended; for example, MTL03VLTDB-CDR52 replicates over MTL03VLTDB-CDR52-LL.
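To make the replication setup concrete, here is a minimal sketch that renders a recovery.conf for a 9.6 standby streaming over the -LL interface. It assumes PAF's template-based approach, in which each node keeps a recovery.conf template that the agent uses when starting the node as a standby; check the exact template requirements against the PAF documentation for your version. The function names, the `replicator` role, and the `node2` node name are hypothetical. `application_name` is set to the local node name, and `recovery_target_timeline = 'latest'` lets a standby follow a promoted primary onto its new timeline.

```python
"""Sketch: render a recovery.conf for a PostgreSQL 9.6 standby (hypothetical helpers)."""


def replication_host(name: str, suffix: str = "-LL") -> str:
    """Append the dedicated replication-network suffix to a host name."""
    return f"{name}{suffix}"


def render_recovery_conf(primary_name: str, local_node: str,
                         port: int = 5432, user: str = "replicator") -> str:
    """Return recovery.conf text for a 9.6 standby.

    primary_name: host to stream from, without the -LL suffix (hypothetical;
                  under PAF this is usually a floating name/IP, not a fixed node)
    local_node:   this node's name as Pacemaker knows it, used as application_name
    user:         hypothetical replication role name
    """
    conninfo = (
        f"host={replication_host(primary_name)} port={port} "
        f"user={user} application_name={local_node}"
    )
    lines = [
        "standby_mode = on",
        f"primary_conninfo = '{conninfo}'",
        # Follow the new primary's timeline after a promotion instead of
        # staying on the timeline the standby started on.
        "recovery_target_timeline = 'latest'",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_recovery_conf("MTL03VLTDB-CDR52", "node2")` produces a `primary_conninfo` pointing at `MTL03VLTDB-CDR52-LL`. Keeping the suffix logic in one helper means the -LL naming convention is applied the same way everywhere the template is generated.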
Hi,

I'm not familiar with repmgr commands anymore (and I was never really comfortable with them anyway). I'm not sure what your cluster status command is really doing, nor what …

However, I can tell you the standby is not following the primary anymore. It seems stuck on timeline 1, whereas the primary is on timeline 3 (…)

If you already use Pacemaker, I advise moving away from …

In either case, you don't need repmgr anymore. I would rather advise relying on pgbackrest or barman to deal with your PITR backups and fast standby resynchronisation.

PAF does not support auto-failback by design.

Lastly, make sure to use fencing and/or watchdog and/or SBD (poison pills).
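A standby stuck on an older timeline is something a periodic sanity check like the one described in the question could flag. The sketch below assumes the check has already collected the text output of `pg_controldata` for each node and knows which role each node is expected to have; the `sanity_check` function and the inventory format are hypothetical. It compares the "Database cluster state" field ("in production" on a primary, "in archive recovery" on a standby) with the expected role, and the "Latest checkpoint's TimeLineID" field between primary and standbys. On a standby that value only moves at restartpoints, so it can lag briefly after a promotion; treat it as a rough signal.

```python
"""Sketch of a role/timeline sanity check based on pg_controldata output."""

# pg_controldata "Database cluster state" values mapped to a role label.
STATE_TO_ROLE = {
    "in production": "primary",
    "in archive recovery": "standby",
}


def parse_controldata(output: str) -> dict:
    """Turn pg_controldata text output into a {field: value} dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sanity_check(expected: dict, outputs: dict) -> list:
    """Return human-readable problems; an empty list means healthy.

    expected: {node_name: "primary" or "standby"}  (hypothetical inventory)
    outputs:  {node_name: raw pg_controldata output for that node}
    """
    problems = []
    timelines = {}
    for node, want in expected.items():
        fields = parse_controldata(outputs[node])
        state = fields.get("Database cluster state", "unknown")
        got = STATE_TO_ROLE.get(state, "unknown")
        if got != want:
            problems.append(f"{node}: expected {want}, found {got} ({state})")
        tli = fields.get("Latest checkpoint's TimeLineID")
        if tli is not None:
            timelines[node] = int(tli)

    # Timeline of the node expected to be primary, if it was reported.
    primary_tli = next(
        (timelines[n] for n, r in expected.items() if r == "primary" and n in timelines),
        None,
    )
    if primary_tli is not None:
        for node, role in expected.items():
            # Missing timelines default to the primary's, so they never alert.
            if role == "standby" and timelines.get(node, primary_tli) < primary_tli:
                problems.append(
                    f"{node}: on timeline {timelines[node]}, "
                    f"primary is on timeline {primary_tli}"
                )
    return problems
```

With the situation described in the reply (primary on timeline 3, standby on timeline 1), the function returns one problem for the standby; the caller could then send an SNMP trap, as the existing script does.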