# pgsql-cluster-manager [![CircleCI](https://circleci.com/gh/gocardless/pgsql-cluster-manager.svg?style=svg&circle-token=38c8f4dc817216aa6a02b3bf67435fe2f1d72189)](https://circleci.com/gh/gocardless/pgsql-cluster-manager)

`pgsql-cluster-manager` extends a standard highly-available Postgres setup
(managed by [Corosync](http://corosync.github.io/) and
[Pacemaker](http://www.linux-ha.org/wiki/Pacemaker)), enabling its use in cloud
environments where using floating IPs to denote the primary node is difficult
or impossible. In addition, `pgsql-cluster-manager` provides the ability to run
zero-downtime migrations of the Postgres primary with a simple API trigger.

See [Playground](#playground) for how to start a Dockerised three node Postgres
cluster with `pgsql-cluster-manager`.

- [Overview](#overview)
- [Playground](#playground)
- [Node Roles](#node-roles)
- [Postgres Nodes](#postgres-nodes)
- [App Nodes](#app-nodes)
- [Zero-Downtime Migrations](#zero-downtime-migrations)
- [Configuration](#configuration)
- [Pacemaker](#pacemaker)
- [PgBouncer](#pgbouncer)
- [Development](#development)
- [CircleCI](#circleci)
- [Releasing](#releasing)

## Overview

GoCardless runs a highly available Postgres cluster using
[Corosync](http://corosync.github.io/) and
[Pacemaker](http://www.linux-ha.org/wiki/Pacemaker). Corosync provides an
underlying quorum mechanism, while Pacemaker provides the ability to register
plugins that can manage arbitrary services, detecting and recovering from node
and service-level failures.

The typical Postgres setup with Corosync & Pacemaker uses a floating IP attached
to the Postgres primary node. Clients connect to this IP, and during failover
the IP is moved to the new primary. Managing portable IPs in cloud providers
such as AWS and GCP is more difficult than in a classic data center, so we
built `pgsql-cluster-manager` to adapt our cluster for these environments.

`pgsql-cluster-manager` makes use of [etcd](https://github.com/coreos/etcd) to
store cluster configuration, which clients can then use to connect to the
appropriate node. We can view `pgsql-cluster-manager` as three distinct services,
each of which conceptually 'manages' a different component:

- `cluster` extracts cluster state from Pacemaker and pushes it to etcd
- `proxy` ensures our Postgres proxy (PgBouncer) is reloaded with the current
  primary IP
- `migration` controls a zero-downtime migration flow

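To make this concrete, here is a minimal sketch (not the project's actual code)
of how any Go client could read, and then subscribe to, the primary's IP address
in etcd using the etcd v3 client. The `/postgres/master` key matches the
playground below, while the local endpoint is an assumption about your own
deployment.

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// Connect to the local etcd member (every Postgres node runs one).
	client, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// The cluster service pushes the primary's IP address to this key.
	const key = "/postgres/master"

	resp, err := client.Get(context.Background(), key)
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		log.Printf("current primary is %s", kv.Value)
	}

	// Each time a failover or migration changes the primary, the cluster
	// service writes the new IP and we receive an event here.
	for watchResp := range client.Watch(context.Background(), key) {
		for _, event := range watchResp.Events {
			log.Printf("primary changed to %s", event.Kv.Value)
		}
	}
}
```
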
### Playground

We have created a Dockerised sandbox environment that boots a three node
Postgres cluster with the `pgsql-cluster-manager` services installed. We
strongly recommend playing around in this environment to develop an
understanding of how this setup works and to simulate failure situations
(network partitions, node crashes, etc).

**It also helps to have this playground running while reading through the README,
in order to try out the commands you see along the way.**

First install [Docker](https://docker.io/) and Golang >=1.9, then run:

```
# Clone into your GOPATH
$ git clone https://github.com/gocardless/pgsql-cluster-manager
$ cd pgsql-cluster-manager
$ make build-linux

$ cd docker/postgres-member && ./start
Sending build context to Docker daemon 4.332 MB
Step 1/16 : FROM gocardless/pgsql-cluster-manager
...

root@pg01:/# crm_mon -Afr -1

Node Attributes:
* Node pg01:
    + Postgresql-data-status          : STREAMING|SYNC
    + Postgresql-status               : HS:sync
    + master-Postgresql               : 100
* Node pg02:
    + Postgresql-data-status          : STREAMING|POTENTIAL
    + Postgresql-status               : HS:potential
    + master-Postgresql               : -INFINITY
* Node pg03:
    + Postgresql-data-status          : LATEST
    + Postgresql-master-baseline      : 0000000002000090
    + Postgresql-status               : PRI
    + master-Postgresql               : 1000

root@pg01:/# ping pg03 -c1 | head -n1
PING pg03 (172.17.0.4) 56(84) bytes of data.

root@pg01:/# ETCDCTL_API=3 etcdctl get --prefix /
/postgres/master
172.17.0.4
```

The [start](docker/postgres-member/start) script will boot three Postgres nodes
with the appropriate configuration, and will start a full Postgres cluster. For
convenience, the script will drop you into a docker shell on `pg01`. You can
connect to any of the other containers with `docker exec -it pg0X /bin/bash`.

### Node Roles

The `pgsql-cluster-manager` services are expected to run on two types of
machine: the nodes that are members of the Postgres cluster, and the machines
that host the applications which connect to the cluster.

![Two node types, Postgres and App machines](res/node_roles.svg)

To explain how this setup works, we'll use an example of three machines (`pg01`,
`pg02`, `pg03`) to run the Postgres cluster and one machine (`app01`) to run our
client application. To match a typical production environment, let's imagine we
want to run a docker container on `app01` and have that container connect to our
Postgres cluster, while being resilient to Postgres failover.

It's worth noting that our playground configures only nodes of the Postgres
type, as this is sufficient to test out and play with the cluster. In production
you'd run app nodes so that applications can connect to the local PgBouncer,
which in turn knows how to route to the primary.

For playing around, it's totally fine to connect to one of the cluster nodes'
PgBouncers directly from your host machine.

#### Postgres Nodes

In this hypothetical world we've provisioned our Postgres boxes with Corosync,
Pacemaker and Postgres, and additionally the following services:

- [PgBouncer](https://pgbouncer.github.io/) for connection pooling and proxying
  to the current primary
- [etcd](https://github.com/coreos/etcd) as a queryable store of cluster state,
  with the members connecting to form a three node etcd cluster

We then run the `cluster` service as a daemon, which will continually query
Pacemaker to pull the current Postgres primary IP address and push this value to
etcd. Once we're pushing this value to etcd, we can use the `proxy` service to
subscribe to changes and update the local PgBouncer with the new value. We do
this by provisioning a PgBouncer [configuration template file](docker/postgres-member/pgbouncer/pgbouncer.ini.template)
that looks like the following:

```
# /etc/pgbouncer/pgbouncer.ini.template

[databases]
postgres = host={{.Host}} pool_size=10
```

Whenever the `cluster` service pushes a new IP address to etcd, the `proxy`
service will render this template, replacing the `{{.Host}}` placeholder with
the latest Postgres primary IP address, and finally reload PgBouncer to direct
connections at the new primary.

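As an illustration of this rendering step, the sketch below (an approximation,
not the project's implementation) uses Go's `text/template` package to
substitute a primary IP into the `{{.Host}}` placeholder. In the real service
the value would come from etcd, and the output would be written to
`pgbouncer.ini` before reloading PgBouncer.

```go
package main

import (
	"log"
	"os"
	"text/template"
)

func main() {
	// Template matching /etc/pgbouncer/pgbouncer.ini.template above.
	const ini = `[databases]
postgres = host={{.Host}} pool_size=10
`

	tmpl, err := template.New("pgbouncer.ini").Parse(ini)
	if err != nil {
		log.Fatal(err)
	}

	// The proxy service would receive this value from etcd; here we
	// hard-code the playground's primary IP for illustration.
	host := struct{ Host string }{Host: "172.17.0.2"}

	// Write the rendered config; PgBouncer would then be reloaded
	// (e.g. by issuing RELOAD on its admin console).
	if err := tmpl.Execute(os.Stdout, host); err != nil {
		log.Fatal(err)
	}
}
```
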
We can verify that `cluster` is pushing the IP address by using `etcdctl` to
inspect the contents of our etcd cluster. We should find the current Postgres
primary IP address has been pushed to the key we have configured for
`pgsql-cluster-manager`:

```
root@pg01:/$ ETCDCTL_API=3 etcdctl get --prefix /
/postgres/master
172.17.0.2
```

#### App Nodes

We now have the Postgres nodes running PgBouncer proxies that live-update their
configuration to point connections at the latest Postgres primary. Our aim is
now to have app clients inside docker containers connect to our Postgres
cluster without having to introduce routing decisions into the client code.

To do this, we install PgBouncer onto `app01` and bind it to the host's private
interface. We then allow traffic from the docker network interface to the
private interface on the host, so that containers can communicate with the
PgBouncer on the host.

Finally we configure `app01`'s PgBouncer with a configuration template as we did
with the Postgres machines, and run the `proxy` service to continually update
PgBouncer to point at the latest primary. Containers then connect via the docker
host IP to PgBouncer, which will transparently direct connections to the correct
Postgres node.

```sh
root@app01:/$ cat <<EOF > /etc/pgbouncer/pgbouncer.ini.template
[databases]
postgres = host={{.Host}}
EOF

root@app01:/$ service pgsql-cluster-manager-proxy start
pgsql-cluster-manager-proxy start/running, process 6997

root@app01:/$ service pgbouncer start
 * Starting PgBouncer pgbouncer
   ...done.

root@app01:/$ tail /var/log/pgsql-cluster-manager/proxy.log | grep HostChanger
{"handler":"*pgbouncer.HostChanger","key":"/master","level":"info","message":"Triggering handler with initial etcd key value","timestamp":"2017-12-03T17:49:03+0000","value":"172.17.0.2"}

root@app01:/$ tail /var/log/postgresql/pgbouncer.log | grep "RELOAD"
2017-12-03 17:49:03.167 16888 LOG RELOAD command issued

# Attempt to connect via the docker bridge IP
root@app01:/$ docker run -it --rm jbergknoff/postgresql-client postgresql://postgres@172.17.0.1:6432/postgres
Password:
psql (9.6.5, server 9.4.14)
Type "help" for help.

postgres=#
```

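If the client application itself is written in Go, connecting through the
host's PgBouncer is an ordinary lib/pq connection. The sketch below is
illustrative only: the bridge IP and port are taken from the example above,
while the credentials and `sslmode` setting are assumptions to adjust for your
own setup (note that lib/pq clients also require the `ignore_startup_parameters`
setting described in the PgBouncer section below).

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

func main() {
	// 172.17.0.1:6432 is the docker bridge IP and PgBouncer port from the
	// example above; the password is whatever you have configured.
	dsn := "postgres://postgres:password@172.17.0.1:6432/postgres?sslmode=disable"

	db, err := sql.Open("postgres", dsn)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// PgBouncer transparently routes this to the current primary, so the
	// application never needs to know which node is serving writes.
	var version string
	if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
		log.Fatal(err)
	}
	log.Println(version)
}
```
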
### Zero-Downtime Migrations

It's inevitable over the lifetime of a database cluster that machines will need
upgrading and services will need restarting. It's not acceptable for such
routine tasks to require downtime, so `pgsql-cluster-manager` provides an API to
trigger migrations of the Postgres primary without disrupting database clients.

This API is served by the supervise `migration` service, which should be run on
all the Postgres nodes participating in the cluster. It's important to note that
this flow is only supported when all database clients are using PgBouncer
transaction pools, in order to support pausing connections. Any clients that use
session pools will need to be turned off for the duration of the migration. The
migration flow is as follows:

1. Acquire lock in etcd (ensuring only one migration takes place at a time)
2. Pause all PgBouncer pools on Postgres nodes
3. Instruct Pacemaker to perform migration of primary to sync node
4. Once the sync node is serving traffic as a primary, resume PgBouncer pools
5. Release etcd lock

As the primary moves between machines, the supervise `cluster` service will push
the new IP address to etcd. The supervise `proxy` services running on the
Postgres and App nodes will detect this change and update PgBouncer to point at
the new primary IP, while the migration flow will detect this change in step (4)
and resume PgBouncer to allow queries to start once more.

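The sketch below illustrates steps 1, 2, 4 and 5 in Go: an etcd lock via the
`clientv3/concurrency` package and `PAUSE`/`RESUME` on PgBouncer's admin
console. It is a simplified approximation rather than the `migration` service's
actual code, and the lock key, socket path and connection settings are
assumptions.

```go
package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	"github.com/coreos/etcd/clientv3"
	"github.com/coreos/etcd/clientv3/concurrency"
	_ "github.com/lib/pq"
)

func main() {
	client, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// (1) Acquire a cluster-wide lock so only one migration runs at a time.
	// The lock key is illustrative, not the one the migration service uses.
	session, err := concurrency.NewSession(client)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	lock := concurrency.NewMutex(session, "/postgres/migration-lock")
	if err := lock.Lock(context.Background()); err != nil {
		log.Fatal(err)
	}
	defer lock.Unlock(context.Background())

	// Connect to PgBouncer's admin console over the unix socket. This relies
	// on ignore_startup_parameters = extra_float_digits (see the PgBouncer
	// section below).
	bouncer, err := sql.Open("postgres",
		"host=/var/run/postgresql port=6432 dbname=pgbouncer sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer bouncer.Close()

	// (2) Pause pools: in-flight transactions finish, new queries queue.
	if _, err := bouncer.Exec("PAUSE;"); err != nil {
		log.Fatal(err)
	}

	// (3) Instruct pacemaker to migrate the primary, then wait for the new
	// primary IP to appear in etcd (omitted here).

	// (4) Resume pools once the new primary is serving traffic.
	if _, err := bouncer.Exec("RESUME;"); err != nil {
		log.Fatal(err)
	}

	// (5) The deferred unlock releases the etcd migration lock.
}
```

In practice you don't need to write any of this yourself: running the `migrate`
command below drives the whole flow.
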
```
root@pg01:/$ pgsql-cluster-manager --config-file /etc/pgsql-cluster-manager/config.toml migrate
INFO[0000] Loaded config                      configFile=/etc/pgsql-cluster-manager/config.toml
INFO[0000] Health checking clients
INFO[0000] Acquiring etcd migration lock
INFO[0000] Pausing all clients
INFO[0000] Running crm resource migrate
INFO[0000] Watching for etcd key to update with master IP address  key=/master target=172.17.0.2
INFO[0006] Successfully migrated!             master=pg01
INFO[0006] Running crm resource unmigrate
INFO[0007] Releasing etcd migration lock
```

This flow is subject to several timeouts that should be tuned to match your
pacemaker cluster settings. See `pgsql-cluster-manager migrate --help` for an
explanation of each timeout and how it affects the migration. The flow can be
run from anywhere that has access to etcd and the Postgres migration API.

The Postgres node that was originally the primary is now turned off, and won't
rejoin the cluster until its lockfile is removed. You can bring the node back
into the cluster by doing the following:

```
root@pg02:/$ rm /var/lib/postgresql/9.4/tmp/PGSQL.lock
root@pg02:/$ crm resource cleanup msPostgresql
```

## Configuration

We recommend configuring `pgsql-cluster-manager` using a TOML configuration
file. You can generate a sample configuration file with the default values for
each parameter by running the following:

```
$ pgsql-cluster-manager show-config >/etc/pgsql-cluster-manager/config.toml
```

### Pacemaker

The test environment is a good basis for configuring Pacemaker with the pgsql
resource agent, and gives an example of a cluster configuration that will
bootstrap a Postgres cluster.

We load Pacemaker configuration in tests from the `configure_pacemaker` function
in [start-cluster.bash](docker/postgres-member/start-cluster.bash), though we
advise thinking carefully about what appropriate timeouts might be for your
setup.

The [pgsql](docker/postgres-member/resource_agents/pgsql) resource agent has
been modified to remove the concept of a primary floating IP. Anyone looking to
use this cluster without a floating IP will need to use the modified agent from
this repo, which renders the primary's actual IP directly into Postgres'
`recovery.conf` and reboots database replicas when the primary changes
(required, as Postgres cannot reload `recovery.conf` changes without a restart).

### PgBouncer

We use [lib/pq](https://github.com/lib/pq) to connect to PgBouncer over the unix
socket. Unfortunately lib/pq has [issues](https://github.com/lib/pq/issues/475)
when first establishing a connection to PgBouncer, as it attempts to set the
configuration parameter `extra_float_digits`, which PgBouncer doesn't recognise
and will therefore reject the connection.

To avoid this, make sure all configuration templates include the following:

```
[pgbouncer]
...

# Connecting using the golang lib/pq wrapper requires that we ignore
# the 'extra_float_digits' startup parameter, otherwise PgBouncer will
# close the connection.
#
# https://github.com/lib/pq/issues/475
ignore_startup_parameters = extra_float_digits
```

## Development

### CircleCI

We build a custom Docker image for CircleCI builds that is hosted at
gocardless/pgsql-cluster-manager-circleci on Docker Hub. The Dockerfile lives in
this repository.

To publish a new version of the Docker image, run:

```
make publish-circleci-dockerfile
```

### Releasing

We use [goreleaser](https://github.com/goreleaser/goreleaser) to create releases
for `pgsql-cluster-manager`. This enables us to effortlessly create new releases,
pushing all associated artifacts to various destinations such as GitHub and
homebrew taps.