Building a One-Primary, Two-Standby PostgreSQL Cluster with Pacemaker + Corosync

1. OS Configuration (All Nodes)

1.1 Environment Plan
[root@node1 ~]# cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core) 
[root@node1 ~]# uname -a
Linux node1 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Database version: postgresql-10.5

node1  192.0.2.11
node2  192.0.2.12
node3  192.0.2.13
vip-master  192.0.2.101
vip-slave   192.0.2.102
1.2 System Configuration
Disable SELinux  ----> /etc/selinux/config (see the sketch below)
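A minimal sketch of the SELinux change, assuming the stock CentOS 7 layout of /etc/selinux/config (setenforce only lasts until the next boot; the config edit makes it permanent):
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config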
Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service

Disable NetworkManager
systemctl stop NetworkManager.service
systemctl disable NetworkManager.service

Configure the network interface file  ----> /etc/sysconfig/network-scripts/ifcfg-xxx (see the sketch below)
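A sketch of a static configuration for node1; the interface name ens33 is an assumption carried over from the VIP resource definitions later in this article, and the gateway is a placeholder:
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.0.2.11
PREFIX=24
GATEWAY=192.0.2.1    # assumed gateway; replace with your own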

Set the hostname:  hostnamectl set-hostname nodeX
Edit the /etc/hosts file:
192.0.2.11  node1
192.0.2.12  node2
192.0.2.13  node3

Configure a local yum repository (a .repo file under /etc/yum.repos.d/):
[base]
name=Base
baseurl=file:///mnt
enabled=1
gpgcheck=0
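The baseurl above assumes the CentOS installation ISO (or DVD) is mounted at /mnt; a minimal sketch, with the ISO path being a placeholder:
mount -o loop /path/to/CentOS-7-x86_64-DVD.iso /mnt
yum clean all && yum makecache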

2. Installing and Configuring PostgreSQL

2.1 Install PostgreSQL on All Nodes
  • Using node1 as the example
  • 1) Install dependency packages
[root@node1 ~]# yum -y install  gcc gcc-c++ flex bison readline-devel zlib-devel
  • 2) Create the postgres user
[root@node1 ~]# groupadd postgres
[root@node1 ~]# useradd postgres -g postgres
[root@node1 ~]# echo "pg123"|passwd postgres --stdin
  • 3) Compile and install PostgreSQL
[root@node1 ~]# tar -zxvf postgresql-10.5.tar.gz
[root@node1 ~]# cd postgresql-10.5
[root@node1 postgresql-10.5]# ./configure
[root@node1 postgresql-10.5]# make
[root@node1 postgresql-10.5]# make install
[root@node1 postgresql-10.5]# cd contrib
[root@node1 contrib]# make
[root@node1 contrib]# make install

[root@node1 ~]# chown -R postgres: /usr/local/pgsql
  • 4) Create the data and archive directories
[root@node1 ~]# mkdir -p /var/lib/pgsql/{data,pg_archive}
[root@node1 ~]# chown -R postgres: /var/lib/pgsql
[root@node1 ~]# chmod 0700 /var/lib/pgsql/data
2.2 Initialize the Database on node1
  • 1) Initialize the database
[postgres@node1 ~]$ cd /usr/local/pgsql/bin/
[postgres@node1 bin]$ ./initdb -D /var/lib/pgsql/data
  • 2) Edit the configuration files
[postgres@node1 ~]$ cd /var/lib/pgsql/data
[postgres@node1 data]$ vim postgresql.conf
listen_addresses = '*'
wal_level = hot_standby
synchronous_commit = on
archive_mode = on
archive_command = 'cp %p /var/lib/pgsql/pg_archive/%f'
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on
restart_after_crash = off
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
hot_standby_feedback = on
logging_collector = on
log_filename = 'postgresql-%d.log'
log_rotation_age = 1d
log_truncate_on_rotation = on

[postgres@node1 data]$ vim pg_hba.conf
host    all             all             192.0.2.0/24            trust
host    replication     all             192.0.2.0/24            trust
  • 3) Configure the user's profile
[postgres@node1 ~]$ vim .bash_profile 
export PATH=/usr/local/pgsql/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/pgsql/lib
export PGDATA=/var/lib/pgsql/data
  • 4) Start the database
[postgres@node1 ~]$ pg_ctl -D /var/lib/pgsql/data start
2.3 Create Standbys on node2 and node3
  • 1) Using node2 as the example: configure the user's profile first, then take a base backup from node1
[postgres@node2 ~]$ vim .bash_profile 
export PATH=/usr/local/pgsql/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/pgsql/lib
export PGDATA=/var/lib/pgsql/data
[postgres@node2 ~]$ pg_basebackup -h  192.0.2.11 -U postgres -D /var/lib/pgsql/data -X stream -P
  • 2) Create the recovery.conf file
[postgres@node2 ~]$ vim /var/lib/pgsql/data/recovery.conf 
standby_mode = 'on'
primary_conninfo = 'host=192.0.2.11 port=5432 user=postgres application_name=node2'
restore_command = 'cp /var/lib/pgsql/pg_archive/%f %p'
recovery_target_timeline = 'latest'
  • 3) Start the standby
[postgres@node2 ~]$ pg_ctl -D /var/lib/pgsql/data start
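A quick sanity check: pg_is_in_recovery() should return f on the primary and t on a standby:
[postgres@node2 ~]$ psql -h 192.0.2.11 -c 'select pg_is_in_recovery();'   # f on node1 (primary)
[postgres@node2 ~]$ psql -c 'select pg_is_in_recovery();'                 # t on node2 (standby)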
2.4 Check Replication Status on node1
postgres=# select application_name,client_addr,sync_state from pg_stat_replication;
 application_name | client_addr | sync_state 
------------------+-------------+------------
 node2            | 192.0.2.12  | async
 node3            | 192.0.2.13  | async
(2 rows)

Both standbys report async at this point because synchronous_standby_names has not been set; once Pacemaker takes over with rep_mode="sync" (section 4), the pgsql resource agent promotes one standby to sync.
2.5 Stop the PostgreSQL Service on All Nodes
  • Standbys first, then the primary
$ pg_ctl -D /var/lib/pgsql/data stop

3. Configuring Corosync

  • 1) Install the packages on all nodes
# yum -y install pacemaker corosync pcs
  • 2) Create the corosync.conf file (note: the aisexec and service blocks are legacy corosync 1.x directives that corosync 2.x on CentOS 7 ignores; pacemaker is started separately in section 4)
[root@node1 ~]# cd /etc/corosync
[root@node1 corosync]# vim corosync.conf
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
aisexec {
    user: root
    group: root
}
service {
    name: pacemaker
    ver: 0
}
totem {
    version: 2
    secauth: off
    interface {
        ringnumber: 0
        bindnetaddr: 192.0.2.0
        mcastaddr: 239.255.1.1
    }
}
logging {
    to_syslog: yes
}

[root@node1 corosync]# scp corosync.conf 192.0.2.12:/etc/corosync
[root@node1 corosync]# scp corosync.conf 192.0.2.13:/etc/corosync
  • 3) Start the corosync service on all nodes
# systemctl start corosync.service
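To verify that the ring has formed, corosync ships two status tools; each node should report ring 0 as active with no faults, and the quorum tool should show 3 total votes:
# corosync-cfgtool -s
# corosync-quorumtool -s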

4. Configuring Pacemaker

  • 1) Start the pacemaker service on all nodes
# systemctl start pacemaker.service
  • 2) Check the cluster status as root on any node
[root@node1 ~]# crm_mon -Afr -1
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Fri Aug 14 09:36:38 2020
Last change: Fri Aug 14 09:35:11 2020 by hacluster via crmd on node2

3 nodes configured
0 resources configured

Online: [ node1 node2 node3 ]

No resources


Node Attributes:
* Node node1:
* Node node2:
* Node node3:

Migration Summary:
* Node node1:
* Node node3:
* Node node2:
  • 3) On node1, create the config.pcs script (it edits an offline copy of the CIB named pgsql_cfg and pushes the whole configuration at once at the end)
[root@node1 ~]# cd /var/lib/pacemaker/cib/
[root@node1 cib]# vim config.pcs
pcs cluster cib pgsql_cfg

pcs -f pgsql_cfg property set no-quorum-policy="ignore"
pcs -f pgsql_cfg property set stonith-enabled="false"
pcs -f pgsql_cfg resource defaults resource-stickiness="INFINITY"
pcs -f pgsql_cfg resource defaults migration-threshold="1"

pcs -f pgsql_cfg resource create vip-master IPaddr2 \
   ip="192.0.2.101" \
   nic="ens33" \
   cidr_netmask="24" \
   op start   timeout="60s" interval="0s"  on-fail="restart" \
   op monitor timeout="60s" interval="10s" on-fail="restart" \
   op stop    timeout="60s" interval="0s"  on-fail="block"

pcs -f pgsql_cfg resource create vip-slave IPaddr2 \
   ip="192.0.2.102" \
   nic="ens33" \
   cidr_netmask="24" \
   meta migration-threshold="0" \
   op start   timeout="60s" interval="0s"  on-fail="stop" \
   op monitor timeout="60s" interval="10s" on-fail="restart" \
   op stop    timeout="60s" interval="0s"  on-fail="ignore"

pcs -f pgsql_cfg resource create pgsql pgsql \
   pgctl="/usr/local/pgsql/bin/pg_ctl" \
   psql="/usr/local/pgsql/bin/psql" \
   pgdata="/var/lib/pgsql/data/" \
   rep_mode="sync" \
   node_list="node1 node2 node3" \
   restore_command="cp /var/lib/pgsql/pg_archive/%f %p" \
   primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" \
   master_ip="192.0.2.101" \
   restart_on_promote='true' \
   op start   timeout="60s" interval="0s"  on-fail="restart" \
   op monitor timeout="60s" interval="4s" on-fail="restart" \
   op monitor timeout="60s" interval="3s"  on-fail="restart" role="Master" \
   op promote timeout="60s" interval="0s"  on-fail="restart" \
   op demote  timeout="60s" interval="0s"  on-fail="stop" \
   op stop    timeout="60s" interval="0s"  on-fail="block" \
   op notify  timeout="60s" interval="0s"

pcs -f pgsql_cfg resource master msPostgresql pgsql \
   master-max=1 master-node-max=1 clone-max=3 clone-node-max=1 notify=true

pcs -f pgsql_cfg resource group add master-group vip-master
pcs -f pgsql_cfg resource group add slave-group vip-slave

pcs -f pgsql_cfg constraint colocation add master-group with Master msPostgresql INFINITY
pcs -f pgsql_cfg constraint order promote msPostgresql then start master-group symmetrical=false score=INFINITY
pcs -f pgsql_cfg constraint order demote  msPostgresql then stop  master-group symmetrical=false score=0

pcs -f pgsql_cfg constraint colocation add slave-group  with Slave  msPostgresql INFINITY         
pcs -f pgsql_cfg constraint order promote msPostgresql then start slave-group symmetrical=false score=INFINITY                               
pcs -f pgsql_cfg constraint order demote  msPostgresql then stop  slave-group symmetrical=false score=0                      

pcs cluster cib-push pgsql_cfg
  • 4) On all nodes, add the PostgreSQL bin directory to root's PATH
[root@node1 ~]# vim .bash_profile 
export PATH=/usr/local/pgsql/bin:$PATH
[root@node1 ~]# source .bash_profile
  • 5) Load the configuration on node1
[root@node1 cib]# sh config.pcs 
Warning: Defaults do not apply to resources which override them with their own defined values
Warning: Defaults do not apply to resources which override them with their own defined values
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
Assumed agent name 'ocf:heartbeat:pgsql' (deduced from 'pgsql')
Adding msPostgresql master-group (score: INFINITY) (Options: first-action=promote then-action=start symmetrical=false)
Adding msPostgresql master-group (score: 0) (Options: first-action=demote then-action=stop symmetrical=false)
Adding msPostgresql slave-group (score: INFINITY) (Options: first-action=promote then-action=start symmetrical=false)
Adding msPostgresql slave-group (score: 0) (Options: first-action=demote then-action=stop symmetrical=false)
CIB updated
  • 6) Wait a moment, then check the status again
[root@node1 cib]# crm_mon -Afr -1
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Fri Aug 14 09:42:57 2020
Last change: Fri Aug 14 09:42:56 2020 by root via crm_attribute on node1

3 nodes configured
5 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):   Started node1
 Resource Group: slave-group
     vip-slave  (ocf::heartbeat:IPaddr2):   Started node2

Node Attributes:
* Node node1:
    + master-pgsql                      : 1000      
    + pgsql-data-status                 : LATEST    
    + pgsql-master-baseline             : 0000000006000098
    + pgsql-status                      : PRI       
* Node node2:
    + master-pgsql                      : 100       
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync   
* Node node3:
    + master-pgsql                      : -INFINITY 
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async  

Migration Summary:
* Node node1:
* Node node3:
* Node node2:
  • 7) Check the status with the pcs command on any node
[root@node1 ~]# pcs status
Cluster name: 
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7-c3c624ea3d) - partition with quorum
Last updated: Fri Aug 14 10:12:30 2020
Last change: Fri Aug 14 09:42:56 2020 by root via crm_attribute on node1

3 nodes configured
5 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 node3 ]
 Resource Group: master-group
     vip-master (ocf::heartbeat:IPaddr2):   Started node1
 Resource Group: slave-group
     vip-slave  (ocf::heartbeat:IPaddr2):   Started node2

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: inactive/disabled
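The daemons above are active but not enabled at boot. Whether to enable them is a design choice (some operators prefer to rejoin a failed node manually after inspecting it); if you do want them started automatically, a sketch:
# systemctl enable corosync pacemaker pcsd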
  • 8) Check replication status on node1 (the primary)
postgres=# select application_name,client_addr,sync_state from pg_stat_replication;
 application_name | client_addr | sync_state 
------------------+-------------+------------
 node3            | 192.0.2.13  | async
 node2            | 192.0.2.12  | sync
(2 rows)
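A minimal failover smoke test, as a sketch: stop PostgreSQL on node1 and watch the Master role and vip-master move to the sync standby (node2, per the master-pgsql scores above). The PGSQL.lock path assumes the pgsql agent's default tmpdir of /var/lib/pgsql/tmp, and the old primary's data directory should be rebuilt from the new primary before it rejoins:
[root@node1 ~]# su - postgres -c 'pg_ctl -D /var/lib/pgsql/data stop -m immediate'
[root@node1 ~]# crm_mon -Afr -1        # expect Masters: [ node2 ] after failover
# (rebuild /var/lib/pgsql/data on node1 from node2 with pg_basebackup before the next two steps)
[root@node1 ~]# rm -f /var/lib/pgsql/tmp/PGSQL.lock
[root@node1 ~]# pcs resource cleanup msPostgresql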