MySQL高可用MMM
MySQL高可用MMM
环境:
角色 | ip | Server-id | Write vip | Read vip |
---|---|---|---|---|
master1 | 192.168.1.7 | 1 | 192.168.1.100 | |
master2 | 192.168.1.8 | 2 | 192.168.1.101 | |
slave1 | 192.168.1.10 | 3 | 192.168.1.102 | |
slave2 | 192.168.1.12 | 4 | 192.168.1.103 | |
monitor | 192.168.1.13 | 无 |
一、环境部署
①部署perl环境(所有主机)
[root@192 ~]# yum -y install perl-* libart_lgpl.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64
MMM架构基于perl环境实现,务必安装perl所有环境插件。如果yum报错,请认真排查,推荐yum remove -y libvirt-client,之后重新yum安装
②安装相关插件库(所有主机)
[root@192 ~]# cpan -i Algorithm::Diff Class::Singleton DBI DBD::mysql Log::Dispatch Log::Log4perl Mail::Send Net::Ping Proc::Daemon Time::HiRes Params::Validate Net::ARP
提示后,回车继续
③关闭防火墙或者开启端口,关闭selinx,修改相应主机名
mmm_agent:代理端口号为9989
mmm_monitor:监控端口为9988
[root@192 ~]# hostnamectl set-hostname master1
[root@192 ~]# hostnamectl set-hostname master2
[root@192 ~]# hostnamectl set-hostname slave1
[root@192 ~]# hostnamectl set-hostname slave2
二、配置主从
主从环境:
master2,slave1,slave2都是master1的slave库
master1是master2的slave库
①master1:
[root@master1 ~]# cat /etc/my.cnf
[mysqld] basedir = /usr/local/mysql datadir = /usr/local/mysql/data port = 3306 server_id = 1 socket = /usr/local/mysql/mysql.sock log-error=/usr/local/mysql/data/mysqld.err log-bin = mysql-bin relay-log = relay-bin relay-log-index = slave-relay-bin.index log-slave-updates = 1 auto-increment-increment = 2 auto-increment-offset = 1 binlog_format = mixed [client] host = 127.0.0.1 user = root password = 123.com
②master2:
[root@master2 ~]# cat /etc/my.cnf
[mysqld] basedir = /usr/local/mysql datadir = /usr/local/mysql/data port = 3306 server_id = 2 socket = /usr/local/mysql/mysql.sock log-error=/usr/local/mysql/data/mysqld.err log-bin = mysql-bin relay-log = relay-bin relay-log-index = slave-relay-bin.index log-slave-updates = 1 auto-increment-increment = 2 auto-increment-offset = 2 binlog_format = mixed [client] host = 127.0.0.1 user = root password = 123.com
③slave1:
[root@slave1 ~]# cat /etc/my.cnf
[mysqld] basedir = /usr/local/mysql datadir = /usr/local/mysql/data port = 3306 server_id = 3 socket = /usr/local/mysql/mysql.sock log-error=/usr/local/mysql/data/mysqld.err relay-log = relay-bin relay-log-index = slave-relay-bin.index [client] host = 127.0.0.1 user = root password = 123.com
④slave2:
[root@slave2 ~]# cat /etc/my.cnf
[mysqld] basedir = /usr/local/mysql datadir = /usr/local/mysql/data port = 3306 server_id = 4 socket = /usr/local/mysql/mysql.sock log-error=/usr/local/mysql/data/mysqld.err relay-log = relay-bin relay-log-index = slave-relay-bin.index [client] host = 127.0.0.1 user = root password = 123.com
⑤权限与change
master1:
mysql> grant replication slave on . to myslave@'%' identified by '123.com';
mysql> show master status \G
*************************** 1. row *************************** File: mysql-bin.000001 Position: 154 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set:
master2:
mysql> grant replication slave on . to myslave@'%' identified by '123.com';
mysql> show master status \G
*************************** 1. row *************************** File: mysql-bin.000001 Position: 154 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set:
mysql> change master to master_host='192.168.1.7',master_user='myslave',master_password='123.com',master_log_file=’mysql-bin.000001’,master_log_pos=154;
mysql> start slave;
slave1:
mysql> change master to master_host='192.168.1.7',master_user='myslave',master_password='123.com',master_log_file=’mysql-bin.000001’,master_log_pos=154;
mysql> start slave;
slave2:
mysql> change master to master_host='192.168.1.7',master_user='myslave',master_password='123.com',master_log_file=’mysql-bin.000001’,master_log_pos=154;
mysql> start slave;
master1:
mysql> change master to master_host='192.168.1.8',master_user='myslave',master_password='123.com',master_log_file=’mysql-bin.000001’,master_log_pos=154;
mysql> start slave;
三、配置MMM
代理守护进程(四台DB主机)步骤:
①配置监控用户权限(主从复制只在master-1执行即可)
mysql> grant super,replication client,process on . to 'mmm_agent'@'%' identified by '123.com';
mysql> grant replication client on . to 'mmm_monitor'@'%' identified by '123.com';
mysql> select user,host from mysql.user where user in ('mmm_monitor','mmm_agent');
user | host |
---|---|
mmm_agent | % |
mmm_monitor | % |
②安装MMM(monitor主机与四台主从主机都需要安装)
[root@master1 ~]# wget http://mysql-mmm.org/_media/:mmm2:mysql-mmm-2.2.1.tar.gz
[root@master1 ~]# tar zxf :mmm2:mysql-mmm-2.2.1.tar.gz
[root@master1 ~]# cd mysql-mmm-2.2.1/
[root@master1 mysql-mmm-2.2.1]# make && make install
③配置mmm_common.conf(monitor主机与四台主从主机务必一致)
[root@master1 mysql-mmm]# vim mmm_common.conf
active_master_role writer <host default> cluster_interface eno16777736 pid_path /var/run/mmm_agentd.pid bin_path /usr/lib/mysql-mmm/ replication_user myslave replication_password 123.com agent_user mmm_agent agent_password 123.com </host> <host master1> ip 192.168.1.7 mode master peer master2 </host> <host master2> ip 192.168.1.8 mode master peer master1 </host> <host slave1> ip 192.168.1.10 mode slave </host> <host slave2> ip 192.168.1.12 mode slave </host> <role writer> hosts master1,master2 ips 192.168.1.100 mode exclusive </role> <role reader> hosts master2,slave1,slave2 ips 192.168.1.101, 192.168.1.102, 192.168.1.103 mode balanced </role>
④配置mmm_agent.conf(monitor主机不需要配置,四台主从主机都需更改其相应主机名)
[root@master1 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf this master1
[root@master2 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf this master2
[root@slave1 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf this slave1
[root@slave2 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf this slave2
⑤修改mysql-mmm-agent执行脚本(五台主机全部修改)
[root@master1 ~]# cat /etc/init.d/mysql-mmm-agent
#!/bin/sh source /root/.bash_profile # mysql-mmm-agent This shell script takes care of starting and stopping # the mmm agent daemon. # # chkconfig: - 64 36 .......
⑥启动mmm-agent代理进程(四台DB主机启动代理进程)
[root@master1 ~]# chkconfig --add mysql-mmm-agent
[root@master1 ~]# chkconfig mysql-mmm-agent on
[root@master1 ~]# /etc/init.d/mysql-mmm-agent start
Daemon bin: '/usr/sbin/mmm_agentd' Daemon pid: '/var/run/mmm_agentd.pid' Starting MMM Agent daemon... running.
如果在启动时报错,有原因为perl环境的问题,根据提示缺少什么就安装什么
[root@master1 ~]# ss -anpt | grep agentd
LISTEN 0 10 192.168.1.4:9989 *:* users:(("mmm_agentd",pid=26468,fd=3))
监控进程(monitor主机)步骤:
①安装MMM(参考上述步骤)
②配置mmm_common.conf(与DB主机一致)
③配置mmm_mon.conf
[root@localhost ~]# cat /etc/mysql-mmm/mmm_mon.conf
include mmm_common.conf <monitor> ip 127.0.0.1 pid_path /var/run/mmm_mond.pid bin_path /usr/lib/mysql-mmm/ status_path /var/lib/misc/mmm_mond.status ping_ips 192.168.1.7,192.168.1.8,192.168.1.10,192.168.1.12 auto_set_online 0 </monitor> <host default> monitor_user mmm_monitor monitor_password 123.com </host> debug 0
④启动mmm-monitor监控进程
[root@localhost ~]# chkconfig --add mysql-mmm-monitor
[root@localhost ~]# chkconfig mysql-mmm-monitor on
[root@loaclhost ~]# /etc/init.d/mysql-mmm-monitor start
Daemon bin: '/usr/sbin/mmm_mond' Daemon pid: '/var/run/mmm_mond.pid' Starting MMM Monitor daemon: running
[root@localhost ~]# ss -anpt | grep mond
LISTEN 0 10 127.0.0.1:9988 *:* users:(("mmm_mond",pid=27378,fd=9))
四、测试MMM高可用
步骤:
①在初次启动monitor之初,要检查各节状态
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/AWAITING_RECOVERY. Roles: master2(192.168.1.8) master/AWAITING_RECOVERY. Roles: slave1(192.168.1.10) slave/AWAITING_RECOVERY. Roles: slave2(192.168.1.12) slave/AWAITING_RECOVERY. Roles:
②启动各节点
[root@localhost ~]# mmm_control set_online slave1
OK: State of 'slave1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost ~]# mmm_control set_online master1
OK: State of 'master1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost ~]# mmm_control set_online master2
OK: State of 'master2' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost ~]# mmm_control set_online slave2
OK: State of 'slave2' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost init.d]# mmm_control show
master1(192.168.1.4) master/ONLINE. Roles: writer(192.168.1.100) master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.103) slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.101) slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.102)
③查看各节点vip状态
master1:
[root@master1 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:0c:29:81:20:3c brd ff:ff:ff:ff:ff:ff inet 192.168.1.4/24 brd 192.168.1.255 scope global dynamic eno16777736 valid_lft 78128sec preferred_lft 78128sec inet 192.168.1.100/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fe81:203c/64 scope link valid_lft forever preferred_lft forever
master2:
[root@master2 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:0c:29:f6:7f:57 brd ff:ff:ff:ff:ff:ff inet 192.168.1.8/24 brd 192.168.1.255 scope global dynamic eno16777736 valid_lft 78143sec preferred_lft 78143sec inet 192.168.1.103/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fef6:7f57/64 scope link valid_lft forever preferred_lft forever
slave1:
[root@slave1 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:0c:29:4b:6a:1e brd ff:ff:ff:ff:ff:ff inet 192.168.1.10/24 brd 192.168.1.255 scope global dynamic eno16777736 valid_lft 78172sec preferred_lft 78172sec inet 192.168.1.101/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fe4b:6a1e/64 scope link valid_lft forever preferred_lft forever
slave2:
[root@slave2 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:0c:29:db:f7:b8 brd ff:ff:ff:ff:ff:ff inet 192.168.1.12/24 brd 192.168.1.255 scope global dynamic eno16777736 valid_lft 80764sec preferred_lft 80764sec inet 192.168.1.102/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fedb:f7b8/64 scope link valid_lft forever preferred_lft forever
④模拟master1宕机,观察vip状态,以及主从复制状态
master1:
[root@master1 ~]# systemctl stop mysqld
monitor:
[root@localhost ~]# tailf /var/log/mysql-mmm/mmm_mond.log
2018/04/22 15:14:05 WARN Check 'rep_backlog' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.7:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.1.7' (111) 2018/04/22 15:14:16 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2018/04/22 15:14:16 INFO Removing all roles from host 'master1': 2018/04/22 15:14:16 INFO Removed role 'writer(192.168.1.100)' from host 'master1' 2018/04/22 15:14:16 INFO Orphaned role 'writer(192.168.1.100)' has been assigned to 'master2'
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/HARD_OFFLINE. Roles: master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.101), writer(192.168.1.100) slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.102) slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.103)
[root@localhost ~]# mmm_control checks all
master1 ping [last change: 2018/04/22 15:42:02] OK master1 mysql [last change: 2018/04/22 15:47:57] ERROR: Connect error (host = 192.168.1.7:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.1.7' (111)
master2:
[root@master2 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 00:0c:29:f6:7f:57 brd ff:ff:ff:ff:ff:ff inet 192.168.1.8/24 brd 192.168.1.255 scope global dynamic eno16777736 valid_lft 77901sec preferred_lft 77901sec inet 192.168.1.103/32 scope global eno16777736 valid_lft forever preferred_lft forever inet 192.168.1.100/32 scope global eno16777736 valid_lft forever preferred_lft forever inet6 fe80::20c:29ff:fef6:7f57/64 scope link valid_lft forever preferred_lft forever
slave1:
mysql> show slave statusG
*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.8 Master_User: myslave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 154 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes ......
slave2:
mysql> show slave status G
*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.1.8 Master_User: myslave Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 154 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes .......
⑤重新手动启动master1,观察各状态
master1:
[root@master1 ~]# systemctl restart mysqld
monitor:
[root@localhost ~]# tailf /var/log/mysql-mmm/mmm_mond.log
2018/04/22 15:53:27 INFO Check 'mysql' on 'master1' is ok! 2018/04/22 15:53:28 FATAL State of host 'master1' changed from HARD_OFFLINE to AWAITING_RECOVERY 2018/04/22 15:53:28 INFO Check 'rep_threads' on 'master1' is ok! 2018/04/22 15:53:28 INFO Check 'rep_backlog' on 'master1' is ok
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/AWAITING_RECOVERY. Roles: master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.101), writer(192.168.1.100) slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.102) slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.103)
观察到master1从hard_offline变为awaiting状态,重新使master1上线
[root@localhost ~]# mmm_control set_online master1
OK: State of 'master1' changed to ONLINE. Now you can wait some time and check its new roles!
再去观察最新的集群状态,发现master启动后不会重新接管主
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/ONLINE. Roles: master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.101), writer(192.168.1.100) slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.102) slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.103)
五、总结
①maser2备主节点宕机不会影响集群性能,就是移除了master2备选节点的读状态②master1主节点宕机,master2备主会接管角色,slave1,slave2会重新指向新的主库进行复制,自动change
③如果master1主库宕机,master2复制应用又落后于master1时就变成了主可写状态,这时数据无法保持一致
④如果master2,slave1,slave2延迟于master1主,master1宕机,slave1,slave2将会等待数据同步master1后,再重新指向master2,这时数据无法保持一致
⑤如果采用MMM高可用架构,主,备节点机器配置一样,而开启半同步进一步提高安全性或采用mariadb/mysql5.7进行多线程从复制,提高复制性能
⑥monitor根据mmm_mon.conf中auto_set_online是否开启,每隔60s检查主机状态,将等待awaiting_recovery设置为online,前提已经从故障状态hard_offline中恢复,monitor监控数据库的三种状态分别为HARD_OFFLINE→AWATING_RECOVERY→ONLINE
⑦对外提供的vip是由monitor程序提供。monitor不启动,vip是不会提供的。如果已经分配好了vip,monitor关闭了原先分配的vip,不会立即关闭外部程序只要不重启网络,这样好处对于monitor可靠性要求低一点,但是如果是服务器宕掉了,vip发生变更,访问的宕机服务器是不会接受访问的