I. MHA Overview
II. Test Environment and Requirements
III. Deploying MHA
IV. Testing the MHA Cluster
I. MHA Overview

MHA (Master High Availability) is currently one of the more mature solutions for MySQL high availability. During a MySQL failover, MHA can complete the database failover within 0-30 seconds, either manually or automatically (driven by scripts), while preserving data consistency to the greatest extent possible, achieving high availability in the true sense.

The software consists of two parts: MHA Manager (the management node) and MHA Node (the data node). MHA Manager can be deployed on a dedicated machine to manage multiple master-slave clusters, or it can run on one of the slave nodes. MHA Node runs on every MySQL server. MHA Manager periodically probes the cluster's master node; when the master fails, it automatically promotes the slave holding the most recent data to the new master and repoints all remaining slaves at it. The entire failover process is transparent to the application.
II. Test Environment and Requirements

1. Linux distribution: CentOS 7.2

[root@jev71 ~]#cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)
2. MySQL version: 5.5 (MariaDB 5.5.44)

[root@jev71 ~]#mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 12
Server version: 5.5.44-MariaDB-log MariaDB Server

Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others.
3. Keepalived version: 1.2

[root@jev71 ~]#keepalived -v
Keepalived v1.2.13 (11/20,2015)
4. mha4mysql version: 0.56

[root@jev74 ~]#ll
-rw-r--r--. 1 root root 87119 Nov  9  2015 mha4mysql-manager-0.56-0.el6.noarch.rpm
-rw-r--r--. 1 root root 36326 Nov  9  2015 mha4mysql-node-0.56-0.el6.noarch.rpm
5. Host inventory (as used throughout this article):

172.16.250.131  jev71.com  master
172.16.250.132  jev72.com  slave (candidate master)
172.16.250.133  jev73.com  slave
172.16.250.134  jev74.com  MHA manager
172.16.250.188  VIP
III. Deploying MHA

1. Make sure the clocks of all nodes are synchronized

[root@jev6 ~]#ansible jev7[1-4].com -m shell -a "date"
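The `date` output above has to be compared by eye. If you instead collect epoch seconds (e.g. `ansible jev7[1-4].com -m shell -a "date +%s"`), a small helper can flag drift automatically. This is only a sketch; the helper name and the 2-second threshold are arbitrary choices for this example:

```shell
# Hypothetical helper: compare two epoch-second timestamps (one from a
# node, one from the manager) and succeed only if the clocks differ by
# at most $MAX_SKEW seconds.
MAX_SKEW=2
skew_ok() {
  local a=$1 b=$2
  local diff=$(( a - b ))
  [ "${diff#-}" -le "$MAX_SKEW" ]
}

# Example: a node 1 second ahead of the manager passes the check.
skew_ok 1487671871 1487671870 && echo "in sync" || echo "clock drift"
```

In practice the real fix is to run ntpd or chronyd on every node; a check like this only detects drift, it does not correct it.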
2. All nodes must be able to reach one another by hostname

Editing the manager node's /etc/hosts as an example:

[root@jev74 ~]#cat /etc/hosts
127.0.0.1       jev74.com jev74
172.16.250.131  jev71.com jev71   # master
172.16.250.132  jev72.com jev72   # slave
172.16.250.133  jev73.com jev73   # slave
172.16.250.134  jev74.com jev74   # manager
3. All nodes must be able to SSH to one another by hostname without a password

[root@jev74 ~]#ssh-keygen -t rsa -P ''
[root@jev74 ~]#cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[root@jev74 ~]#chmod go= .ssh/authorized_keys
[root@jev74 ~]#for i in {1..3}; do scp -p .ssh/id_rsa .ssh/authorized_keys root@jev7$i:/root/.ssh/; done
The authenticity of host 'jev71 (172.16.250.131)' can't be established.
ECDSA key fingerprint is e3:11:54:04:58:2d:37:2a:86:50:0b:cf:c3:e9:30:e6.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'jev71,172.16.250.131' (ECDSA) to the list of known hosts.
root@jev71's password:
...
4. Configure the MySQL cluster and make sure master-slave replication is working

# Master node configuration:
[root@jev71 ~]#vim /etc/my.cnf.d/repl.cnf
[mysqld]
server-id=1
log-bin=master-log
relay-log=relay-log

# Slave node configuration:
[root@jev72 ~]#vim /etc/my.cnf.d/repl.cnf
[mysqld]
server-id=2
log-bin=master-log
relay-log=relay-log
relay_log_purge=0
read_only=1
skip_name_resolve=1
innodb_file_per_table=1

[root@jev73 ~]#vim /etc/my.cnf.d/repl.cnf
[mysqld]
server-id=3
log-bin=master-log
relay-log=relay-log
relay_log_purge=0
read_only=1
skip_name_resolve=1
innodb_file_per_table=1

# Start MySQL:
[root@jev71 ~]# systemctl start mariadb && systemctl status mariadb
[root@jev72 ~]# systemctl start mariadb && systemctl status mariadb
[root@jev73 ~]# systemctl start mariadb && systemctl status mariadb

# Grant the repluser and mhaadmin accounts on the master:
[root@jev71 ~]#mysql

# Note down the master's binary log position:
MariaDB [(none)]> show master status;
+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-log.000002 |      245 |              |                  |
+-------------------+----------+--------------+------------------+

# Grant the replication account - repluser:
MariaDB [(none)]> grant replication slave,replication client on *.* to 'repluser'@'172.16.250.%' identified by 'replpass';

# Grant the MHA management account - mhaadmin:
MariaDB [(none)]> grant all on *.* to 'mhaadmin'@'172.16.250.%' identified by 'mhapass';

Configure replication on the slaves; master_log_file and master_log_pos are the values just recorded on the master:

[root@jev72 ~]#mysql

# Set the replication starting point:
MariaDB [(none)]> change master to master_host='172.16.250.131',master_user='repluser',master_password='replpass',master_log_file='master-log.000002',master_log_pos=245;

# Verify that the replication grants are in place:
MariaDB [(none)]> show grants for 'repluser'@'172.16.250.%';
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for repluser@172.16.250.%                                                                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repluser'@'172.16.250.%' IDENTIFIED BY PASSWORD '*D98280F03D0F78162EBDBB9C883FC01395DEA2BF' |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

[root@jev73 ~]#mysql
MariaDB [(none)]> change master to master_host='172.16.250.131',master_user='repluser',master_password='replpass',master_log_file='master-log.000002',master_log_pos=245;
MariaDB [(none)]> show grants for 'repluser'@'172.16.250.%';
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| Grants for repluser@172.16.250.%                                                                                                                   |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
| GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repluser'@'172.16.250.%' IDENTIFIED BY PASSWORD '*D98280F03D0F78162EBDBB9C883FC01395DEA2BF' |
+----------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
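Note that CHANGE MASTER TO only records the replication coordinates; it does not start the replication threads. Assuming the standard MariaDB statements, each slave still needs:

```sql
MariaDB [(none)]> start slave;
MariaDB [(none)]> show slave status\G
```

Both Slave_IO_Running and Slave_SQL_Running should report Yes before moving on.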
5. Configure the highly available VIP with keepalived

Install keepalived on the master and the standby master:

[root@jev6 ~]#ansible jev7[1:2].com -m yum -a "name=keepalived state=latest"
The keepalived configuration file. Below is the configuration for the keepalived master node; adapt the backup node according to the inline notes, and strip the comments after '#' when deploying:

[root@jev71 ~]#cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   notification_email {
     acassen@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc
   smtp_server 192.168.200.1
   smtp_connect_timeout 30
   router_id mysql
   vrrp_mcast_group4 224.0.88.88    # multicast address
}

vrrp_script chk_mysqld {
    script "killall -0 mysqld && exit 0 || exit 1"
    interval 1
    weight -5
    fall 2
}

vrrp_instance VI_1 {
    state BACKUP
    interface eno16777736
    virtual_router_id 8
    priority 100       # on jev72 use a value slightly below 100 but above 100-weight, i.e. 96-99 in this example
    advert_int 1
    nopreempt          # non-preemptive mode; not needed on the backup node
    authentication {
        auth_type PASS
        auth_pass mysqlvipass
    }
    track_script {
        chk_mysqld
    }
    virtual_ipaddress {
        172.16.250.188/16 dev eno16777736    # the highly available VIP
    }
}
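The arithmetic behind that priority note can be made explicit: the master advertises priority 100, and a failing chk_mysqld subtracts the weight of 5, dropping it to 95, so the backup's priority must fall strictly between 95 and 100 for failover to work and preemption not to occur while the master is healthy. A throwaway shell check of this constraint (the numbers are specific to this example):

```shell
# Validate a backup node's VRRP priority against this example's master
# (priority 100, chk_mysqld weight -5): it must stay below the healthy
# master's 100 but above the failed master's 100 - 5 = 95.
MASTER_PRIO=100
CHECK_WEIGHT=5
valid_backup_prio() {
  [ "$1" -lt "$MASTER_PRIO" ] && [ "$1" -gt $(( MASTER_PRIO - CHECK_WEIGHT )) ]
}

valid_backup_prio 98 && echo "98 is usable"
valid_backup_prio 95 || echo "95 would tie with a failed master"
```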
Start the keepalived cluster. Mind the startup order: start the master node first, then the backup node.

[root@jev71 ~]#systemctl restart keepalived && systemctl status keepalived
[root@jev72 ~]#systemctl restart keepalived && systemctl status keepalived
The VIP is bound to the master node (jev71) by default. Whether the VIP fails over correctly can be checked with tcpdump and the ip command (the verification itself is simple, so it is not covered in detail here):

[root@jev74 ~]#tcpdump -i eno16777736 -nn host 224.0.88.88
21:31:11.536386 IP 172.16.250.131 > 224.0.88.88: VRRPv2, Advertisement, vrid 8, prio 100, authtype simple, intvl 1s, length 20

[root@jev71 ~]#ip a
1 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:a1:c8:d1 brd ff:ff:ff:ff:ff:ff
    inet 172.16.250.131/16 brd 172.16.255.255 scope global eno16777736
       valid_lft forever preferred_lft forever
    inet 172.16.250.188/16 scope global secondary eno16777736
       valid_lft forever preferred_lft forever
6. Install and configure MHA

MHA project page: https://github.com/yoshinorim/mha4mysql-manager
Install both mha4mysql-manager and mha4mysql-node on the manager node:

[root@jev74 ~]#yum install mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm -y
Install mha4mysql-node on every MySQL node:

[root@jev74 ~]#for i in {1..3}; do scp -p mha4mysql-node-0.56-0.el6.noarch.rpm root@jev7$i:/root/; done
[root@jev71 ~]#yum install mha4mysql-node-0.56-0.el6.noarch.rpm -y
[root@jev72 ~]#yum install mha4mysql-node-0.56-0.el6.noarch.rpm -y
[root@jev73 ~]#yum install mha4mysql-node-0.56-0.el6.noarch.rpm -y
Create the configuration file:

[root@jev74 ~]#mkdir /etc/masterha/
[root@jev74 ~]#vim /etc/masterha/app1.cnf
[server default]
user=mhaadmin
password=mhapass
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser
repl_password=replpass
ping_interval=1

[server1]
hostname=172.16.250.131
candidate_master=1

[server2]
hostname=172.16.250.132
candidate_master=1

[server3]
hostname=172.16.250.133

[root@jev74 ~]#mkdir -pv /data/masterha/app1
IV. Testing the MHA Cluster

1. Check SSH connectivity and replication health

Check SSH connectivity between the hosts; the check must end with "All SSH connection tests passed successfully." before you continue:

[root@jev74 ~]#masterha_check_ssh --conf=/etc/masterha/app1.cnf
...
Tue Feb 21 20:05:29 2017 - [info] All SSH connection tests passed successfully.
Check the cluster's replication health; it must report "MySQL Replication Health is OK." before you continue:

[root@jev74 ~]#masterha_check_repl --conf=/etc/masterha/app1.cnf
172.16.250.131(172.16.250.131:3306) (current master)
 +--172.16.250.132(172.16.250.132:3306)
 +--172.16.250.133(172.16.250.133:3306)
...
MySQL Replication Health is OK.
2. Start MHA and verify that the master fails over correctly

Start masterha_manager:

# Run masterha_manager as a background process
[root@jev74 ~]#nohup masterha_manager --conf=/etc/masterha/app1.cnf >/data/masterha/app1/manager.log 2>&1 &

# Check that masterha_manager is running in the background
[root@jev74 ~]#ps aux |grep masterha_manager
root 12264 0.1 2.1 298672 21512 pts/2 S 21:13 0:00 perl /usr/bin/masterha_manager --conf=/etc/masterha/app1.cnf
root 12532 0.0 0.0 112644 956 pts/2 S+ 21:17 0:00 grep --color=auto masterha_manager

# Check the manager's status (cluster app1)
[root@jev74 ~]#masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:12264) is running(0:PING_OK), master:172.16.250.131
Simulate a failure by stopping the MySQL process on the master (jev71):

[root@jev71 ~]#systemctl stop mariadb && systemctl status mariadb
Check the masterha_manager log: jev72 has been promoted to the new master.

[root@jev74 ~]#tail -15 /data/masterha/app1/manager.log
app1: MySQL Master failover 172.16.250.131(172.16.250.131:3306) to 172.16.250.132(172.16.250.132:3306) succeeded
Master 172.16.250.131(172.16.250.131:3306) is down!
Check MHA Manager logs at jev74.com:/data/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 172.16.250.132(172.16.250.132:3306) has all relay logs for recovery.
Selected 172.16.250.132(172.16.250.132:3306) as a new master.
172.16.250.132(172.16.250.132:3306): OK: Applying all logs succeeded.
172.16.250.133(172.16.250.133:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
172.16.250.133(172.16.250.133:3306): OK: Applying all logs succeeded. Slave started, replicating from 172.16.250.132(172.16.250.132:3306)
172.16.250.132(172.16.250.132:3306): Resetting slave info succeeded.
Master failover to 172.16.250.132(172.16.250.132:3306) completed successfully.
Check whether the VIP has moved to jev72 (the new master):

[root@jev72 ~]#ip a
1 eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:a1:c8:d2 brd ff:ff:ff:ff:ff:ff
    inet 172.16.250.132/16 brd 172.16.255.255 scope global eno16777736
       valid_lft forever preferred_lft forever
    inet 172.16.250.188/16 scope global secondary eno16777736
       valid_lft forever preferred_lft forever
After a failover masterha_manager shuts itself down, which in production gives you a chance to diagnose and fix the problem before starting it again by hand:

[root@jev74 ~]#masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
3. Notes

This experiment runs the VIP in non-preemptive mode precisely to prevent the failed master, while it is being repaired, from snatching the VIP back from the new master and leaving the whole cluster unable to work normally.

By default masterha refuses to switch the master again within a short interval, so after a failover you must manually delete /data/masterha/app1/app1.failover.complete before a second test can run.
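Concretely, assuming the manager_workdir configured in app1.cnf above:

```shell
# Remove the marker left by the previous failover so that
# masterha_manager will accept another one; the path follows
# manager_workdir=/data/masterha/app1 from app1.cnf.
rm -f /data/masterha/app1/app1.failover.complete
```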
After the failed master is repaired (or a replacement node is brought up), restore its data from a backup taken on the new master and configure it as a slave. Give it the failed master's old IP, or define its address in app1.cnf; then start the manager process again and the cluster is healthy once more.

To bring the repaired node back as master, switch the master manually and kill the keepalived process on the node currently holding the VIP so that the VIP floats back. (Alternatively, the VIP move can be automated with a custom master_ip_failover script.)
[root@jev74 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=172.16.250.132 --new_master_host=172.16.250.131
This concludes the walkthrough of building a MySQL high availability cluster with MHA. For details on repairing the master, see "MHA - Online Master Repair".

Original article by Jev Tse. Please credit the source when republishing: http://www.www58058.com/70072