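MySQL High-Availability Architecture with MHA
1. About MHA
MHA (Master HA) is an open-source high-availability tool for MySQL that provides automated master failover for MySQL master/slave replication topologies. When MHA detects that the master node has failed, it promotes the slave holding the most recent data to be the new master; during this process, MHA collects additional information from the other slaves to avoid consistency problems. MHA also supports online master switching, that is, switching the master/slave roles on demand.
MHA has two roles, MHA Manager (the management node) and MHA Node (the data node):
MHA Manager: usually deployed on a dedicated machine, from which it manages several master/slave clusters; each master/slave cluster is called an application;
MHA Node: runs on every MySQL server and speeds up failover by means of scripts that can parse and purge logs.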
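2. MHA components
Manager node:
-masterha_check_ssh: checks the SSH environment that MHA depends on;
-masterha_check_repl: checks the MySQL replication environment;
-masterha_manager: the main MHA service program;
-masterha_check_status: reports the running state of MHA;
-masterha_master_monitor: checks the availability of the MySQL master node;
-masterha_master_switch: switches the master node;
-masterha_conf_host: adds or removes a node in the configuration;
-masterha_stop: stops the MHA service;
Node:
-save_binary_logs: saves and copies the master's binary logs;
-apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves;
-filter_mysqlbinlog: strips unnecessary ROLLBACK events (MHA no longer uses this tool);
-purge_relay_logs: purges relay logs (without blocking the SQL thread);
Custom extensions:
-secondary_check_script: checks master availability over multiple network routes;
-master_ip_failover_script: updates the master IP used by the application;
-shutdown_script: forcibly shuts down the master node;
-report_script: sends reports;
-init_conf_load_script: loads initial configuration parameters;
-master_ip_online_change_script: updates the master node's IP address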
3. Deployment and testing
Lab topology:
node1:192.168.150.137 MHA manager
node2:192.168.150.138 MHA node mariadb master
node3:192.168.150.139 MHA node mariadb slave candidate
node4:192.168.150.140 MHA node mariadb slave
Configuration steps:
1. Edit the hosts file on every server
/etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.150.137 node1.com node1
192.168.150.138 node2.com node2
192.168.150.139 node3.com node3
192.168.150.140 node4.com node4
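Since the same four entries have to be added on every server, the edit can be scripted. The following is a minimal sketch, not part of the original procedure: `add_host_entry` is a hypothetical helper that appends an entry only when no line for that IP exists yet, so running it twice does not duplicate anything.

```shell
#!/usr/bin/env bash
# Sketch: idempotently append host entries to a hosts-style file.
# add_host_entry is a hypothetical helper, not an MHA or system tool.

add_host_entry() {
    local file="$1" ip="$2" fqdn="$3" short="$4"
    # Do nothing if the first field of some line already equals this IP.
    if awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}

# Example (run as root on each server):
#   add_host_entry /etc/hosts 192.168.150.137 node1.com node1
#   add_host_entry /etc/hosts 192.168.150.138 node2.com node2
#   add_host_entry /etc/hosts 192.168.150.139 node3.com node3
#   add_host_entry /etc/hosts 192.168.150.140 node4.com node4
```

The check compares only the IP column, so an existing line with the same IP but different names is left untouched rather than rewritten.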
2. Install mariadb with yum on node2 through node4
yum -y install mariadb-server
3. Set up SSH trust between the nodes
[root@node1 ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
a2:f2:10:28:cd:ea:7b:d8:f4:95:15:6e:73:a6:9d:4e root@node1.com
The key's randomart image is:
+--[ RSA 2048]----+
| |
| . |
| . . |
| + = o |
|o + . S * . |
|.. o . + . E |
|. * o . o |
|.. * . . |
| oo . |
+-----------------+
[root@node1 ~]# ls .ssh/id_rsa
id_rsa id_rsa.pub
[root@node1 ~]# cat .ssh/id_rsa.pub > .ssh/authorized_keys
[root@node1 ~]# ssh node1
The authenticity of host 'node1 (192.168.150.137)' can't be established.
ECDSA key fingerprint is 2a:e3:03:52:8c:84:02:59:a2:26:a3:b2:f6:74:6c:3c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node1,192.168.150.137' (ECDSA) to the list of known hosts.
Last login: Wed Mar 29 15:06:52 2017 from 192.168.150.1
[root@node1 ~]# ll .ssh/authorized_keys
-rw-r--r-- 1 root root 396 3月 29 15:35 .ssh/authorized_keys
[root@node1 ~]# chmod go= .ssh/authorized_keys
[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node2:/root/.ssh
root@node2's password:
id_rsa 100% 1675 1.6KB/s 00:00
authorized_keys 100% 396 0.4KB/s 00:00
[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node3:/root/.ssh
The authenticity of host 'node3 (192.168.150.139)' can't be established.
ECDSA key fingerprint is 2a:e3:03:52:8c:84:02:59:a2:26:a3:b2:f6:74:6c:3c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node3,192.168.150.139' (ECDSA) to the list of known hosts.
root@node3's password:
id_rsa 100% 1675 1.6KB/s 00:00
authorized_keys 100% 396 0.4KB/s 00:00
[root@node1 ~]# scp -p .ssh/id_rsa .ssh/authorized_keys node4:/root/.ssh
root@node4's password:
Permission denied, please try again.
root@node4's password:
id_rsa 100% 1675 1.6KB/s 00:00
authorized_keys 100% 396 0.4KB/s 00:00
[root@node1 ~]# ssh node2
Last login: Wed Mar 29 15:07:05 2017 from 192.168.150.1
[root@node2 ~]# exit
登出
Connection to node2 closed.
[root@node1 ~]# ssh node3
Last login: Wed Mar 29 15:07:18 2017 from 192.168.150.1
[root@node3 ~]# exit
登出
Connection to node3 closed.
[root@node1 ~]# ssh node3
Last login: Wed Mar 29 15:40:05 2017 from node1.com
[root@node3 ~]# exit
登出
Connection to node3 closed.
[root@node1 ~]# ssh node4
Last failed login: Wed Mar 29 15:39:53 CST 2017 from node1.com on ssh:notty
There was 1 failed login attempt since the last successful login.
Last login: Wed Mar 29 15:39:30 2017 from node1.com
[root@node4 ~]# exit
登出
Connection to node4 closed.
[root@node1 ~]# ssh node4
Last login: Wed Mar 29 15:40:13 2017 from node1.com
[root@node4 ~]# exit
登出
Connection to node4 closed.
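The transcript above only logs in from node1, but MHA needs every node to reach every other node without a password. A rough way to check the whole mesh before MHA is installed is sketched below; `ssh_pairs` and `check_ssh_mesh` are hypothetical helpers, and the node names are the ones from the lab topology. `BatchMode=yes` makes ssh fail instead of prompting, so a missing key or an unknown host key shows up as FAIL.

```shell
#!/usr/bin/env bash
# Sketch: verify password-less SSH between every ordered pair of nodes.
# ssh_pairs and check_ssh_mesh are hypothetical helper names.

# Print "src dst" for every ordered pair of distinct nodes.
ssh_pairs() {
    local src dst
    for src in "$@"; do
        for dst in "$@"; do
            [ "$src" = "$dst" ] || printf '%s %s\n' "$src" "$dst"
        done
    done
}

# Log in to src, and from there try to log in to dst.
check_ssh_mesh() {
    local src dst failed=0
    set -- $(ssh_pairs "$@")
    while [ "$#" -ge 2 ]; do
        src="$1" dst="$2"
        shift 2
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$src" \
               ssh -o BatchMode=yes -o ConnectTimeout=5 "$dst" true; then
            echo "ok   $src -> $dst"
        else
            echo "FAIL $src -> $dst"
            failed=1
        fi
    done
    return "$failed"
}

# Example: check_ssh_mesh node1 node2 node3 node4
```

Once MHA is installed, masterha_check_ssh performs a similar check based on the configuration file, so this sketch is mainly useful for catching problems early.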
4. Edit the MySQL parameters
master:
[mysqld]
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server_id = 1
slave:
[mysqld]
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server_id = 2 # must be unique on every server; give the second slave a different value, e.g. 3
read_only = 1
relay_log_purge = 0
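The two listings differ only in server_id and in the two slave-only options, so the fragment can be generated per node. This is a small sketch under that assumption: `render_mysqld_cnf` is a hypothetical helper that prints the [mysqld] section for a given role and server_id, which makes it harder to reuse the same server_id on two servers by accident.

```shell
#!/usr/bin/env bash
# Sketch: print the [mysqld] fragment used in this setup for one node.
# render_mysqld_cnf is a hypothetical helper; the option values mirror the listings above.

render_mysqld_cnf() {
    local role="$1" server_id="$2"
    cat <<EOF
[mysqld]
innodb_file_per_table = 1
skip_name_resolve = 1
log-bin = master-bin
relay-log = relay-bin
server_id = ${server_id}
EOF
    # Slaves are read-only and keep their relay logs so MHA can use them for recovery.
    if [ "$role" = "slave" ]; then
        cat <<EOF
read_only = 1
relay_log_purge = 0
EOF
    fi
}

# Example:
#   render_mysqld_cnf master 1   # node2
#   render_mysqld_cnf slave 2    # node3
#   render_mysqld_cnf slave 3    # node4
```

The output is meant to be reviewed and merged into /etc/my.cnf by hand; the sketch does not write to any file itself.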
5. Start the master and create the authorized accounts
master:
MariaDB [(none)]> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000003 | 245 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'repluser'@'192.168.%.%' IDENTIFIED BY 'replpass';
MariaDB [(none)]> GRANT ALL ON *.* TO 'mhauser'@'192.168.%.%' IDENTIFIED BY 'mhapass'; # management account for MHA
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only'; # the master is readable and writable at this point
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.00 sec)
6. Enable replication on the slaves (the steps are the same on node3 and node4)
[root@node3 ~]# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 5.5.52-MariaDB MariaDB Server
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.150.138',MASTER_USER='repluser',MASTER_PASSWORD='replpass',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=245;
Query OK, 0 rows affected (0.01 sec)
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 192.168.150.138
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 245
Relay_Log_File: relay-bin.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 245
Relay_Log_Space: 245
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
1 row in set (0.00 sec)
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.150.138
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 497
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 782
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 497
Relay_Log_Space: 1070
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only'; # the slave is in read-only mode at this point
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | ON |
+---------------+-------+
1 row in set (0.00 sec)
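The two fields that matter in the long SHOW SLAVE STATUS output are Slave_IO_Running and Slave_SQL_Running. A minimal sketch for checking them from a script follows; `replication_healthy` is a hypothetical helper that reads the \G-formatted output on stdin and succeeds only when both threads report Yes.

```shell
#!/usr/bin/env bash
# Sketch: decide from `SHOW SLAVE STATUS\G` text whether both replication threads run.
# replication_healthy is a hypothetical helper name.

replication_healthy() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }                  # drop trailing whitespace
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (on a slave):
#   mysql -e 'SHOW SLAVE STATUS\G' | replication_healthy && echo "replication OK"
```

An empty input, as produced by a server that is not configured as a slave, counts as unhealthy.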
7. The one-master, two-slave topology is now in place; install the MHA packages
Manager node (node1):
Install mha4mysql-manager-0.56-0.el6.noarch.rpm and mha4mysql-node-0.56-0.el6.noarch.rpm
yum install mha4mysql* -y
MHA Node hosts (node2 through node4):
Install mha4mysql-node-0.56-0.el6.noarch.rpm
yum -y install mha4mysql-node-0.56-0.el6.noarch.rpm
8. Initialize MHA
Create the configuration directory and the configuration file (run on node1)
[root@node1 ~]# mkdir /etc/masterha
[root@node1 ~]# vim /etc/masterha/app1.cnf
[server default]
user=mhauser
password=mhapass
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repluser
repl_password=replpass
ping_interval=1
[server1]
hostname=192.168.150.138
candidate_master=1
[server2]
hostname=192.168.150.139
candidate_master=1
[server3]
hostname=192.168.150.140
9. Pre-start checks
Check that SSH trust is configured correctly
[root@node1 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
......
Warning: Permanently added '192.168.150.139' (ECDSA) to the list of known hosts.
Wed Mar 29 17:03:03 2017 - [debug] ok.
Wed Mar 29 17:03:03 2017 - [info] All SSH connection tests passed successfully.
Check that the connection settings for the MySQL replication cluster are correct
[root@node1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
......
Wed Mar 29 17:04:40 2017 - [info]
192.168.150.138(192.168.150.138:3306) (current master)
+--192.168.150.139(192.168.150.139:3306)
+--192.168.150.140(192.168.150.140:3306)
Wed Mar 29 17:04:40 2017 - [info] Checking replication health on 192.168.150.139..
Wed Mar 29 17:04:40 2017 - [info] ok.
Wed Mar 29 17:04:40 2017 - [info] Checking replication health on 192.168.150.140..
Wed Mar 29 17:04:40 2017 - [info] ok.
Wed Mar 29 17:04:40 2017 - [warning] master_ip_failover_script is not defined.
Wed Mar 29 17:04:40 2017 - [warning] shutdown_script is not defined.
Wed Mar 29 17:04:40 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
10. Start MHA
[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &
[1] 16989
[root@node1 ~]# tail -f /data/masterha/app1/manager.log
192.168.150.138(192.168.150.138:3306) (current master)
+--192.168.150.139(192.168.150.139:3306)
+--192.168.150.140(192.168.150.140:3306)
Wed Mar 29 21:51:58 2017 - [warning] master_ip_failover_script is not defined.
Wed Mar 29 21:51:58 2017 - [warning] shutdown_script is not defined.
Wed Mar 29 21:51:58 2017 - [info] Set master ping interval 1 seconds.
Wed Mar 29 21:51:58 2017 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Mar 29 21:51:58 2017 - [info] Starting ping health check on 192.168.150.138(192.168.150.138:3306)..
Wed Mar 29 21:51:58 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
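Steps 9 and 10 can be chained so that the manager is only started when both checks pass. The sketch below assumes the paths used in this article; `preflight` and `start_manager` are hypothetical wrapper names, while masterha_check_ssh, masterha_check_repl and masterha_manager are the MHA tools already used above. Creating the work directory first avoids a failed shell redirection if /data/masterha/app1 does not exist yet.

```shell
#!/usr/bin/env bash
# Sketch: run both MHA checks and start the manager only if they succeed.
# preflight and start_manager are hypothetical wrappers around the MHA commands.

preflight() {
    local conf="$1"
    masterha_check_ssh --conf="$conf" && masterha_check_repl --conf="$conf"
}

start_manager() {
    local conf="$1" workdir="$2"
    if ! preflight "$conf"; then
        echo "pre-flight checks failed; manager not started" >&2
        return 1
    fi
    mkdir -p "$workdir"
    nohup masterha_manager --conf="$conf" > "$workdir/manager.log" 2>&1 &
}

# Example (on node1):
#   start_manager /etc/masterha/app1.cnf /data/masterha/app1
```

This relies on the check tools returning a non-zero exit status on failure; their output is still printed, so the reason for a refusal stays visible.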
11. Check the master node status after startup
[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:16623) is running(0:PING_OK), master:192.168.150.138
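When the status line is consumed by a script, for example to find out which host is currently the master, the address can be cut out of it. A minimal sketch, assuming the output format shown above; `current_master` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: extract the master address from masterha_check_status output.
# current_master is a hypothetical helper; it assumes the line format shown above.

current_master() {
    # Prints nothing unless the manager reports PING_OK.
    sed -n 's/.*is running(0:PING_OK), master:\(.*\)$/\1/p'
}

# Example (on node1):
#   masterha_check_status --conf=/etc/masterha/app1.cnf | current_master
```

If the manager is stopped, the line has a different shape and the helper prints nothing, which a caller can treat as "master unknown".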
12. Test failover
(1) Stop mariadb on the master node
[root@node2 ~]# killall mysqld mysqld_safe
(2) The failover log now appears on the manager
----- Failover Report -----
app1: MySQL Master failover 192.168.150.138(192.168.150.138:3306) to 192.168.150.139(192.168.150.139:3306) succeeded
Master 192.168.150.138(192.168.150.138:3306) is down!
Check MHA Manager logs at node1.com:/data/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.150.139(192.168.150.139:3306) has all relay logs for recovery.
Selected 192.168.150.139(192.168.150.139:3306) as a new master.
192.168.150.139(192.168.150.139:3306): OK: Applying all logs succeeded.
192.168.150.140(192.168.150.140:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.150.140(192.168.150.140:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.150.139(192.168.150.139:3306)
192.168.150.139(192.168.150.139:3306): Resetting slave info succeeded.
Master failover to 192.168.150.139(192.168.150.139:3306) completed successfully.
After the failover the manager stops automatically; check the status at this point
[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
(3) Check the state of the other two databases
node3 has taken over as master and is readable and writable
MariaDB [(none)]> SHOW MASTER STATUS;
+-------------------+----------+--------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000003 | 245 | | |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.00 sec)
node4
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.150.139
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 245
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 530
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 245
Relay_Log_Space: 818
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
1 row in set (0.00 sec)
MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'read_only';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | ON |
+---------------+-------+
1 row in set (0.00 sec)
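(4) Provide a new slave node to repair the replication cluster
After the master node fails, a new MySQL node has to be prepared. Restore it from a backup taken from the master node, then configure it as a slave of the new master. The new node should use the IP address of the original master; otherwise the corresponding settings in app1.cnf have to be changed as well. Finally, start the manager again and check its status once more.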
[root@node2 ~]# rm -rf /var/lib/mysql/*
[root@node2 ~]# vim /etc/my.cnf
Add the two slave-only options
read_only = 1
relay_log_purge = 0
[root@node2 ~]# systemctl start mariadb.service
[root@node2 ~]# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 5.5.52-MariaDB MariaDB Server
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'repluser'@'192.168.%.%' IDENTIFIED BY 'replpass';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> GRANT ALL ON *.* TO 'mhauser'@'192.168.%.%' IDENTIFIED BY 'mhapass';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> CHANGE MASTER TO MASTER_HOST='192.168.150.139',MASTER_USER='repluser',MASTER_PASSWORD='replpass',MASTER_LOG_FILE='master-bin.000003',MASTER_LOG_POS=245;
Query OK, 0 rows affected (0.01 sec)
MariaDB [(none)]> SHOW SLAVE\G
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1
MariaDB [(none)]> START SLAVE;
Query OK, 0 rows affected (0.01 sec)
MariaDB [(none)]> SHOW SLAVE STATUS\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.150.139
Master_User: repluser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 245
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 530
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 245
Relay_Log_Space: 818
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
1 row in set (0.00 sec)
Start the manager again and check its status: everything is OK, and the master is now node3
[root@node1 ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &
[root@node1 ~]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:17229) is running(0:PING_OK), master:192.168.150.139
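Supplementary notes:
Additional mechanisms should be provided: guarding against false positives in master monitoring; adding a VIP; performing STONITH on the original master during failover to avoid split-brain, which can be done with shutdown_script; and switching the master online when necessary.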
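As an illustration of the VIP point, the sketch below shows the decision logic a master_ip_failover_script hook might contain. It is a dry run that only prints the command it would execute. It assumes that MHA calls the hook with `--command=...`, `--orig_master_host=...` and `--new_master_host=...` style arguments, as the MHA documentation describes; the VIP 192.168.150.200/24, the interface eth0 and the function name `plan_vip_action` are hypothetical and do not appear in the setup above.

```shell
#!/usr/bin/env bash
# Sketch: dry-run logic for a VIP-moving failover hook.
# VIP, VIP_DEV and plan_vip_action are assumptions made for illustration only.
VIP=192.168.150.200/24
VIP_DEV=eth0

# Print the command the hook would run for the given MHA-style arguments.
plan_vip_action() {
    local command="" orig_master_host="" new_master_host="" ssh_user="root" arg
    for arg in "$@"; do
        case "$arg" in
            --command=*)          command="${arg#*=}" ;;
            --ssh_user=*)         ssh_user="${arg#*=}" ;;
            --orig_master_host=*) orig_master_host="${arg#*=}" ;;
            --new_master_host=*)  new_master_host="${arg#*=}" ;;
            *) ;;  # other options are ignored in this sketch
        esac
    done
    case "$command" in
        # Old master still reachable over SSH: take the VIP away from it.
        stopssh) echo "ssh ${ssh_user}@${orig_master_host} ip addr del ${VIP} dev ${VIP_DEV}" ;;
        # Old master unreachable: nothing to do here, fencing is left to shutdown_script.
        stop)    echo "true" ;;
        # Bring the VIP up on the newly promoted master.
        start)   echo "ssh ${ssh_user}@${new_master_host} ip addr add ${VIP} dev ${VIP_DEV}" ;;
        status)  echo "true" ;;
        *)       echo "unsupported command: ${command}" >&2; return 1 ;;
    esac
}

# Example:
#   plan_vip_action --command=start --orig_master_host=192.168.150.138 --new_master_host=192.168.150.139
```

A real hook would execute the planned command instead of printing it, and would be referenced from app1.cnf through the master_ip_failover_script parameter; printing first makes the behaviour easy to review before anything touches the network configuration.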
Original article by N23-蘇州-void. If you repost it, please credit the source: http://www.www58058.com/72216