Topology:
The server has 4 NICs.
Two 10 GbE NICs are bonded and connected to a Netgear switch; the switch ports are access VLAN 30, which maps to the 10.199.16.0/22 subnet, with the gateway 10.199.16.1 configured on the Netgear.
Two 1 GbE NICs are bonded and connected to a Cisco 3750 switch; the switch ports trunk VLANs 30, 40 and 1001-1300, which map to 10.199.16.0/22, 10.176.4.0/22 and the KVM guests' internal subnets, with the gateway 10.176.0.4.1 configured on the Cisco 3750.
Both the Netgear and the Cisco 3750 have port-channels configured.
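A rough text sketch of the topology as described above, in place of the original diagram (only details stated in the description):

server hmkvm01 (4 NICs)
  |-- 2 x 10 GbE bond --> Netgear switch, access VLAN 30, port-channel
  |                       10.199.16.0/22, gateway 10.199.16.1 on the Netgear
  |-- 2 x 1 GbE bond ---> Cisco 3750 switch, trunk VLANs 30/40/1001-1300, port-channel
                          10.199.16.0/22, 10.176.4.0/22, KVM guest subnets, gateway on the 3750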
Server configuration:
1. iSCSI multipath configuration
defaults {
udev_dir /dev
polling_interval 10
path_selector "round-robin 0"
# path_grouping_policy multibus
path_grouping_policy failover
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
prio alua
path_checker readsector0
rr_min_io 100
max_fds 8192
rr_weight priorities
failback immediate
no_path_retry fail
user_friendly_names yes
}
multipaths {
multipath {
wwid 36000d31003157200000000000000000a
alias primary1
}
multipath {
wwid 36000d310031572000000000000000003
alias primary2
}
multipath {
wwid 36000d31003157200000000000000000b
alias primary3
}
multipath {
wwid 36000d31003157200000000000000001b
alias qdisk
}
}
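For reference, a quick way to check that the aliases above are picked up and that every path is healthy (a sketch; assumes the device-mapper-multipath service is running):

multipath -ll                # each alias (primary1, primary2, primary3, qdisk) should appear as a map with its paths in active/ready state
multipathd -k"show paths"    # per-path checker state as seen by the multipath daemon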
2. NIC configuration
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
TYPE=Bond
ONBOOT=yes
BOOTPROTO=none
BRIDGE=cloudbr0
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth4
DEVICE=eth4
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond1
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth5
DEVICE=eth5
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
MASTER=bond0
SLAVE=yes
USERCTL=no
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond1
DEVICE=bond1
TYPE=Bond
ONBOOT=yes
BOOTPROTO=none
NAME=bond1
BRIDGE=cloudbr1
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-cloudbr0
DEVICE=cloudbr0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
[root@hmkvm01 ~]# cat /etc/sysconfig/network-scripts/ifcfg-cloudbr1
DEVICE=cloudbr1
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.199.16.101
NETMASK=255.255.252.0
GATEWAY=10.199.16.1
DNS1=114.114.114.114
[root@hmkvm01 ~]# tail -f -n 5 /etc/modprobe.d/dist.conf
alias char-major-89-* i2c-dev
alias bond0 bonding
options bond0 mode=0 miimon=100
alias bond1 bonding
options bond1 mode=0 miimon=100
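For reference, the state the bonds and bridges actually came up in can be checked like this (a sketch; the interface names are the ones used in the configs above):

cat /proc/net/bonding/bond0   # reports the bonding mode in use and which slave interfaces joined
cat /proc/net/bonding/bond1
brctl show                    # shows which bond is attached to cloudbr0 and cloudbr1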
Symptoms:
1. One of the servers is sluggish; pinging its KVM guests from the office network shows packet loss, while pinging the gateway shows none.
2. Right after the RHCS cluster is created, the first node is fine, but when other machines are added they cannot Join Cluster: the luci panel shows red errors, cman and clvmd will not run, and as soon as the cman service is started manually the node goes into an endless reboot loop.
3. Changing the Expected votes value in the luci panel has no effect. After manually editing the config file to set it to 1, the failed node still fails when it tries to Join Cluster again, and the Expected votes value changes back. With the network mode set to UDP Multicast, the address is an IP starting with 239; it can be pinged from the hmkvm01 node but not from the other nodes, and manually specifying Multicast addresses has no effect.
[root@hmkvm01 ~]# cman_tool status
Version: 6.2.0
Config Version: 28
Cluster Name: hmcloud
Cluster Id: 50417
Cluster Member: Yes
Cluster Generation: 992
Membership state: Cluster-Member
Nodes: 3
Expected votes: 7
Quorum device votes: 3
Total votes: 6
Node votes: 1
Quorum: 4
Active subsystems: 11
Flags:
Ports Bound: 0 11 177 178
Node name: hmkvm01
Node ID: 1
Multicast addresses: 255.255.255.255
Node addresses: 10.199.16.101
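Regarding the multicast part of symptom 3: a common way to verify multicast reachability between cluster nodes is omping, run on all nodes at the same time (a sketch; assumes the omping package is installed and the node names resolve):

# run the same command simultaneously on hmkvm01, hmkvm02 and hmkvm04;
# the multicast lines should show close to 0% loss if multicast works between the nodes
omping -c 20 hmkvm01 hmkvm02 hmkvm04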
4. Starting the cman service hangs at "Waiting for quorum… Timed-out waiting for cluster". After changing the mode under network to UDP Broadcast (or adding cman broadcast="yes" to the config file), setting Post Join Delay to 600, manually setting the Expected votes value in the config file to 1, and rebooting all the servers, all three servers came up healthy. The config file then looks like this:
[root@hmkvm01 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="28" name="hmcloud">
<clusternodes>
<clusternode name="hmkvm01" nodeid="1">
<fence>
<method name="hmkvm01">
<device name="hmkvm01"/>
</method>
</fence>
</clusternode>
<clusternode name="hmkvm02" nodeid="2">
<fence>
<method name="hmkvm02">
<device name="hmkvm02"/>
</method>
</fence>
</clusternode>
<clusternode name="hmkvm04" nodeid="3">
<fence>
<method name="hmkvm04"/>
</fence>
</clusternode>
<clusternode name="pcs1" nodeid="4"/>
</clusternodes>
<cman broadcast="yes" expected_votes="7"/>
<fence_daemon post_join_delay="600"/>
<fencedevices>
<fencedevice agent="fence_idrac" ipaddr="10.199.2.224" login="root" name="hmkvm01" passwd="HMIDC#88878978"/>
<fencedevice agent="fence_idrac" ipaddr="10.199.2.225" login="root" name="hmkvm02" passwd="HMIDC#88878978"/>
<fencedevice agent="fence_idrac" ipaddr="10.199.2.227" login="root" name="hmkvm04" passwd="HMIDC#88878978"/>
</fencedevices>
<quorumd label="qdisk" min_score="1">
<heuristic interval="10" program="ping -c3 -t2 10.199.16.1" tko="10"/>
</quorumd>
<logging debug="on"/>
</cluster>
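For reference, if multicast were used instead of the broadcast="yes" setting above, cluster.conf allows pinning a specific multicast address inside the cman element; a minimal sketch (the address 239.192.100.1 is illustrative, not taken from this setup, and config_version must be bumped whenever the file is edited):

<cman expected_votes="7">
        <multicast addr="239.192.100.1"/>
</cman>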
5. With the cluster healthy, running echo c>/proc/sysrq-trigger on one node: after that node reboots, it can only rejoin the cluster by repeating the steps from symptom 4.
6. The quorum disk qdisk is visible on every machine; its configuration is as follows
[root@hmkvm01 ~]# mkqdisk -L
mkqdisk v3.0.12.1
/dev/block/253:5:
/dev/disk/by-id/dm-name-qdisk:
/dev/disk/by-id/dm-uuid-mpath-36000d31003157200000000000000001b:
/dev/dm-5:
/dev/mapper/qdisk:
Magic: eb7a62c2
Label: qdisk
Created: Mon Jun 13 16:23:05 2016
Host: hmkvm01
Kernel Sector Size: 512
Recorded Sector Size: 512
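For reference, a quorum disk with this label would typically have been initialized with something like the following; it is shown only to relate the label to the multipath alias from section 1 and is destructive to run (a sketch):

mkqdisk -c /dev/mapper/qdisk -l qdisk   # writes qdisk metadata onto the device (overwrites whatever is on it)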
7. The fence devices work correctly
[root@hmkvm01 ~]# fence_idrac -a 10.199.2.227 -l root -p ****** -o status
Status: ON
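To exercise fencing through the cluster stack rather than by calling the agent directly, a node can be fenced from another member; note that this really power-cycles the target (a sketch):

fence_node hmkvm04   # run from a surviving member; hmkvm04 is rebooted via its configured fence device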
8. Nothing unusual was found in the logs
9. Restarting a NIC, or interrupting its link for a few seconds, makes the current node reboot
My questions:
1. Does my NIC bonding configuration need any changes?
2. Is there any problem with the multipath configuration?
3. Is anything misconfigured in my cluster?
4. Should the Multicast address be pingable from every node?
5. What is the relationship between the IP addresses in the red box under the network settings?
6. tcpdump captures show none of the nodes communicating with the Multicast address; is that normal?
7. Where is the reboot delay seen in symptom 9 configured?
Original article by Mrl_Eric. If reposting, please credit the source: http://www.www58058.com/18356