May 5

[Original] Fixing corosync 2.3.3 on CentOS 7 failing to form a two-node cluster

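linuxing, 11:21, Network Services » HA, original to this site
I used corosync to build a Pacemaker cluster, but found that starting the corosync service does not automatically start the pacemaker service.
After checking, it turns out that with corosync 2.3.3 on CentOS 7 the pacemaker service is disabled by default and has to be enabled manually.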
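
As a minimal sketch of that manual step, the helper below enables a unit only when it is not enabled yet. The function name `ensure_enabled` and the `SYSTEMCTL` override variable are hypothetical and not part of the original setup; only `systemctl is-enabled` and `systemctl enable` are real commands.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): enable a systemd unit
# at boot if it is not enabled yet.
# SYSTEMCTL may be overridden for dry runs; it defaults to the real systemctl.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

ensure_enabled() {
    unit=$1
    if "$SYSTEMCTL" is-enabled "$unit" >/dev/null 2>&1; then
        echo "$unit: already enabled"
    else
        "$SYSTEMCTL" enable "$unit" && echo "$unit: enabled"
    fi
}

# Usage on each node (as root):
#   ensure_enabled corosync
#   ensure_enabled pacemaker
```

Enabling a unit only affects the next boot; the services still have to be started or restarted for the current session.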

After starting the corosync service, the two nodes still could not form a cluster, and no nodes were listed:
Quote
[root@gz-controller-209100 ~]# crm status      
Last updated: Mon May  4 14:43:13 2015
Last change: Mon May  4 14:26:45 2015
Current DC: NONE
0 Nodes configured
0 Resources configured

1. Troubleshooting
Both the corosync and pacemaker services started normally, but the log shows that quorum is not configured:
Quote
[root@gz-controller-209100 corosync]# systemctl status pacemaker
pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled)
   Active: active (running) since 一 2015-05-04 11:59:10 CST; 1s ago
Main PID: 8378 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─8378 /usr/sbin/pacemakerd -f
           ├─8379 /usr/libexec/pacemaker/cib
           ├─8380 /usr/libexec/pacemaker/stonithd
           ├─8381 /usr/libexec/pacemaker/lrmd
Attempting connection to the cluster...
Last updated: Mon May  4 12:03:39 2015
Last change: Mon May  4 11:59:10 2015
Current DC: NONE
0 Nodes configured
0 Resources configured
           ├─8382 /usr/libexec/pacemaker/attrd
           ├─8383 /usr/libexec/pacemaker/pengine
           └─8384 /usr/libexec/pacemaker/crmd

5月 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
5月 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
5月 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
5月 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
5月 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
5月 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
5月 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
5月 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
5月 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: error: cluster_connect_quorum: Corosync quorum is not configured
5月 04 11:59:11 gz-controller-209100.vclound.com stonith-ng[8380]: notice: setup_cib: Watching for stonith topology changes
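
The key line in the log above is the `cluster_connect_quorum` error from crmd. A small sketch for spotting it without reading the whole log follows; the function name `quorum_not_configured` is hypothetical, and the usage line assumes the pacemaker unit logs to the systemd journal.

```shell
#!/bin/sh
# Hypothetical check: read log lines on stdin and succeed if crmd reported
# the "Corosync quorum is not configured" error quoted above.
quorum_not_configured() {
    grep -q 'cluster_connect_quorum: Corosync quorum is not configured'
}

# Usage (assumes the pacemaker unit logs to the systemd journal):
#   journalctl -u pacemaker --no-pager | quorum_not_configured \
#       && echo "quorum section missing"
```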

2. Solution
Following the documentation in man votequorum, add a quorum section to the configuration.

Edit /etc/corosync/corosync.conf on node 1 so that it contains the quorum section.
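
As a minimal sketch of what such a quorum section can look like for a two-node cluster: the option names come from votequorum(5), but the values are assumptions and not necessarily the exact settings used on these hosts. The helper names `print_quorum_section` and `has_votequorum` are hypothetical.

```shell
#!/bin/sh
# Sketch only: print an example quorum section for a two-node cluster.
# Option names come from votequorum(5); the values are assumptions.
print_quorum_section() {
    cat <<'EOF'
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
EOF
}

# Succeed if the given corosync.conf already names the votequorum provider.
has_votequorum() {
    grep -Eq '^[[:space:]]*provider:[[:space:]]*corosync_votequorum' "$1"
}

# Usage (hypothetical): append the section only when it is missing.
#   conf=/etc/corosync/corosync.conf
#   has_votequorum "$conf" || print_quorum_section >> "$conf"
```

According to votequorum(5), `two_node: 1` lets a two-node cluster keep quorum when one node fails, and it also turns on `wait_for_all` by default, so both nodes must be seen once before the cluster first becomes quorate.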


Restart the corosync and pacemaker services:
Quote
[root@gz-controller-209100 ~]# systemctl restart corosync
[root@gz-controller-209100 ~]# systemctl restart pacemaker

Check the cluster status again:
Quote
[root@gz-controller-209100 ~]# crm status
Last updated: Mon May  4 14:44:47 2015
Last change: Mon May  4 14:43:33 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
1 Nodes configured
0 Resources configured


Online: [ gz-controller-209100.vclound.com ]

Node 1 has now joined the cluster.

Copy the configuration file to the second node:
Quote
[root@gz-controller-209100 ~]# scp /etc/corosync/corosync.conf  192.168.209.101:/etc/corosync/

Restart the services on the second node:
Quote
[root@gz-controller-209101 ~]# systemctl restart corosync
[root@gz-controller-209101 ~]# systemctl restart pacemaker

Cluster status:
Quote
[root@gz-controller-209100 ~]# crm status
Last updated: Mon May  4 14:44:55 2015
Last change: Mon May  4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured


Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]

Both nodes have joined the cluster, and the problem is solved.
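
The same check can be scripted. The sketch below pulls the node count out of `crm status` output; it assumes the "N Nodes configured" line format shown above for Pacemaker 1.1.10, which may differ in other versions, and the function name `configured_nodes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `crm status` output on stdin and print the number
# from the "N Nodes configured" line.
configured_nodes() {
    awk '/Nodes configured/ { print $1; exit }'
}

# Usage: check that both nodes are configured.
#   [ "$(crm status | configured_nodes)" -eq 2 ] && echo "both nodes joined"
```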

3. Remaining issue
Running pcs status reports an error:
Quote
[root@gz-controller-209100 ~]# pcs status
Cluster name:
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon May  4 15:09:08 2015
Last change: Mon May  4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured


Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]

Full list of resources:


PCSD Status:
Error: no nodes found in corosync.conf


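Reference:
Why is the message "Error: no nodes found in corosync.conf" in the output of "pcs cluster status" command ?
https://access.redhat.com/solutions/663283
Resolution
The errors need to be ignored as no corosync.conf file is used.
Root Cause
The error messages seen are not harmful and are expected due to cman stack is being used.
The article describes a cman-based stack, whereas this cluster reports "Stack: corosync"; here the message presumably appears because corosync.conf contains no nodelist section for pcs to read. Either way, it does not affect the cluster, so it can be ignored.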