[Original] Fixing corosync 2.3.3 failing to form a two-node cluster on CentOS 7

May 5

linuxing, 11:21, Network Services » HA, original post on this site

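I built a Pacemaker cluster on top of corosync, but found that starting the corosync service does not automatically start the pacemaker service.
On CentOS 7 with corosync 2.3.3, the pacemaker service is disabled by default and has to be enabled manually.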
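A minimal sketch of enabling both units at boot, so pacemaker no longer has to be started by hand. The helper name enable_cluster_units is hypothetical; it only wraps the standard systemctl is-enabled and systemctl enable commands.

```shell
# Sketch: enable the given systemd units at boot unless they already are.
# enable_cluster_units is a hypothetical helper around standard systemctl calls.
enable_cluster_units() {
    for unit in "$@"; do
        # is-enabled exits 0 if the unit is enabled; -q suppresses its output
        if systemctl is-enabled -q "$unit"; then
            echo "$unit already enabled"
        elif systemctl enable "$unit"; then
            echo "$unit enabled"
        else
            echo "failed to enable $unit" >&2
            return 1
        fi
    done
}
```

On each node, run enable_cluster_units corosync pacemaker as root.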

After starting the corosync service, the two nodes still did not form a cluster, and no nodes were listed:

[root@gz-controller-209100 ~]# crm status
Last updated: Mon May  4 14:43:13 2015
Last change: Mon May  4 14:26:45 2015
Current DC: NONE
0 Nodes configured
0 Resources configured

1. Troubleshooting
Both the corosync and pacemaker services had started normally, but the log reported that quorum was not configured:

[root@gz-controller-209100 corosync]# systemctl status pacemaker
pacemaker.service - Pacemaker High Availability Cluster Manager
   Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; disabled)
   Active: active (running) since Mon 2015-05-04 11:59:10 CST; 1s ago
Main PID: 8378 (pacemakerd)
   CGroup: /system.slice/pacemaker.service
           ├─8378 /usr/sbin/pacemakerd -f
           ├─8379 /usr/libexec/pacemaker/cib
           ├─8380 /usr/libexec/pacemaker/stonithd
           ├─8381 /usr/libexec/pacemaker/lrmd
           ├─8382 /usr/libexec/pacemaker/attrd
           ├─8383 /usr/libexec/pacemaker/pengine
           └─8384 /usr/libexec/pacemaker/crmd

Attempting connection to the cluster...
Last updated: Mon May  4 12:03:39 2015
Last change: Mon May  4 11:59:10 2015
Current DC: NONE
0 Nodes configured
0 Resources configured

May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:10 gz-controller-209100.vclound.com cib[8379]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: crm_cluster_connect: Connecting to cluster infrastructure: corosync
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Could not obtain a node name for corosync nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: corosync_node_name: Unable to get node name for nodeid 1084805476
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: notice: get_node_name: Defaulting to uname -n for the local corosync node name
May 04 11:59:11 gz-controller-209100.vclound.com crmd[8384]: error: cluster_connect_quorum: Corosync quorum is not configured
May 04 11:59:11 gz-controller-209100.vclound.com stonith-ng[8380]: notice: setup_cib: Watching for stonith topology changes
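To spot this condition without reading through the whole log, a small sketch that searches pacemaker log text for the error line above. The function name quorum_not_configured is hypothetical, and it assumes the log is piped in, for example from journalctl -u pacemaker.

```shell
# Sketch: succeed if pacemaker log text on stdin contains the error that
# appears when corosync.conf has no quorum section.
# quorum_not_configured is a hypothetical helper name.
quorum_not_configured() {
    grep -q 'cluster_connect_quorum: Corosync quorum is not configured'
}
```

For example: journalctl -u pacemaker -b | quorum_not_configured && echo "corosync.conf has no quorum section"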

2. Solution
Following the description in man votequorum, add a quorum section to corosync.conf.
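Why two_node matters: per man votequorum, a partition is quorate when it holds more than half of the expected votes. The sketch below reproduces that arithmetic, assuming one vote per node; the function name votequorum_threshold is hypothetical and not part of corosync. With expected_votes: 2 and no two_node, both nodes must be up, so losing either one loses quorum; two_node: 1 lets a single node stay quorate. man votequorum also notes that two_node: 1 enables wait_for_all by default, so both nodes must have been seen together once before the cluster first becomes quorate.

```shell
# Sketch of the votequorum threshold arithmetic, assuming one vote per node.
# votequorum_threshold is a hypothetical helper, not part of corosync.
#   usage: votequorum_threshold EXPECTED_VOTES [TWO_NODE]
votequorum_threshold() {
    vq_expected=$1
    vq_two_node=${2:-0}
    if [ "$vq_two_node" -eq 1 ] && [ "$vq_expected" -eq 2 ]; then
        # two_node: 1 lets one of the two nodes remain quorate on its own
        echo 1
    else
        # more than half of the expected votes: floor(expected / 2) + 1
        echo $(( vq_expected / 2 + 1 ))
    fi
}
```

votequorum_threshold 2 prints 2, while votequorum_threshold 2 1 prints 1.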

Change the configuration file on node 1 to:

[root@gz-controller-209100 ~]# cat /etc/corosync/corosync.conf
compatibility: whitetank

totem {
  version:                             2
  token:                               3000
  token_retransmits_before_loss_const: 10
  join:                                60
  consensus:                           3600
  vsftype:                             none
  max_messages:                        20
  clear_node_high_bit:                 yes
  rrp_mode:                            none
  secauth:                             on
  threads:                             2
  interface {
    ringnumber:  0
    bindnetaddr: 192.168.209.0 # the network address to bind to, not a fixed host IP
    mcastaddr:   239.32.12.5
    mcastport:   5405
  }
}

logging {
  fileline:        off
  to_stderr:       yes
  to_logfile:      yes
  to_syslog:       no
  logfile: /var/log/cluster/corosync.log
  syslog_facility: daemon
  debug:           off
  timestamp:       on
  logger_subsys {
    subsys: AMF
    debug:  off
    tags:   enter|leave|trace1|trace2|trace3|trace4|trace6
  }
}

amf {
  mode: disabled
}

aisexec {
  user:  root
  group: root
}

quorum {
  provider:       corosync_votequorum
  expected_votes: 2
  two_node:       1
}
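Before restarting services, a rough check that corosync.conf actually carries the quorum section above can save a round trip. This is a hypothetical helper (check_two_node_quorum) that assumes the simple one-directive-per-line layout used in this post; it is not a full corosync.conf parser.

```shell
# Sketch: succeed if the given corosync.conf (default /etc/corosync/corosync.conf)
# has a quorum { } section with provider corosync_votequorum, expected_votes 2
# and two_node 1. check_two_node_quorum is a hypothetical helper and assumes
# one directive per line, as in the file above; it is not a full parser.
check_two_node_quorum() {
    awk '
        /^[ \t]*quorum[ \t]*[{]/ { in_quorum = 1; next }
        in_quorum && /[}]/       { in_quorum = 0 }
        in_quorum && /^[ \t]*provider:[ \t]*corosync_votequorum[ \t]*$/ { p = 1 }
        in_quorum && /^[ \t]*expected_votes:[ \t]*2[ \t]*$/             { e = 1 }
        in_quorum && /^[ \t]*two_node:[ \t]*1[ \t]*$/                   { t = 1 }
        END { exit !(p && e && t) }
    ' "${1:-/etc/corosync/corosync.conf}"
}
```

For example: check_two_node_quorum && systemctl restart corosync. Once corosync is running, corosync-quorumtool -s shows the quorum state as corosync itself sees it.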

Restart the corosync and pacemaker services:

[root@gz-controller-209100 ~]# systemctl restart corosync
[root@gz-controller-209100 ~]# systemctl restart pacemaker

Check the cluster status again:

[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:44:47 2015
Last change: Mon May 4 14:43:33 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
1 Nodes configured
0 Resources configured

Online: [ gz-controller-209100.vclound.com ]

Node 1 has now joined the cluster.

Copy the configuration file to the second node:

[root@gz-controller-209100 ~]# scp /etc/corosync/corosync.conf 192.168.209.101:/etc/corosync/
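With more than one peer, the copy-and-restart steps can be wrapped in a loop. A minimal sketch, assuming root SSH access from node 1; push_corosync_conf is a hypothetical name. Because the totem section sets secauth: on, the peers also need the same /etc/corosync/authkey, which this sketch assumes is already in place.

```shell
# Sketch: copy corosync.conf to each peer and restart corosync, then
# pacemaker, on it. push_corosync_conf is a hypothetical helper; it assumes
# root SSH/SCP access and that /etc/corosync/authkey already matches on peers.
push_corosync_conf() {
    for host in "$@"; do
        scp /etc/corosync/corosync.conf "$host":/etc/corosync/ || return 1
        # same order as the manual steps: corosync first, then pacemaker
        ssh "$host" 'systemctl restart corosync && systemctl restart pacemaker' || return 1
    done
}
```

For this cluster: push_corosync_conf 192.168.209.101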

Restart the services on the second node:

[root@gz-controller-209101 ~]# systemctl restart corosync
[root@gz-controller-209101 ~]# systemctl restart pacemaker

Cluster status:

[root@gz-controller-209100 ~]# crm status
Last updated: Mon May 4 14:44:55 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured

Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]
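To script this check, a sketch that reads crm status output and reports how many nodes are online and whether the partition has quorum. The function names are hypothetical, and the parsing assumes the output format shown in this post (pacemaker 1.1.10).

```shell
# Sketch: helpers for scripting the cluster check. Both read crm status
# output on stdin; the function names are hypothetical.

# Print how many node names appear on the "Online: [ ... ]" line (0 if none).
count_online_nodes() {
    sed -n 's/^Online: \[ *\(.*[^ ]\) *]$/\1/p' | wc -w | tr -d ' '
}

# Succeed only if the DC line reports "partition with quorum".
partition_has_quorum() {
    grep -q 'partition with quorum'
}
```

For the output above, crm status | count_online_nodes prints 2.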

Both nodes have joined the cluster, so the problem is solved.

3. Remaining issue
Running pcs status reports an error:

[root@gz-controller-209100 ~]# pcs status
Cluster name:
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Mon May 4 15:09:08 2015
Last change: Mon May 4 14:44:53 2015 via crmd on gz-controller-209100.vclound.com
Stack: corosync
Current DC: gz-controller-209100.vclound.com (1084805476) - partition with quorum
Version: 1.1.10-32.el7_0.1-368c726
2 Nodes configured
0 Resources configured

Online: [ gz-controller-209100.vclound.com gz-controller-209101.vclound.com ]

Full list of resources:

PCSD Status:
Error: no nodes found in corosync.conf

Reference:
Why is the message "Error: no nodes found in corosync.conf" in the output of "pcs cluster status" command ?
https://access.redhat.com/solutions/663283
Resolution:
The errors need to be ignored as no corosync.conf file is used.
Root cause:
The error messages seen are not harmful and are expected due to cman stack is being used.
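So this error can be ignored. Note that the article is written for clusters running the cman stack, whereas the status output above shows Stack: corosync, so its explanation may not apply exactly here; the cluster itself, however, is working normally.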

Tags: corosync