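This post explains how to manage the state of a cloud infrastructure using Pacemaker, Pacemaker Remote, and Corosync.
Pacemaker is an open-source high-availability cluster resource manager that runs on a set of nodes. It works together with Corosync, an open-source group communication system that provides ordered message delivery between nodes, cluster membership, quorum enforcement, and other features. Together they detect component failures and coordinate the recovery procedures needed to minimize application downtime.
1. corosync: the module that manages the low-level infrastructure, handling membership, quorum, and messaging between nodes. In other words, corosync is responsible for node discovery, communication, and synchronization within the cluster.
2. pacemaker: uses the services provided by corosync to control and manage cluster resources. From the user's point of view, pacemaker is what you call to use a specific cluster function.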
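Quorum is easiest to see as arithmetic. The following is a minimal sketch, assuming every corosync node has exactly one vote and no special options (such as two-node mode) are set; the helper names `quorum_votes` and `tolerated_failures` are hypothetical and not part of corosync or pcs. Pacemaker Remote nodes do not run corosync, so they are not counted here.

```shell
#!/usr/bin/env bash
# Minimum votes needed for quorum with one vote per node: floor(N / 2) + 1.
# (quorum_votes is a hypothetical helper name used only for illustration.)
quorum_votes() {
  local nodes=$1
  echo $(( nodes / 2 + 1 ))
}

# Number of node failures the cluster can absorb while keeping quorum.
tolerated_failures() {
  local nodes=$1
  echo $(( nodes - $(quorum_votes "$nodes") ))
}

# Print the figures for a few cluster sizes.
for n in 1 2 3 5; do
  echo "nodes=$n quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With the three controller nodes used in this post, quorum needs two votes, so the cluster can lose one controller and still manage resources.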
Installing Pacemaker and Pacemaker Remote
Installing Pacemaker Remote on com01, com02
# com01, com02
root@com02:~# apt-get install -y pacemaker-remote corosync resource-agents pcs
# com01, com02
root@com02:~# mkdir -p --mode=0750 /etc/pacemaker
# com01, com02
root@com02:~# chgrp haclient /etc/pacemaker
# com01
root@com01:~# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000300534 s, 13.6 MB/s
# com02
root@com02:~# scp com01:/etc/pacemaker/authkey /etc/pacemaker/authkey
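Every node must hold a byte-identical authkey, so it can be worth comparing digests after copying. This is a minimal sketch, assuming `sha256sum` from coreutils is available; `same_key` is a hypothetical helper name, and the temporary path in the usage comment is only an example.

```shell
#!/usr/bin/env bash
# Return 0 if two key files have the same SHA-256 digest, 1 if they differ,
# and 2 if either file cannot be read. (same_key is a hypothetical helper.)
same_key() {
  local a b
  [ -r "$1" ] && [ -r "$2" ] || return 2
  a=$(sha256sum < "$1")
  b=$(sha256sum < "$2")
  [ "$a" = "$b" ]
}

# Example (hostnames from this post, temporary path is an assumption):
#   scp com01:/etc/pacemaker/authkey /tmp/authkey.com01
#   same_key /etc/pacemaker/authkey /tmp/authkey.com01 && echo "keys match"
```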
# com01, com02
root@com02:~# systemctl enable pacemaker_remote.service
Synchronizing state of pacemaker_remote.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable pacemaker_remote
# com01, com02
root@com02:~# systemctl start pacemaker_remote.service
# com01, com02
root@com02:~# systemctl status pacemaker_remote.service
● pacemaker_remote.service - Pacemaker Remote executor daemon
Loaded: loaded (/lib/systemd/system/pacemaker_remote.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-02-15 06:50:46 UTC; 5min ago
Docs: man:pacemaker-remoted
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Remote/index.html
Main PID: 5961 (pacemaker-remot)
Tasks: 1
Memory: 2.0M
CGroup: /system.slice/pacemaker_remote.service
└─5961 /usr/sbin/pacemaker-remoted
# com01, com02
root@com02:~# netstat -lntup | grep 3121
tcp6 0 0 :::3121 :::* LISTEN 5961/pacemaker-remo
# com02 (connectivity check: either error below is expected and means port 3121 on com01 is reachable)
root@com02:~# ssh -p 3121 com01
ssh_exchange_identification: read: Connection reset by peer
or
kex_exchange_identification: banner line contains invalid characters
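The ssh trick above only confirms that something answers on port 3121. The same check can be sketched without ssh by using bash's built-in `/dev/tcp` redirection. This assumes a bash build with `/dev/tcp` support and the coreutils `timeout` command; `port_open` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
# Host and port are passed as positional arguments to the inner shell so
# they are never interpolated into the command string.
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (hostname from this post; 3121 is the port shown by netstat above):
#   port_open com01 3121 && echo "com01:3121 reachable"
```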
Installing Pacemaker + Corosync on con01, con02, con03
# con01, con02, con03
root@con01:~# apt-get install -y pacemaker corosync pcs resource-agents
# con01, con02, con03
root@con01:~# mkdir -p --mode=0750 /etc/pacemaker
# con01, con02, con03
root@con01:~# chgrp haclient /etc/pacemaker
# con01, con02, con03
root@con01:~# scp com01:/etc/pacemaker/authkey /etc/pacemaker/authkey
# con01, con02, con03
root@con01:~# passwd hacluster
# con01, con02, con03
root@con01:~# pcs cluster auth con01 con02 con03
Username: hacluster
Password:
con01: Authorized
con02: Authorized
con03: Authorized
# con01
root@con01:~# pcs cluster setup --force openstackcluster con01 addr=192.168.140.51 con02 addr=192.168.140.52 con03 addr=192.168.140.53
No addresses specified for host 'con01', using 'con01'
No addresses specified for host 'con02', using 'con02'
No addresses specified for host 'con03', using 'con03'
Destroying cluster on hosts: 'con01', 'con02', 'con03'...
con01: Successfully destroyed cluster
con02: Successfully destroyed cluster
con03: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'con01', 'con02', 'con03'
con01: successful removal of the file 'pcsd settings'
con02: successful removal of the file 'pcsd settings'
con03: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'con01', 'con02', 'con03'
con01: successful distribution of the file 'corosync authkey'
con01: successful distribution of the file 'pacemaker authkey'
con02: successful distribution of the file 'corosync authkey'
con02: successful distribution of the file 'pacemaker authkey'
con03: successful distribution of the file 'corosync authkey'
con03: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'con01', 'con02', 'con03'
con01: successful distribution of the file 'corosync.conf'
con02: successful distribution of the file 'corosync.conf'
con03: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
# con01
root@con01:~# pcs cluster enable --all
con01: Cluster Enabled
con02: Cluster Enabled
con03: Cluster Enabled
# con01
root@con01:~# pcs cluster start --all
con01: Starting Cluster...
con02: Starting Cluster...
con03: Starting Cluster...
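# con01 (if the setup went wrong, `pcs cluster destroy` removes the cluster configuration from the node so you can start over)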
root@con01:~# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 con01 (local)
# con01
root@con01:~# pcs status
Cluster name: openstackcluster
WARNINGS:
No stonith devices and stonith-enabled is not false
Cluster Summary:
* Stack: corosync
* Current DC: con01 (version 2.0.3-4b1f869f0f) - partition WITHOUT quorum
* Last updated: Wed Feb 15 16:42:35 2023
* Last change: Wed Feb 15 16:41:33 2023 by hacluster via crmd on con01
* 3 nodes configured
* 0 resource instances configured
Node List:
* Node con02: UNCLEAN (offline)
* Node con03: UNCLEAN (offline)
* Online: [ con01 ]
Full List of Resources:
* No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
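In the `pcs status` output above, con02 and con03 are listed as UNCLEAN (offline) and the partition is reported WITHOUT quorum. A small filter makes these two facts easy to check from a script. This is a sketch that matches only the wording printed by the Pacemaker 2.0.3 output in this post; other versions may format the text differently, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Read `pcs status` text on stdin and print "quorate" or "inquorate".
# The match is case-sensitive, so "partition WITHOUT quorum" does not count.
quorum_state() {
  if grep -q 'partition with quorum'; then
    echo quorate
  else
    echo inquorate
  fi
}

# Read `pcs status` text on stdin and print the names of UNCLEAN nodes,
# one per line (lines of the form "* Node <name>: UNCLEAN (...)").
unclean_nodes() {
  sed -n 's/^ *\* Node \([^:]*\): UNCLEAN.*/\1/p'
}

# Example:
#   pcs status | quorum_state
#   pcs status | unclean_nodes
```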
# con01, con02, con03
root@con01:~# pcs property set stonith-enabled=false
# con01
root@con01:~# pcs cluster status
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: con01 (version 2.0.3-4b1f869f0f) - partition WITHOUT quorum
* Last updated: Wed Feb 15 16:48:13 2023
* Last change: Wed Feb 15 16:43:17 2023 by root via cibadmin on con01
* 3 nodes configured
* 0 resource instances configured
Node List:
* Online: [ con01 con02 con03 ]
PCSD Status:
con01: Online
con02: Online
con03: Online
# com01, com02
root@com01:~# passwd hacluster
New password:
Retype new password:
passwd: password updated successfully
# com01, com02
root@com01:~# systemctl stop pacemaker_remote.service
# con01
root@con01:~# pcs cluster auth com01 com02
Username: hacluster
Password:
com01: Authorized
com02: Authorized
# con01
root@con01:~# pcs cluster node add-remote com01 192.168.140.54 --force
Sending 'pacemaker authkey' to 'com01'
com01: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'com01'
com01: successful run of 'pacemaker_remote enable'
com01: successful run of 'pacemaker_remote start'
# con01
root@con01:~# pcs cluster node add-remote com02 192.168.140.55 --force
Sending 'pacemaker authkey' to 'com02'
com02: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'com02'
com02: successful run of 'pacemaker_remote enable'
com02: successful run of 'pacemaker_remote start'
# con01
root@con01:~# pcs status
Cluster name: openstackcluster
Cluster Summary:
* Stack: corosync
* Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Feb 15 17:49:15 2023
* Last change: Wed Feb 15 17:49:10 2023 by root via cibadmin on con01
* 5 nodes configured
* 2 resource instances configured
Node List:
* Online: [ con01 con02 con03 ]
* RemoteOnline: [ com01 com02 ]
Full List of Resources:
* com01 (ocf::pacemaker:remote): Started con01
* com02 (ocf::pacemaker:remote): Started con02
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
# con01
root@con01:~# pcs cluster status
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Feb 15 17:49:22 2023
* Last change: Wed Feb 15 17:49:10 2023 by root via cibadmin on con01
* 5 nodes configured
* 2 resource instances configured
Node List:
* Online: [ con01 con02 con03 ]
* RemoteOnline: [ com01 com02 ]
# com01, com02
root@com01:~# systemctl status pacemaker_remote
● pacemaker_remote.service - Pacemaker Remote executor daemon
Loaded: loaded (/lib/systemd/system/pacemaker_remote.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-02-15 08:09:34 UTC; 2min 55s ago
Docs: man:pacemaker-remoted
https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Remote/index.html
Main PID: 8641 (pacemaker-remot)
Tasks: 1
Memory: 1.5M
CGroup: /system.slice/pacemaker_remote.service
└─8641 /usr/sbin/pacemaker-remoted
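Once the remote nodes are added, the `RemoteOnline` line of `pcs status` is the quickest indicator that they joined. The sketch below extracts that list and checks it against the nodes you expect. It assumes the status layout shown in this post; `remote_online` and `all_remotes_online` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Read status text on stdin and print each node named inside the brackets
# of the "RemoteOnline: [ ... ]" line, one per line.
remote_online() {
  awk '/RemoteOnline:/ {
    inside = 0
    for (i = 1; i <= NF; i++) {
      if ($i == "[") { inside = 1; continue }
      if ($i == "]") { inside = 0 }
      if (inside) print $i
    }
  }'
}

# $1: status text; remaining arguments: expected remote node names.
# Returns 0 only if every expected node is listed as RemoteOnline.
all_remotes_online() {
  local status=$1 online node
  shift
  online=$(printf '%s\n' "$status" | remote_online)
  for node in "$@"; do
    printf '%s\n' "$online" | grep -qxF -- "$node" || return 1
  done
}

# Example:
#   all_remotes_online "$(pcs status)" com01 com02 && echo "all remotes online"
```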
Failure recovery for Pacemaker + Corosync
Run the crm resource refresh command on a controller node.
# con01 or con02 or con03
# If running in a Docker environment
#root@con01:~# docker exec -it hacluster_pacemaker bash
(hacluster-pacemaker)[root@con01 /]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Mar 29 10:27:22 2023
* Last change: Fri Mar 17 08:05:25 2023 by hacluster via crmd on con03
* 5 nodes configured
* 2 resource instances configured
Node List:
* Online: [ con01 con02 con03 ]
* RemoteOnline: [ com01 ]
* RemoteOFFLINE: [ com02 ]
Full List of Resources:
* com01 (ocf::pacemaker:remote): Started con01
* com02 (ocf::pacemaker:remote): Stopped
Failed Resource Actions:
* com02_start_0 on con03 'error' (1): call=9, status='Timed Out', exitreason='', last-rc-change='2023-03-22 11:46:08 +09:00', queued=0ms, exec=0ms
* com02_start_0 on con01 'error' (1): call=17, status='Timed Out', exitreason='', last-rc-change='2023-03-22 11:46:25 +09:00', queued=0ms, exec=0ms
* com02_start_0 on con02 'error' (1): call=16, status='Timed Out', exitreason='', last-rc-change='2023-03-22 11:45:48 +09:00', queued=0ms, exec=0ms
(hacluster-pacemaker)[root@con01 /]# crm resource refresh
Waiting for 1 reply from the controller. OK
(hacluster-pacemaker)[root@con01 /]# crm status
Cluster Summary:
* Stack: corosync
* Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
* Last updated: Wed Mar 29 10:27:37 2023
* Last change: Fri Mar 17 08:05:25 2023 by hacluster via crmd on con03
* 5 nodes configured
* 2 resource instances configured
Node List:
* Online: [ con01 con02 con03 ]
* RemoteOnline: [ com01 com02 ]
Full List of Resources:
* com01 (ocf::pacemaker:remote): Started con01
* com02 (ocf::pacemaker:remote): Started con02
(hacluster-pacemaker)[root@con01 /]#
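The recovery above can be scripted so that a refresh runs only when `crm status` actually reports failed actions. This is a minimal sketch, assuming crmsh is installed and the "Failed Resource Actions:" section looks like the output in this post; `failed_resources` and `refresh_if_failed` are hypothetical helper names, and the parsing is not an official interface.

```shell
#!/usr/bin/env bash
# Read `crm status` text on stdin and print the unique resource names listed
# under "Failed Resource Actions:". An entry such as "com02_start_0" is
# reduced to "com02" by stripping the trailing "_<action>_<interval>".
failed_resources() {
  awk '
    /^ *Failed Resource Actions:/ { in_failed = 1; next }
    in_failed && NF && !/^ *\*/   { in_failed = 0 }
    in_failed && /^ *\*/ {
      name = $2
      sub(/_[a-z]+_[0-9]+$/, "", name)
      print name
    }
  ' | sort -u
}

# Run `crm resource refresh` (the command used in this post) only when
# at least one failed resource action is reported.
refresh_if_failed() {
  local failed
  failed=$(crm status | failed_resources)
  if [ -n "$failed" ]; then
    echo "failed resources: $failed"
    crm resource refresh
  else
    echo "no failed resource actions"
  fi
}
```

Calling `refresh_if_failed` on a controller node (or inside the hacluster_pacemaker container in a Docker deployment) would print the affected resources before refreshing. It does not fix the underlying cause of the timeout, so the remote node itself should still be checked.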