
Installing Pacemaker and Pacemaker Remote



This post explains how to manage the state of a cloud infrastructure using Pacemaker and Pacemaker Remote on top of Corosync.

Pacemaker is an open-source high-availability cluster resource manager that runs on a set of nodes. Together with Corosync, an open-source group communication system that provides ordered message delivery between nodes, cluster membership, quorum enforcement, and other functions, it detects component failures and coordinates the recovery procedures needed to minimize application downtime.

1. corosync: the low-level infrastructure module that handles membership, quorum, and messaging between nodes. In other words, Corosync takes care of node discovery, inter-node communication, and state synchronization within the cluster.

 

2. pacemaker: uses Corosync's services to control and manage the cluster's resources; from the user's point of view, you call Pacemaker whenever you want to use a specific cluster feature. (A minimal corosync.conf sketch illustrating the Corosync side follows below.)
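
For reference, this is roughly what the Corosync side looks like. It is an illustrative sketch only, not the exact file used here; in this post the real /etc/corosync/corosync.conf is generated and distributed automatically by pcs cluster setup further down, using the con01/con02/con03 addresses shown there.

# /etc/corosync/corosync.conf (sketch)
totem {
    version: 2
    cluster_name: openstackcluster    # messaging layer for the cluster
    transport: knet
}

nodelist {                            # cluster membership
    node {
        ring0_addr: 192.168.140.51
        name: con01
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.140.52
        name: con02
        nodeid: 2
    }
    node {
        ring0_addr: 192.168.140.53
        name: con03
        nodeid: 3
    }
}

quorum {                              # quorum enforcement
    provider: corosync_votequorum
}

logging {
    to_syslog: yes
}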

Installing Pacemaker and Pacemaker Remote

Installing Pacemaker Remote (com01, com02)

 

# com01, com02
root@com02:~# apt-get install -y pacemaker-remote corosync resource-agents pcs

# com01, com02
root@com02:~# mkdir -p --mode=0750 /etc/pacemaker

# com01, com02
root@com02:~# chgrp haclient /etc/pacemaker

# com01 (generate the shared authkey on one node only)
root@com01:~# dd if=/dev/urandom of=/etc/pacemaker/authkey bs=4096 count=1

1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000300534 s, 13.6 MB/s

# com02 (copy the authkey generated on com01 so both compute nodes share the same key)
root@com02:~# scp com01:/etc/pacemaker/authkey /etc/pacemaker/authkey

# com01, com02
root@com02:~# systemctl enable pacemaker_remote.service
Synchronizing state of pacemaker_remote.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install enable pacemaker_remote

# com01, com02
root@com02:~# systemctl start pacemaker_remote.service

# com01, com02
root@com02:~# systemctl status pacemaker_remote.service 
● pacemaker_remote.service - Pacemaker Remote executor daemon
     Loaded: loaded (/lib/systemd/system/pacemaker_remote.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-02-15 06:50:46 UTC; 5min ago
       Docs: man:pacemaker-remoted
             https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Remote/index.html
   Main PID: 5961 (pacemaker-remot)
      Tasks: 1
     Memory: 2.0M
     CGroup: /system.slice/pacemaker_remote.service
             └─5961 /usr/sbin/pacemaker-remoted

# com01, com02
root@com02:~# netstat -lntup | grep 3121
tcp6       0      0 :::3121                 :::*                    LISTEN      5961/pacemaker-remo

# com02 (connectivity check: the error below is expected and simply confirms that
# pacemaker-remoted is answering on TCP 3121 -- it speaks TLS, not SSH)
root@com02:~# ssh -p 3121 com01
ssh_exchange_identification: read: Connection reset by peer
or
kex_exchange_identification: banner line contains invalid characters

 

Installing Pacemaker + Corosync (con01, con02, con03)

# con01, con02, con03
root@con01:~# apt-get install -y pacemaker corosync pcs resource-agents

# con01, con02, con03
root@con01:~# mkdir -p --mode=0750 /etc/pacemaker

# con01, con02, con03
root@con01:~# chgrp haclient /etc/pacemaker

# con01, con02, con03
root@con01:~# scp com01:/etc/pacemaker/authkey /etc/pacemaker/authkey

# con01, con02, con03
root@con01:~# passwd hacluster

# con01, con02, con03
root@con01:~# pcs cluster auth con01 con02 con03
Username: hacluster
Password:
con01: Authorized
con02: Authorized
con03: Authorized

# con01
root@con01:~# pcs cluster setup --force openstackcluster con01 addr=192.168.140.51 con02 addr=192.168.140.52 con03 addr=192.168.140.53
No addresses specified for host 'con01', using 'con01'
No addresses specified for host 'con02', using 'con02'
No addresses specified for host 'con03', using 'con03'
Destroying cluster on hosts: 'con01', 'con02', 'con03'...
con01: Successfully destroyed cluster
con02: Successfully destroyed cluster
con03: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'con01', 'con02', 'con03'
con01: successful removal of the file 'pcsd settings'
con02: successful removal of the file 'pcsd settings'
con03: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'con01', 'con02', 'con03'
con01: successful distribution of the file 'corosync authkey'
con01: successful distribution of the file 'pacemaker authkey'
con02: successful distribution of the file 'corosync authkey'
con02: successful distribution of the file 'pacemaker authkey'
con03: successful distribution of the file 'corosync authkey'
con03: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'con01', 'con02', 'con03'
con01: successful distribution of the file 'corosync.conf'
con02: successful distribution of the file 'corosync.conf'
con03: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

# con01
root@con01:~# pcs cluster enable --all
con01: Cluster Enabled
con02: Cluster Enabled
con03: Cluster Enabled

# con01
root@con01:~# pcs cluster start --all
con01: Starting Cluster...
con02: Starting Cluster...
con03: Starting Cluster...

# (reference only) pcs cluster destroy removes the cluster configuration from a node if you need to start over

root@con01:~# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 con01 (local)
         
# con01  
root@con01:~# pcs status
Cluster name: openstackcluster

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: con01 (version 2.0.3-4b1f869f0f) - partition WITHOUT quorum
  * Last updated: Wed Feb 15 16:42:35 2023
  * Last change:  Wed Feb 15 16:41:33 2023 by hacluster via crmd on con01
  * 3 nodes configured
  * 0 resource instances configured

Node List:
  * Node con02: UNCLEAN (offline)
  * Node con03: UNCLEAN (offline)
  * Online: [ con01 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

# con01, con02, con03 (cluster properties are cluster-wide, so setting this on one node is sufficient)
root@con01:~# pcs property set stonith-enabled=false
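
Setting stonith-enabled=false disables fencing, which is acceptable for a test/PoC setup like this but not recommended for production. To confirm the cluster-wide property took effect, something like the following should work (the exact pcs subcommand varies slightly by version: show, list, or config):

# con01 (verify the property; both commands should report false)
root@con01:~# pcs property show stonith-enabled
root@con01:~# crm_attribute --type crm_config --name stonith-enabled --query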

# con01
root@con01:~# pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: con01 (version 2.0.3-4b1f869f0f) - partition WITHOUT quorum
   * Last updated: Wed Feb 15 16:48:13 2023
   * Last change:  Wed Feb 15 16:43:17 2023 by root via cibadmin on con01
   * 3 nodes configured
   * 0 resource instances configured
 Node List:
   * Online: [ con01 con02 con03 ]

PCSD Status:
  con01: Online
  con02: Online
  con03: Online
  
# com01, com02
root@com01:~# passwd hacluster
New password:
Retype new password:
passwd: password updated successfully

# com01, com02 (stop the manually started pacemaker_remote; "pcs cluster node add-remote" below re-enables and restarts it under cluster control)
root@com01:~# systemctl stop pacemaker_remote.service


# con01
root@con01:~# pcs cluster auth com01 com02
Username: hacluster
Password:
com01: Authorized
com02: Authorized


# con01
root@con01:~# pcs cluster node add-remote com01 192.168.140.54 --force
Sending 'pacemaker authkey' to 'com01'
com01: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'com01'
com01: successful run of 'pacemaker_remote enable'
com01: successful run of 'pacemaker_remote start'

# con01
root@con01:~# pcs cluster node add-remote com02 192.168.140.55 --force
Sending 'pacemaker authkey' to 'com02'
com02: successful distribution of the file 'pacemaker authkey'
Requesting 'pacemaker_remote enable', 'pacemaker_remote start' on 'com02'
com02: successful run of 'pacemaker_remote enable'
com02: successful run of 'pacemaker_remote start'


# con01
root@con01:~# pcs status
Cluster name: openstackcluster
Cluster Summary:
  * Stack: corosync
  * Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Wed Feb 15 17:49:15 2023
  * Last change:  Wed Feb 15 17:49:10 2023 by root via cibadmin on con01
  * 5 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ con01 con02 con03 ]
  * RemoteOnline: [ com01 com02 ]

Full List of Resources:
  * com01       (ocf::pacemaker:remote):         Started con01
  * com02       (ocf::pacemaker:remote):         Started con02

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
  
  
# con01
root@con01:~# pcs cluster status
Cluster Status:
 Cluster Summary:
   * Stack: corosync
   * Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
   * Last updated: Wed Feb 15 17:49:22 2023
   * Last change:  Wed Feb 15 17:49:10 2023 by root via cibadmin on con01
   * 5 nodes configured
   * 2 resource instances configured
 Node List:
   * Online: [ con01 con02 con03 ]
   * RemoteOnline: [ com01 com02 ]
   

# com01, com02
root@com01:~# systemctl status pacemaker_remote
● pacemaker_remote.service - Pacemaker Remote executor daemon
     Loaded: loaded (/lib/systemd/system/pacemaker_remote.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-02-15 08:09:34 UTC; 2min 55s ago
       Docs: man:pacemaker-remoted
             https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/2.0/html-single/Pacemaker_Remote/index.html
   Main PID: 8641 (pacemaker-remot)
      Tasks: 1
     Memory: 1.5M
     CGroup: /system.slice/pacemaker_remote.service
             └─8641 /usr/sbin/pacemaker-remoted

 

Failure Recovery with Pacemaker + Corosync

 

When a remote node shows up as RemoteOFFLINE with failed start actions (as below), run the crm resource refresh command on one of the controller nodes to clear the failure history and let the cluster retry.

# con01 or con02 or con03
# If running in a Docker-based deployment, enter the pacemaker container first:
#root@con01:~# docker exec -it hacluster_pacemaker bash

(hacluster-pacemaker)[root@con01 /]# crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Wed Mar 29 10:27:22 2023
  * Last change:  Fri Mar 17 08:05:25 2023 by hacluster via crmd on con03
  * 5 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ con01 con02 con03 ]
  * RemoteOnline: [ com01 ]
  * RemoteOFFLINE: [ com02 ]

Full List of Resources:
  * com01       (ocf::pacemaker:remote):         Started con01
  * com02       (ocf::pacemaker:remote):         Stopped

Failed Resource Actions:
  * com02_start_0 on con03 'error' (1): call=9, status='Timed Out', exitreason='', last-rc-change='2023-03-22 11:46:08 +09:00', queued=0ms, exec=0ms
  * com02_start_0 on con01 'error' (1): call=17, status='Timed Out', exitreason='', last-rc-change='2023-03-22 11:46:25 +09:00', queued=0ms, exec=0ms
  * com02_start_0 on con02 'error' (1): call=16, status='Timed Out', exitreason='', last-rc-change='2023-03-22 11:45:48 +09:00', queued=0ms, exec=0ms

(hacluster-pacemaker)[root@con01 /]# crm resource refresh
Waiting for 1 reply from the controller. OK
(hacluster-pacemaker)[root@con01 /]# crm status
Cluster Summary:
  * Stack: corosync
  * Current DC: con02 (version 2.0.3-4b1f869f0f) - partition with quorum
  * Last updated: Wed Mar 29 10:27:37 2023
  * Last change:  Fri Mar 17 08:05:25 2023 by hacluster via crmd on con03
  * 5 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ con01 con02 con03 ]
  * RemoteOnline: [ com01 com02 ]

Full List of Resources:
  * com01       (ocf::pacemaker:remote):         Started con01
  * com02       (ocf::pacemaker:remote):         Started con02

(hacluster-pacemaker)[root@con01 /]#
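
The same recovery can be done with pcs instead of crm (the crm shell used above comes from the crmsh package and was run inside the pacemaker container here; it was not installed on the hosts earlier in this post). A sketch, assuming pcs 0.10 or later:

# con01 (pcs equivalent of the crm commands above)
root@con01:~# pcs resource cleanup com02    # clear the failed start actions recorded for the com02 remote resource
root@con01:~# pcs status                    # com02 should move back to RemoteOnline once the restart succeeds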

 

 

References

https://m.blog.naver.com/alice_k106/221786625711
