Docu review done: Mon 03 Jul 2023 17:08:17 CEST
Commands
Commands | Description |
---|---|
crm ra list systemd | These are common cluster resource agents found in systemd |
crm ra list lsb | LSB (Linux Standard Base) – These are common cluster resource agents found in the /etc/init.d directory (init scripts) |
crm ra list ocf | OCF (Open Cluster Framework) – These are actually extended LSB cluster resource agents and usually support additional parameters |
crm resource cleanup <resource> | Cleans up messages from resources |
crm node standby <node1> | puts the node into standby |
crm node online <node1> | brings the node back online |
crm configure property maintenance-mode=true | enables the maintenance mode and sets all resources to unmanaged |
crm configure property maintenance-mode=false | disables the maintenance mode and sets all resources to managed |
crm status failcount | shows failcounts on top of status output |
crm resource failcount <resource> set <node> 0 | sets the failcount of the resource on the given node to 0 (needs to be done for every node) |
crm resource meta <resource> set is-managed false | sets is-managed to false, so crm no longer manages the resource |
crm resource meta <resource> set is-managed true | sets is-managed to true, so crm manages the resource again |
crm resource unmanage <resource> | takes the resource out of crm management but keeps its functionality |
crm resource manage <resource> | puts the resource back under crm management and keeps its functionality |
crm resource refresh | re-checks for resources started outside of crm without changing anything in crm |
crm resource reprobe | re-checks for resources started outside of crm |
crm_mon --daemonize --as-html /var/www/html/cluster/index.html | Generates the crm_mon output as an HTML file (can be served by a web server…); should run on all nodes |
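
Since the failcount has to be reset on every node separately, a small loop can save some typing. The following is only a sketch: the function name `reset_failcount` and the `CRM` override variable are made up for this example and are not part of crm. Setting `CRM="echo crm"` prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: reset the failcount of one resource on every listed node.
# CRM is a hypothetical override hook (not a crm feature);
# use CRM="echo crm" for a dry run that only prints the commands.
CRM="${CRM:-crm}"

# Usage: reset_failcount <resource> <node> [<node> ...]
reset_failcount() {
    resource="$1"
    shift
    for node in "$@"; do
        # Same form as in the table above:
        # crm resource failcount <resource> set <node> 0
        $CRM resource failcount "$resource" set "$node" 0 || return 1
    done
}
```

For example, `reset_failcount sambad server1 server2` would run the failcount command once for server1 and once for server2, stopping at the first command that fails.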
Installation
Sample: https://zeldor.biz/2010/12/activepassive-cluster-with-pacemaker-corosync/
Errors and Solutions
Standby on fail
Node server1: standby (on-fail)
Online: [ server2 ]
Full list of resources:
Resource Group: samba
fs_share (ocf::heartbeat:Filesystem): Started server2
ip_share (ocf::heartbeat:IPaddr2): Started server2
sambad (systemd:smbd): Started server2
syncthingTomcat (systemd:syncthing@tomcat): server2
syncthingShare (systemd:syncthing@share): Started server2
syncthingRoot (systemd:syncthing@root): Started server2
worm_share_syncd (systemd:worm_share_syncd): Started server2
Clone Set: ms_drbd_share [drbd_share] (promotable)
Masters: [ server2 ]
Stopped: [ server1 ]
Failed Resource Actions:
* sambad_monitor_15000 on server1 'unknown error' (1): call=238, status=complete, exitreason='',
last-rc-change='Mon Jul 26 06:28:15 2021', queued=0ms, exec=0ms
* syncthingTomcat_monitor_15000 on server1 'unknown error' (1): call=239, status=complete, exitreason='',
last-rc-change='Mon Jul 26 06:28:15 2021', queued=0ms, exec=0ms
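
To see at a glance which resources failed on which node, the "Failed Resource Actions" lines can be filtered. This is a rough sketch that assumes the line format shown above (`* <resource>_<operation>_<interval> on <node> ...`); other Pacemaker versions may print these lines differently. The function name `failed_actions` is made up for this example.

```shell
#!/bin/sh
# Sketch: print "<resource> <node>" for each failed action line.
# Assumes lines of the form:
#   * sambad_monitor_15000 on server1 'unknown error' (1): ...
failed_actions() {
    awk '$1 == "*" && $3 == "on" {
        res = $2
        # Strip the trailing "_<operation>_<interval>" part
        sub(/_[a-z]+_[0-9]+$/, "", res)
        print res, $4
    }'
}
```

It reads the status text from standard input, for example `crm status | failed_actions`.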
Solution
A refresh of the failed resources alone might have been enough, but this has not been confirmed. The following sequence was used:
crm resource unmanage ms_drbd_share
crm resource refresh ms_drbd_share
crm resource refresh syncthingTomcat
crm resource manage ms_drbd_share
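
The same sequence could be wrapped in a small helper so that the resource is always handed back to crm, even if one of the refreshes fails. This is only a sketch: `refresh_unmanaged` and the `CRM` override variable are made-up names, and whether unmanaging is needed at all is still unconfirmed.

```shell
#!/bin/sh
# Sketch: unmanage one resource, refresh it and further resources,
# then manage it again. CRM is a hypothetical override hook;
# use CRM="echo crm" for a dry run that only prints the commands.
CRM="${CRM:-crm}"

# Usage: refresh_unmanaged <unmanaged-resource> [<other-resource> ...]
refresh_unmanaged() {
    guard="$1"
    shift
    $CRM resource unmanage "$guard" || return 1
    rc=0
    $CRM resource refresh "$guard" || rc=1
    for res in "$@"; do
        $CRM resource refresh "$res" || rc=1
    done
    # Always put the resource back under crm management
    $CRM resource manage "$guard" || rc=1
    return "$rc"
}
```

Calling `refresh_unmanaged ms_drbd_share syncthingTomcat` issues the four commands listed above in the same order.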