Docu review done: Mon 03 Jul 2023 17:13:04 CEST

Getting state

Command                          Description
watch -n1 -d 'cat /proc/drbd'    shows you the actual state and connection
drbd-overview                    shows you the state and connection with a bit less detail

Create DRBD on LVM

To create the DRBD, you first need to set up the underlying disk/partition/LV; a short summary using an LV below:

$ pvcreate /dev/sdx
$ vgcreate drbdvg /dev/sdx
$ lvcreate --name r0lv --size 10G drbdvg

and of course you need to have the package installed ;)

$ apt install drbd-utils

Next is to create the DRBD configuration. In our sample we use r0 as the resource name.

Here you specify the hosts which are part of the DRBD cluster and where the DRBD data gets stored.

This config needs to be present on all DRBD cluster members, same goes of course for the package drbd-utils and the needed space to store the DRBD; a copy example follows below the config.

$ cat << EOF >  /etc/drbd.d/r0.res
resource r0 {
  device    /dev/drbd0;
  disk      /dev/drbdvg/r0lv;
  meta-disk internal;

  on server01 {
    address   10.0.0.1:7789;
  }
  on server02 {
    address   10.0.0.2:7789;
  }
}
EOF
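
To get the config onto the second cluster member, a simple copy is enough; server02 is the peer hostname from the sample config:

$ scp /etc/drbd.d/r0.res server02:/etc/drbd.d/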

Now we are ready to create the resource r0 in DRBD and start up the service

$ drbdadm create-md r0
$ systemctl start drbd.service
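
If you also want DRBD to come up at boot, enable the service:

$ systemctl enable drbd.service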

You can also bring the DRBD up manually by running the following:

$ drbdadm up r0

Make sure that the members are now connected to each other by checking drbd-overview or cat /proc/drbd

$ cat /proc/drbd
version: 8.4.10 (api:1/proto:86-101)
srcversion: 12341234123412341234123
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:100 dw:100 dr:0 al:1 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

If it looks like the above, you are good to go. If not, you need to figure out why the connection is not getting established; check with tcpdump and so on.
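
A minimal sketch for that check, assuming the replication port 7789 from the sample config:

$ ss -tlpn | grep 7789          # is DRBD listening on the replication port?
$ tcpdump -ni any port 7789     # does traffic from the peer arrive?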

Now we set one of the members to primary

$ drbdadm primary --force r0

If you are facing issues with the command above, use this one:

$ drbdadm -- --overwrite-data-of-peer primary r0
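
With one member being Primary, you can now put a filesystem on the device and mount it. A minimal sketch, assuming XFS and the mount point /mnt/drbd_r0_data used later in this document:

$ mkfs.xfs /dev/drbd0
$ mkdir -p /mnt/drbd_r0_data
$ mount /dev/drbd0 /mnt/drbd_r0_data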

Extend DRBD live

To extend a DRBD, you first need to extend the underlying LV/PV/partition/MD or whatever you use on all DRBD cluster members; in our sample we go with an LV.

# connect to the master and extend the LV
$ lvextend -L +<size>G /dev/<drbdvg>/<drbdlv>         # e.g. lvextend -L +24G /dev/drbdvg/r0lv

# connect to the slave and do the same (be careful, it must have the !!! SAME SIZE !!!)
$ lvextend -L +<size>G /dev/<drbdvg>/<drbdlv>         # e.g. lvextend -L +24G /dev/drbdvg/r0lv
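
To verify that both sides really ended up with the same size, compare the LV on each member; VG name from the sample:

$ lvs drbdvg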

Now you should start to monitor the DRBD state with one of the commands from Getting state

On the primary server, we perform the resize command. Right after you have executed it, you will see that DRBD starts to sync the new “data” to the other cluster members.

$ drbdadm resize r0

This resync can take a while, depending on your DRBD size, network, hardware, …

If you have more than one DRBD resource, you could use the keyword all instead of the resource name, but make sure that you have prepared everything

$ drbdadm resize all

Let's assume the resync finished; now you are ready to extend the filesystem inside the DRBD itself. Again, run this on the primary server

$ xfs_growfs /mnt/drbd_r0_data
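
If your filesystem is ext4 instead of XFS, resize2fs does the same job; device name from the sample config:

$ resize2fs /dev/drbd0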

Remove DRBD resource/device

Let's assume we want to remove the resource r1

First you need to see which resources you have

$ drbd-overview
NOTE: drbd-overview will be deprecated soon.
Please consider using drbdtop.

 0:r0/0  Connected    Secondary/Primary UpToDate/UpToDate
 1:r1/0  Connected    Secondary/Primary UpToDate/UpToDate

If the system where you are currently connected is set to Secondary, you are good already; otherwise you need to change it first to have that state.
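
To demote it (this only works if the device is not mounted/in use anymore):

$ drbdadm secondary r1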

Now you can disconnect it by running drbdadm disconnect r1; drbd-overview or a cat /proc/drbd will then show you the state StandAlone

Next step is to detach it like this: drbdadm detach r1. If you check drbd-overview again, it will look different from cat /proc/drbd

$ drbd-overview | grep r1
 1:r1/0  .         .                 .

$ cat /proc/drbd | grep "1:"
 1: cs:Unconfigured

Good so far. As you don't want to keep data on there, you should wipe it

$ drbdadm wipe-md r1

Do you really want to wipe out the DRBD meta data?
[need to type 'yes' to confirm] yes

Wiping meta data...
DRBD meta data block successfully wiped out.

echo "yes" | drbdadm wipe-md r1 is working, if you need it in a script

Now we are nearly done; next is to remove the minor. del-minor wants the minor number of the resource, which you can see at the start of each line in the drbd-overview output; see the grep example below.
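
A minimal sketch, combining drbd-overview with the greps mentioned above:

$ drbd-overview 2>&1 | grep -E '^ *[0-9]+:' | grep r1
 1:r1/0  .         .                 .

The leading number (here 1) is the minor that del-minor expects.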

$ drbdsetup del-minor 1

Now we are good to go and remove the resource fully

$ drbdsetup del-resource r1

Last step is to remove the resource file beneath /etc/drbd.d/r1.res, if you don't have it automated ;)
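
$ rm /etc/drbd.d/r1.res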

Solving issues

One part of the DRBD is corrupt

assuming r0 is your resource name

First we want to disconnect the cluster; run the commands on one of the servers, mostly on the corrupted one

$ drbdadm disconnect r0
$ drbdadm detach r0

If they are not disconnected, restart the drbd service:
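
$ systemctl restart drbd.service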

Now remove the messed-up device and start to recreate it

$ drbdadm wipe-md r0
$ drbdadm create-md r0

If you had to stop the drbd service, make sure that it is started again.

Next step is to go to the server which holds the working data and run:

$ drbdadm connect r0

If it's not working or they are in the Secondary/Secondary state, run (only after they are in sync):

$ drbdadm -- --overwrite-data-of-peer primary r0

Situation Primary/Unknown - Secondary/Unknown

Connect to the slave and run

$ drbdadm -- --discard-my-data connect all

If the secondary returns:

r0: Failure: (102) Local address(port) already in use.
Command 'drbdsetup-84 connect r0 ipv4:10.42.13.37:7789 ipv4:10.13.37.42:7789 --max-buffers=40k --discard-my-data' terminated with exit code 10

Then just perform a drbdadm disconnect r0 and run the command from above again:
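
$ drbdadm disconnect r0
$ drbdadm -- --discard-my-data connect all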

Connect to the master

$ drbdadm connect all

Situation Primary/Primary

Option 1

Connect to the server which should be secondary

Just make sure that this one really has no needed data on it

$ drbdadm secondary r0

Option 2

Connect to the real master and run the following to make it the only primary

$ drbdadm -- --overwrite-data-of-peer primary r0

Now you have the state Primary/Unknown and Secondary/Unknown

Connect to the slave and remove the data

$ drbdadm -- --discard-my-data connect all

Situation r0 Unconfigured

DRBD shows this status on the slave:

$ drbd-overview
Please consider using drbdtop.

 0:r0/0  Unconfigured . .

Run drbdadm up to bring the device up again

$ drbdadm up r0

and check out the status

$ drbd-overview
Please consider using drbdtop.

 0:r0/0  SyncTarget Secondary/Primary Inconsistent/UpToDate
    [=================>..] sync'ed: 94.3% (9084/140536)K

Situation Connected Secondary/Primary Diskless/UpToDate

$ cat /proc/drbd
version: 8.4.10 (api:1/proto:86-101)
srcversion: 473968AD625BA317874A57E
 0: cs:Connected ro:Secondary/Primary ds:Diskless/UpToDate C r-----
     ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Recreate the resource, as it seems it was not fully created, and bring the resource up

$ drbdadm create-md r0
$ drbdadm up r0