Tom Hibbert
2005-Jun-12 22:13 UTC
[Xen-users] Setup guide: Active/Passive Redundancy using Xen, DRBD and Heartbeat
Hello again Xenophiles,

I've noticed a few people on the list having problems with Xen and DRBD, so I thought I'd post an approximate walkthrough of the steps I've been taking to bring it up. This guide is heavily Sarge oriented, and may or may not be of any use to anyone. The main reason for documenting it is actually so that I don't forget the steps the next time I do it.

This guide assumes you already have two Xen Dom0 machines running. You may or may not have a dedicated network interface for Heartbeat/DRBD; it is not required.

1. Build and install the drbd module

    # apt-get install drbd0.7-module-source module-assistant

module-assistant is a very handy tool that works well with both vanilla and debianised kernel sources. Using it eliminates the requirement to repatch the kernel sources and rebuild.

    # ARCH=xen module-assistant --kernel-dir=/usr/src/kernels/kernel-source-2.6.10 build drbd0.7-module

Obviously, replace the --kernel-dir directive with the path to your xen0 kernel. Once module-assistant has completed its machinations, install the resultant deb on both machines:

    # dpkg -i /usr/src/drbd0.7-module-*
    # update-modules

... and just to be sure it's worked:

    # modprobe drbd

Note that drbd can only be configured as a module (for reasons unfathomable to me). Finally, install the drbd admin utilities:

    # apt-get install drbd0.7-utils

2. Configure the drbd

First, make sure both nodes have entries in the hosts file that match the output of hostname. You must be able to resolve the remote node by its hostname. Edit drbd.conf and add resource stanzas for all block devices you need to replicate.

    # nano /etc/drbd.conf

    resource "r1" {
      protocol C;
      startup {
        wfc-timeout 60;
        degr-wfc-timeout 60;
      }
      disk {
        on-io-error detach;
      }
      net {
        # I have left these in in case I need to use them later
        # timeout 60;
        # connect-int 10;
        # ping-int 10;
        # max-buffers 2048;
        # max-epoch-size 2048;
      }
      syncer {
        rate 100M;
        group 1; # sync concurrently with r0
      }
      on uplink-xen-1 {
        device /dev/drbd1;
        disk /dev/md1;
        address 172.10.10.1:7789;
        meta-disk internal;
      }
      on uplink-xen-2 {
        device /dev/drbd1;
        disk /dev/md1;
        address 172.10.10.2:7789;
        meta-disk internal;
      }
    }

Just so we're clear, the device declaration is the drbd device and the disk declaration is the backend block device that will store the replicated data. "meta-disk internal" means that drbd uses a part of the device near the end to store its metadata; you can use an external device or file here, but internal reduces complexity somewhat.

NOTE: when configuring replication using an existing filesystem, i.e. one that won't be freshly created after drbd is brought up, you will probably need to run e2resize on it to prevent "attempt to access beyond end of device" errors (the internal metadata takes up space at the end of the device, so the filesystem has to shrink to fit).

Copy the drbd.conf file to both nodes and start drbd. Make sure the referenced disks are not mounted before drbd is started, or Bad Things Will Happen(tm).

    # /etc/init.d/drbd start

drbd will come up on both nodes in "secondary" mode. Make your "primary" node the primary for all drbd devices (replace X with the device number):

    # drbdsetup /dev/drbdX primary --do-what-I-say

You can check the drbd status with:

    # cat /proc/drbd

You may wish to wait for replication to complete before moving on to the next step.

3. Installing heartbeat

    # apt-get install heartbeat
    # nano /etc/heartbeat/ha.cf

    deadtime 60
    warntime 30
    initdead 120
    bcast eth0
    auto_failback off
    node host1
    node host2
    logfacility local0

    # nano /etc/heartbeat/haresources

    host1 drbddisk::r2 drbddisk::r2 xendomains::domU

I created a simple "xendomains" script for (re)starting xen domains from heartbeat.

/etc/ha.d/resource.d/xendomains

    #!/bin/bash
    # Heartbeat resource script: start, stop or query a single Xen domain.
    # $1 is the domain name (also used as its config file name under
    # CONFPATH), $2 is the command passed in by heartbeat.

    XM="/usr/sbin/xm"
    CONFPATH="/etc/xen/"
    RES="$1"
    CMD="$2"

    case "$CMD" in
      start)
        $XM create -f "$CONFPATH$RES"
        ;;
      stop)
        # exec, so the exit code of xm propagates
        exec $XM destroy "$RES"
        ;;
      status)
        # look for the domain name in the first column of "xm list"
        $XM list | awk '{print $1}' | grep "$RES" > /dev/null
        if [ $? -eq 0 ]
        then
          echo running
        else
          echo stopped
        fi
        ;;
      *)
        echo "Usage: xendomains [filename] {start|stop|status}"
        exit 1
        ;;
    esac

    exit 0

There are a few more files that need to be edited.

    # nano /etc/ha.d/authkeys

    auth 1
    1 crc

    # chmod 600 /etc/ha.d/authkeys

The builtin drbddisk resource handler had some problems, so I modified it slightly.

    # nano /etc/ha.d/resource.d/drbddisk

    #!/bin/bash
    #
    # This script is intended to be used as resource script by heartbeat
    #
    # Jan 2003 by Philipp Reisner.
    #
    ###

    DEFAULTFILE="/etc/default/drbd"
    DRBDADM="/sbin/drbdadm"

    if [ -f $DEFAULTFILE ]; then
      . $DEFAULTFILE
    fi

    if [ "$#" -eq 2 ]; then
      RES="$1"
      CMD="$2"
    else
      RES="all"
      CMD="$1"
    fi

    case "$CMD" in
      start)
        # try several times, in case heartbeat deadtime
        # was smaller than drbd ping time
        try=6
        while true; do
          $DRBDADM primary $RES && break
          let "--try" || exit 20
          sleep 1
        done
        ;;
      stop)
        # exec, so the exit code of drbdadm propagates
        exec $DRBDADM secondary $RES
        ;;
      status)
        if [ "$RES" = "all" ]; then
          echo "A resource name is required for status inquiries."
          exit 10
        fi
        ST=$( $DRBDADM state $RES 2> /dev/null )
        ST=${ST%/*}
        if [ "$ST" = "Primary" ]; then
          echo "running"
        else
          echo "stopped"
        fi
        ;;
      *)
        echo "Usage: drbddisk [resource] {start|stop|status}"
        exit 1
        ;;
    esac

    exit 0

Test the heartbeat resource scripts to ensure they are able to bring up/down both the drbddisk and the xendomain.

    # /etc/ha.d/resource.d/drbddisk r0 start
    # /etc/ha.d/resource.d/drbddisk r0 stop
    # /etc/ha.d/resource.d/xendomains xenu start
    # /etc/ha.d/resource.d/xendomains xenu stop

Bring up heartbeat on both machines:

    # /etc/init.d/heartbeat start

Check status on the primary node:

    # cat /proc/drbd
    # xm list
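
A possible refinement to the status branch of the xendomains script: it pipes the first column of "xm list" through grep, and grep matches substrings, so a domain called domU would also be reported as running when only domU2 is up. Below is a minimal sketch of an exact-match check. The helper name domain_listed is hypothetical (it is not part of Xen or heartbeat), and the sketch assumes that "xm list" prints one header row followed by one row per domain with the name in the first column. It reads the listing from stdin, so it can be tried on a machine without Xen.

```shell
#!/bin/bash
# Sketch only: exact-match test for a domain name in "xm list"-style output.
# domain_listed is a hypothetical helper, not something shipped with Xen
# or heartbeat. It reads the listing from stdin and returns 0 if a row
# whose first column equals $1 exists, 1 otherwise.
domain_listed() {
    # NR > 1 skips the header row; $1 == name compares the whole first
    # column, so "domU" does not match "domU2".
    awk -v name="$1" 'NR > 1 && $1 == name { found = 1 } END { exit !found }'
}

# Intended use inside the status) branch (assumes xm lives in /usr/sbin):
#   if /usr/sbin/xm list | domain_listed "$RES"; then
#       echo running
#   else
#       echo stopped
#   fi
```

Heartbeat only looks at the text the status branch prints, so the branch would still echo "running" or "stopped" exactly as before; only the matching changes.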