Michael Mey
2005-Nov-28 10:50 UTC
[Xen-users] hellp!! live migration fails after several migrations - stress test
Hi folks, I wanted to do a stress test on a domU (Debian 3.1, 2.6.11.12-xenU). running on dom0 (Debian 3.1, 2.6.11.12-xen0), Xen-2.0.7, IA32. Similar problems occured to me running Xen-devel from last week. Since it is under heavy development, I thought testing it with Xen-2.0.7 stable should work for sure. Scenario: 2 of theses hosts (IPs: 192.168.112.17 and 192.168.112.18) are running a script. The script checks if the domain is on the host, if yes, it will be migrated to the other one, if no, the script will sleep for a few minutes. It''s kind of a domU-pingpong. domU has a file-backed VBD lying on a 3rd host (NFS Server). It worked for several migrations, but suddenly, the domU died. Hopefully, someone could help me find out what the problem is. script pingpong: ---------------------------------------------------------------------------------- SCRIPTNAME=`basename $0` HOSTNAME=`hostname` LOGFILE=$SCRIPTNAME.$HOSTNAME.log XENHOST1="192.168.112.17" XENHOST2="192.168.112.18" XENGUEST1="debian1" # determine the xen target host MYIP=`ifconfig eth0 | grep "inet" | awk ''{print $2}'' | cut -d: -f 2` if [ $MYIP = $XENHOST1 ] then TARGETHOST=$XENHOST2 else TARGETHOST=$XENHOST1 fi log() { echo `date ''+[%y/%m/%d %H:%M:%S]''` - $1 >> $LOGFILE } while [ 1 -eq 1 ] do RUNNINGGUESTS=`xm list | grep -v Name | grep -v Domain-0 | awk ''{print $1}''` if `echo $RUNNINGGUESTS | grep $XENGUEST1 > /dev/null` then XENGUESTID=`xm list | grep $XENGUEST1 | awk ''{print $2}''` log "domain $XENGUEST1 (ID=$XENGUESTID) is running here" if `xm list | grep $XENGUEST1 | awk ''{print $5}'' | grep p > /dev/null` then log "domain $XENGUEST1 is currently paused and hence cannot be migrated!" else log "START migration of $XENGUEST1 to $TARGETHOST..." if `xm migrate -l $XENGUEST1 $TARGETHOST` then log "END migration of domain $XENGUEST1 successfull!" else log "ERROR while migrating $XENGUEST1!" fi fi else log "domain $XENGUEST1 is not here. I''ll go back to sleep." fi sleep `expr $RANDOM % 1200 + 120` done ----------------------------------------------------------------------------------------- logfile snippets of the last migration (from 192.168.112.18 to 192.168.112.17: 192.168.112.17: -------------------- xend.log: ------------ [2005-11-28 01:14:30 xend] DEBUG (blkif:203) Connecting blkif to event channel <BlkifBackendInterface 24 0> ports=16:4 [2005-11-28 01:14:30 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=debian1 id=24 [2005-11-28 01:14:30 xend] INFO (XendDomain:568) Destroying domain: name=debian1 [2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:665) Destroying vifs for domain 24 [2005-11-28 01:14:30 xend] DEBUG (netif:305) Destroying vif domain=24 vif=0 [2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:674) Destroying vbds for domain 24 [2005-11-28 01:14:30 xend] DEBUG (blkif:552) Destroying blkif domain=24 [2005-11-28 01:14:30 xend] DEBUG (blkif:408) Destroying vbd domain=24 idx=0 [2005-11-28 01:14:30 xend] DEBUG (blkif:408) Destroying vbd domain=24 idx=1 [2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:634) Closing console, domain 24 [2005-11-28 01:14:30 xend] DEBUG (XendDomainInfo:622) Closing channel to domain 24 [2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.virq 4 [2005-11-28 01:14:30 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop0 [2005-11-28 01:14:30 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop1 [2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.domain.exit [''debian1'', ''24'', ''crash''] [2005-11-28 01:14:30 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy [''debian1'', ''24''] [2005-11-28 01:14:31 xend] INFO (XendRoot:113) EVENT> xend.domain.died [''debian1'', ''24''] --------------------- xfrd.log: ------------ 3488 [INF] XFRD> Forked child pid=5047 3488 [INF] XFRD> Accepted connection from 192.168.112.18:1777 on 2 5069 [INF] XFRD> Xfr service for 192.168.112.18:1777 [DEBUG] Conn_init> flags=1 [DEBUG] Conn_init> write stream... [DEBUG] stream_init>mode=w flags=1 compress=0 [DEBUG] stream_init> unbuffer... [DEBUG] stream_init< err=0 [DEBUG] Conn_init> read stream... [DEBUG] stream_init>mode=r flags=1 compress=0 [DEBUG] stream_init> unbuffer... [DEBUG] stream_init< err=0 [DEBUG] Conn_sxpr> (xfr.hello 1 0)[DEBUG] Conn_sxpr< err=0 [DEBUG] Conn_sxpr> (xfr.xfr 16)[DEBUG] Conn_sxpr< err=0 [1133136756.095291] xc_linux_restore start xc_linux_restore start [1133136756.108828] Created domain 24 Created domain 24 (Domain-0 Domain-24)''domain id=24 name=debian1 memory=256 console=9624 image=/boot/vmlinuz-2.6.11.12-xenU''[1133136756.570952] Reloading memory pages: 0% Reloading memory pages: 6% 337%[1133136870.173691] Received all pages Received all pages 100% 100% [1133136870.183801] Memory reloaded. Memory reloaded. Decreased reservation by 7 pages [1133136870.184758] Domain ready to be built. Domain ready to be built. [1133136870.184844] Domain ready to be unpaused Domain ready to be unpaused [1133136870.185234] DOM=24 DOM=24 [DEBUG] Conn_sxpr> (xfr.err 0)[DEBUG] Conn_sxpr< err=0 [DEBUG] Conn_sxpr> (xfr.err 0)[DEBUG] Conn_sxpr< err=0 5069 [INF] XFRD> Transfer complete in 114 seconds 5069 [INF] XFRD> Xfr service err=0 192.168.112.18: ---------------------------------------------- xend.log: ------------ [2005-11-28 01:14:16 xend] DEBUG (XendDomain:487) domain_restart_schedule> 16 suspend 1 [2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.domain.shutdown [''debian1'', ''16'', ''suspend''] [2005-11-28 01:14:16 xend] DEBUG (XendDomain:244) XendDomain>reap> domain died name=debian1 id=16 [2005-11-28 01:14:16 xend] DEBUG (XendDomain:247) XendDomain>reap> shutdown id=16 reason=suspend [2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.virq 4 [2005-11-28 01:14:16 xend] INFO (XendRoot:113) EVENT> xend.domain.suspended [''debian1'', ''16''] [2005-11-28 01:14:25 xend] INFO (XendDomain:568) Destroying domain: name=debian1 [2005-11-28 01:14:25 xend] DEBUG (XendDomainInfo:665) Destroying vifs for domain 16 [2005-11-28 01:14:25 xend] DEBUG (netif:305) Destroying vif domain=16 vif=0 [2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:674) Destroying vbds for domain 16 [2005-11-28 01:14:26 xend] DEBUG (blkif:552) Destroying blkif domain=16 [2005-11-28 01:14:26 xend] DEBUG (blkif:408) Destroying vbd domain=16 idx=0 [2005-11-28 01:14:26 xend] DEBUG (blkif:408) Destroying vbd domain=16 idx=1 [2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:634) Closing console, domain 16 [2005-11-28 01:14:26 xend] DEBUG (XendDomainInfo:622) Closing channel to domain 16 [2005-11-28 01:14:26 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy [''debian1'', ''16''] [2005-11-28 01:14:26 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop0 [2005-11-28 01:14:26 xend] DEBUG (blkif:363) Unbinding vbd (type file) from /dev/loop1 [2005-11-28 01:14:27 xend] INFO (XendDomain:568) Destroying domain: name=debian1 [2005-11-28 01:14:27 xend] DEBUG (XendDomainInfo:634) Closing console, domain 16 [2005-11-28 01:14:27 xend] ERROR (XendDomainInfo:627) Domain destroy failed: debian1 Traceback (most recent call last): File "/usr/src/xen-2.0/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 625, in destroy_domain return xc.domain_destroy(dom=self.dom) error: (3, ''No such process'') [2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.destroy [''debian1'', ''16''] [2005-11-28 01:14:27 xend] INFO (XendMigrate:345) Migrate OK: [''migrate'', [''id'', ''16''], [''state'', ''ok''], [''live'', 1], [''resource'', 0], [''src'', [''host'', ''testpc-018''], [''domain'', ''16'']], [''dst'', [''host'', ''192.168.112.17''], [''domain'', 24]]] [2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.died [''debian1'', ''16''] [2005-11-28 01:14:27 xend] INFO (XendRoot:113) EVENT> xend.domain.migrate [''debian1'', ''16'', ''ok'', [''migrate'', [''id'', ''16''], [''state'', ''ok''], [''live'', 1], [''resource'', 0], [''src'', [''host'', ''testpc-018''], [''domain'', ''16'']], [''dst'', [''host'', ''192.168.112.17''], [''domain'', 24]]]] xfrd.log: ------------ 1: sent 37192, skipped 28339, delta 27582ms, dom0 44%, target 4%, sent 44Mb/s, dirtied 59Mb/s 49812 pages [1133136779.371602] Saving memory pages: iter 2 0% 2: sent 38822, skipped 10990, % 96% 2: sent 38822, skipped 10990, delta 18590ms, dom0 33%, target 5%, sent 68Mb/s, dirtied 50Mb/s 28781 pages [1133136797.962478] Saving memory pages: iter 3 0% 3: sent 26495, skipped 2285, 7% 92% 3: sent 26495, skipped 2285, delta 9687ms, dom0 36%, target 8%, sent 89Mb/s, dirtied 62Mb/s 18477 pages [1133136807.650291] Saving memory pages: iter 4 0% 4: sent 16933, skipped 1544, 5% 96% 4: sent 16933, skipped 1544, delta 6007ms, dom0 38%, target 9%, sent 92Mb/s, dirtied 63Mb/s 11599 pages [1133136813.658026] Saving memory pages: iter 5 0% 5: sent 10261, skipped 1338, 8% 99% 5: sent 10261, skipped 1338, delta 3642ms, dom0 39%, target 9%, sent 92Mb/s, dirtied 72Mb/s 8112 pages [1133136817.300454] Saving memory pages: iter 6 0% 6: sent 6907, skipped 1205, 11% 87% 6: sent 6907, skipped 1205, delta 2451ms, dom0 36%, target 17%, sent 92Mb/s, dirtied 82Mb/s 6143 pages [1133136819.752019] Saving memory pages: iter 7 0% 7: sent 5127, skipped 1011, 16% 99% 7: sent 5127, skipped 1011, delta 1849ms, dom0 47%, target 13%, sent 90Mb/s, dirtied 173Mb/s 9795 pages [1133136821.601429] Saving memory pages: iter 8 0% 8: sent 8308, skipped 1487, 10% 98% 8: sent 8308, skipped 1487, delta 4210ms, dom0 35%, target 7%, sent 64Mb/s, dirtied 86Mb/s 11162 pages [1133136825.811665] Saving memory pages: iter 9 0% 9: sent 9356, skipped 1805, 8% 98% 9: sent 9356, skipped 1805, delta 4216ms, dom0 38%, target 18%, sent 72Mb/s, dirtied 98Mb/s 12663 pages [1133136830.028020] Saving memory pages: iter 10 0% 10: sent 11249, skipped 1410, 7% 90% 10: sent 11249, skipped 1410, delta 8518ms, dom0 26%, target 5%, sent 43Mb/s, dirtied 74Mb/s 19253 pages [1133136838.546941] Saving memory pages: iter 11 0% 11: sent 17449, skipped 1803, 5% 99% 11: sent 17449, skipped 1803, delta 10874ms, dom0 30%, target 4%, sent 52Mb/s, dirtied 50Mb/s 16807 pages [1133136849.421847] Saving memory pages: iter 12 0% 12: sent 14596, skipped 2211, 6% 98% 12: sent 14596, skipped 2211, [DEBUG] Conn_sxpr> (debian1 16)[DEBUG] Conn_sxpr< err=0 [1133136856.172376] SUSPEND flags 00020004 shinfo 00000be8 eip c01068fe esi 00035a84 SUSPEND flags 00020004 shinfo 00000be8 eip c01068fe esi 00035a84 delta 6750ms, dom0 39%, target 8%, sent 70Mb/s, dirtied 102Mb/s 21143 pages [1133136856.172581] Saving memory pages: iter 13 0% 13: sent 21143, skipped 0, 9% 97% 13: sent 21143, skipped 0, delta 9709ms, dom0 22%, target 0%, sent 71Mb/s, dirtied 71Mb/s 21143 pages [1133136865.882150] Total pages sent= 223838 (3.42x) Total pages sent= 223838 (3.42x) [1133136865.882212] (of which 0 were fixups) (of which 0 were fixups) [DEBUG] Conn_sxpr> (xfr.err 0)[DEBUG] Conn_sxpr< err=0 [DEBUG] Conn_sxpr> (xfr.xfr.ok 24)[DEBUG] Conn_sxpr< err=0 3555 [INF] XFRD> Transfer complete in 115 seconds 3555 [WRN] XFRD> Transfer OK 3555 [INF] XFRD> Xfr service err=0 Michael -- ---------------------------------------------------------------------------------------- Michael Mey Thinking Objects Software GmbH | mailto: michael.mey@to.com Lilienthalstrasse 2/1 | phone: +49 711 88770-147 70825 Stuttgart-Korntal, Germany | fax: +49 711 88770-449 ---------------------------------------------------------------------------------------- _______________________________________________ Xen-users mailing list Xen-users@lists.xensource.com http://lists.xensource.com/xen-users