Hugh Brock
2006-Dec-06 22:40 UTC
[Xen-devel] [PATCH] [xendomains] Make absolutely certain xendomains won't start a domain that has failed to restore
In testing the xendomains init script, we have discovered a condition in
which xm restore <vm1> will complete successfully, but the xendomains
script nonetheless attempts to create <vm1> from scratch. Any domain
with an entry in XENDOMAINS_AUTO that was also automatically paused on
shutdown is vulnerable to this problem. We believe the sequence of
events is as follows:

1. xm restore guest1
2. xend pauses guest1 and waits N seconds for hotplug to complete
3. hotplug does not complete (for some reason), so xm restore finishes,
   but guest1 is still in paused state (not destroyed after failure)
4. xm create guest1 is run
5. original paused guest1 grabs the hotplug devices from the new guest1
6. original guest1 is now running
7. new guest1 is waiting for devices which were stolen

This results in a running guest1 and a paused guest1; if an operator
then attempts to unpause the paused guest1, storage corruption or worse
could result.

This patch checks the contents of XENDOMAINS_SAVE before the restore
process begins, and prevents xendomains from attempting to start any
domain that appears there, whether the domain started successfully or
not.

Signed off by: Hugh Brock <hbrock@redhat.com>

diff -ruN xen-3.0.3_0-src-orig/tools/examples/init.d/xendomains xen-3.0.3_0-src-new/tools/examples/init.d/xendomains
--- xen-3.0.3_0-src-orig/tools/examples/init.d/xendomains 2006-10-15 08:22:03.000000000 -0400
+++ xen-3.0.3_0-src-new/tools/examples/init.d/xendomains 2006-12-06 15:05:27.000000000 -0500
@@ -204,12 +204,14 @@
         return;
     fi
 
+    saved_domains=" "
     if [ "$XENDOMAINS_RESTORE" = "true" ] &&
        contains_something "$XENDOMAINS_SAVE"
     then
         mkdir -p $(dirname "$LOCKFILE")
         touch $LOCKFILE
         echo -n "Restoring Xen domains:"
+        saved_domains=`ls $XENDOMAINS_SAVE`
         for dom in $XENDOMAINS_SAVE/*; do
             echo -n " ${dom##*/}"
             xm restore $dom
@@ -234,9 +236,14 @@
     # Create all domains with config files in XENDOMAINS_AUTO.
     # TODO: We should record which domain name belongs
     # so we have the option to selectively shut down / migrate later
+    # If a domain statefile from $XENDOMAINS_SAVE matches a domain name
+    # in $XENDOMAINS_AUTO, do not try to start that domain; if it didn't
+    # restore correctly it requires administrative attention.
     for dom in $XENDOMAINS_AUTO/*; do
         echo -n " ${dom##*/}"
-        if is_running $dom; then
+        shortdom=$(echo $dom | sed -n 's/^.*\/\(.*\)$/\1/p')
+        echo $saved_domains | grep -w $shortdom > /dev/null
+        if [ $? -eq 0 ] || is_running $dom; then
             echo -n "(skip)"
         else
             xm create --quiet --defconfig $dom

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Ewan Mellor
2006-Dec-07 12:23 UTC
Re: [Xen-devel] [PATCH] [xendomains] Make absolutely certain xendomains won't start a domain that has failed to restore
On Wed, Dec 06, 2006 at 05:40:49PM -0500, Hugh Brock wrote:

> In testing the xendomains init script, we have discovered a condition in
> which xm restore <vm1> will complete successfully, but the xendomains
> script nonetheless attempts to create <vm1> from scratch. Any domain
> with an entry in XENDOMAINS_AUTO that was also automatically paused on
> shutdown is vulnerable to this problem. We believe the sequence of
> events is as follows:
>
> 1. xm restore guest1
> 2. xend pauses guest1 and waits N seconds for hotplug to complete
> 3. hotplug does not complete (for some reason), so xm restore finishes,
>    but guest1 is still in paused state (not destroyed after failure)
> 4. xm create guest1 is run
> 5. original paused guest1 grabs the hotplug devices from the new guest1
> 6. original guest1 is now running
> 7. new guest1 is waiting for devices which were stolen
>
> This results in a running guest1 and a paused guest1; if an operator
> then attempts to unpause the paused guest1, storage corruption or worse
> could result.
>
> This patch checks the contents of XENDOMAINS_SAVE before the restore
> process begins, and prevents xendomains from attempting to start any
> domain that appears there, whether the domain started successfully or
> not.
>
> Signed off by: Hugh Brock <hbrock@redhat.com>

Applied, thanks Hugh.

Ewan.
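
The skip test that the patch adds can be tried in isolation, without xend or any real domains. The following is only a minimal sketch: the function name decide and the example paths and domain names are hypothetical and not part of the xendomains script, and it uses the ${dom##*/} expansion the script already uses elsewhere in place of the patch's sed expression (both yield the last path component for a path containing a slash). The grep -w test is the same one the patch uses.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the xendomains script.
# $1: path to a config file, as found under $XENDOMAINS_AUTO
# $2: list of names recorded from $XENDOMAINS_SAVE before restoring
# Prints "skip" if the config's basename appears in the list as a whole
# word (the patch's condition for not running xm create), else "create".
decide() {
    dom=$1
    saved_domains=$2
    # Last path component; stands in for the patch's sed expression.
    shortdom=${dom##*/}
    # Same whole-word match as in the patch.
    if echo "$saved_domains" | grep -w "$shortdom" > /dev/null; then
        echo skip
    else
        echo create
    fi
}

# Example names and paths below are made up for illustration.
decide /etc/xen/auto/guest1 "guest1 guest2"   # prints "skip"
decide /etc/xen/auto/guest3 "guest1 guest2"   # prints "create"
decide /etc/xen/auto/guest1 " "               # prints "create" (nothing was saved)
decide /etc/xen/auto/guest1 "guest1-old"      # prints "skip"
```

The last call shows a property of grep -w worth keeping in mind: a hyphen is not a word character, so a saved state file named guest1-old also counts as a match for guest1. Whether that matters depends on how domains are named on a given host.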