This sounds like a B-rate horror movie. Has anyone else seen Zombie VMs?

I had a number of VMs running and used the following script to destroy them:

for vm in `xm list | awk '{print $1}' | grep -v Name | grep -v Domain-0`; do xm destroy $vm; done

This destroyed all of the para-virtualized domains running (4 of them) but turned all the HVM VMs into Zombies, as shown here:

[root@vm0 ~]# xm list
Name                ID  Mem(MiB)  VCPUs  State   Time(s)
Domain-0             0      5074      4  r-----   4912.8
Zombie-dsl0         25       256      1  -b---d    552.1
Zombie-dsl1         26       256      1  -b---d    552.2
Zombie-dsl2         27       256      1  -b---d    550.0
Zombie-dsl3         28       256      1  -b---d    554.5
Zombie-knoppix0     17       256      1  -b---d   4459.9
Zombie-knoppix1     18       256      1  -----d   4425.9
Zombie-knoppix2     19       256      1  -b---d   4530.9
Zombie-knoppix3     20       256      1  -b---d   4493.7

Subsequent attempts to destroy the VMs using "xm destroy 25" or "xm destroy Zombie-dsl0" don't do anything. It's curious that the VMs are shown as both blocked and being destroyed (-b---d).

The para-virtualized VMs were named centos[0-3], so it might be a timing issue where only 4 destroys were properly handled and the para-virtualized VMs happened to be the first 4 domains in xm list.

I will play around a bit to see if I can recreate this consistently and whether there are options to really destroy the domains. This is not an issue for me since my VM environment is a lab, but in production this might be very problematic.

Mike.

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
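[Editor's note: a gentler variant of the one-liner above, sketched only — it substitutes "xm shutdown" for "xm destroy" and does the header/Domain-0 filtering in a single awk pass. The parsing is factored into a function so it can be checked against canned output; it has not been run against a live Xen host.]

```shell
#!/bin/sh
# Sketch: list guest names from "xm list"-style output, skipping the
# header row and Domain-0, so each guest can be given an orderly
# "xm shutdown" instead of an abrupt "xm destroy".
list_guests() {
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# Canned "xm list" output, so the filter can be verified offline:
sample='Name            ID Mem(MiB) VCPUs State  Time(s)
Domain-0         0     5074     4 r-----  4912.8
centos0          1      256     1 -b----    10.0
dsl0             2      256     1 -b----    10.0'

for vm in $(printf '%s\n' "$sample" | list_guests); do
    echo "would run: xm shutdown $vm"
done
```

On a live host one would pipe the real `xm list` into `list_guests` instead of the canned sample.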
On Fri, 2006-12-01 at 12:57 -0500, Michael Froh wrote:
> This sounds like a B-rate horror movie. Has anyone else seen Zombie
> VMs?

Yes. I get them quite often due to the fact that I use mostly Tyan boards with on-board SODIMMs for disk caching. Guests like to hang on shutdown due to that. Only seems to be on Tyan boards.

> I had a number of VMs running and used the following script to
> destroy them:
>
> for vm in `xm list | awk '{print $1}' | grep -v Name | grep -v
> Domain-0`; do xm destroy $vm; done

I hope those aren't ext3 file systems. 'shutdown' would be preferable.

> This destroyed all of the para-virtualized domains running (4 of
> them) but turned all the HVM VMs into Zombies as shown here:

Destroyed is the word. You may want to fsck prior to booting them again; it would be faster.

[snip]

> Subsequent attempts to destroy the VMs using "xm destroy 25" or "xm
> destroy Zombie-dsl0" don't do anything.

Zombie VMs are just like zombie processes: they're waiting for something to happen before they exit. In this case they're waiting for disks to sync on a VBD that's no longer connected. In effect, you pulled out the hard drives before the VMs could sync what they had in the inode cache to write, then yanked the power cord and plugged it back in really quickly.

Bad idea.

> It's curious that the VMs are shown as booting and being destroyed
> (-b---d).

What's being destroyed are your file systems.

> The para-virtualized VMs were named centos[0-3] so it might be a
> timing issue where only 4 destroys were properly handled and the
> para-virtualized VMs happened to be the first 4 domains in xm list.
>
> I will play around a bit to see if I can recreate consistently and if
> there are options to really destroy the domains. This is not an issue
> for me since my VM environment is a lab, but in production this might
> be very problematic.

Amen. Try "xm shutdown". If your script has to ensure a domU exited, try something like:

counter=0
while xm list | grep -q "[domname]" && [ "$counter" -le 20 ]; do
    xm shutdown [domname]
    sleep 5
    let "counter += 1"
done

if [ "$counter" -ge 20 ]; then
    xm pause [domname]
    xm sysrq [domname] S
    sleep 5
    xm destroy [domname]
fi

Depending on the I/O usage of the guests, you may want to toss in an "xm sysrq 0 S" too.

Note: "xm shutdown [domname]" is almost always going to exit 0. The only reason it will not is if [domname] doesn't exist. It is a little tricky to use in a script. The above is completely off the top of my head and meant for illustration only.

ext3 (or any other journaling file system) gets *very* grumpy if it can't flush its inodes prior to shutting down. Save yourself a few hassles :)

xm destroy = pull out the power cord.

You may try using "xendomains" instead.

> Mike.

Hope this helps,
-Tim
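[Editor's note: Tim's retry loop can be traced without a hypervisor by mocking xm as a shell function. This is purely illustrative — the fake guest, names, and two-attempt behaviour are invented for the demo; only the loop shape follows the message above.]

```shell
#!/bin/sh
# Mock "xm" so the shutdown-retry loop can be exercised offline: the
# fake guest stays in "xm list" output until it has been asked to shut
# down twice, then disappears.
attempts=0
xm() {
    case "$1" in
        list)     [ "$attempts" -lt 2 ] && echo "dsl0  25  256  1 -b----  1.0" ;;
        shutdown) attempts=$((attempts + 1)) ;;
    esac
}

counter=0
while xm list | grep -q "dsl0" && [ "$counter" -le 20 ]; do
    xm shutdown dsl0
    counter=$((counter + 1))   # a real loop would also "sleep 5" here
done
echo "shutdown attempts: $attempts"
```

The loop terminates after two iterations because the mocked guest vanishes from the list once the second shutdown lands, which is exactly the exit condition the real script relies on.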
On 1-Dec-06, at 1:25 PM, Tim Post wrote:
> On Fri, 2006-12-01 at 12:57 -0500, Michael Froh wrote:
>> This sounds like a B-rate horror movie. Has anyone else seen Zombie
>> VMs?
>
> Yes. I get them quite often due to the fact that I use mostly Tyan
> boards with on-board SODIMMs for disk caching. Guests like to hang on
> shutdown due to that. Only seems to be on Tyan boards.

I am using a Dell PowerEdge 2900. Don't know the motherboard used by Dell.

> I hope those aren't ext3 file systems. 'shutdown' would be preferable.

Understood. Right now I'm just playing with Xen so there is no data to be lost.

The domains which were correctly destroyed were centos domainU with ext3 fs mounted. These are snapshots of a pristine ext3 fs, so I will just recreate the snapshots in my Xen sandbox.

As noted in the list below, the remaining were dsl & knoppix domainU which only had their respective .iso images mounted read-only, so inode flushing should not be a problem here.

[snip]

Thanks for the draft script. I haven't gotten around to playing with xendomains yet, but will.

I have since tried to recreate the problem but have been unable to after a system reboot. I have tried 24 running domains and all "destroy" properly, so it seems it wasn't a timing issue.

Tim, when you do get zombie domains, how do you eventually purge them? They seem to be using memory but no CPU.

> Hope this helps
> -Tim
On Fri, 2006-12-01 at 14:44 -0500, Michael Froh wrote:
> Understood. Right now I'm just playing with Xen so there is no data
> to be lost.

<phew>

> The domains which were correctly destroyed were centos domainU with
> ext3 fs mounted. These are snapshots of a pristine ext3 fs so will
> just recreate the snapshots in my xen sandbox.
>
> As noted in the list below, the remaining were dsl & knoppix domainU
> which only had their respective .iso images mounted ro so inode
> flushing should not be a problem here.

It's hard to tell what was causing them to zombie. If it's just the HVM guests that were hanging, it tells me their BIOS had some unfinished business. xm destroy does just what it says: destroys a guest in memory and (tries to) instantly free any associated block devices and memory. If you do an "xm shutdown [domname]" and keep typing "xm list" while waiting for it to happen, you'll notice most full or paravirt guests zombie for just a second.

> This destroyed all of the para-virtualized domains running (4 of
> them) but turned all the HVM VMs into Zombies as shown here:

Most likely the virtualized BIOS, but no idea why exactly.

> Thanks for the draft script. I haven't gotten around to playing with
> xendomains yet, but will.
>
> I have since tried to recreate the problem but have been unable to
> after a system reboot. I have tried 24 running domains and all
> "destroy" properly, so it seems it wasn't a timing issue.

You may want to share this with xen-devel. While you are doing the opposite of what's appropriate, they may be interested to see your experience. I'm trying to think of everything in a guest that could have "unfinished business" that looped forever, and the only thing coming to mind is the BIOS. Does this happen with just one or two HVM guests as opposed to 10+? Dom-0 expects a certain amount to return to heap.

I'm not sure if shadow paging specified for a fully virt VM's BIOS is treated any differently than conventional paging for a guest when it comes to xm destroy either; a better question for xen-devel. What I do know is dom-0 did not get back what it gave the guest prior to destroying it.

Were you using Rocks or some other kind of socket helper? How many NICs? I'm trying to duplicate what you did without much luck, but I don't have any Intel platform to do it with. It's not duplicating on my X2.

> Tim, when you do get zombie domains, how do you eventually purge them
> since they seem to be using memory but no CPU?

Typically they vanish within a second or two. Again, "destroy" is not at all the best way to stop a guest; usually it's used only if a guest doesn't respond to an orderly shutdown.

xm shutdown = shutdown -f now
xm destroy = yank out the power cord... actually it's more like: open the case, yank out the HD, then the RAM, then the BIOS (if HVM), then the PSU (and by extension the power cord), all while still running. Not sure if I got them in the right order, but I think you get the gist :)

Not what you'd want to do in production at all, especially with no substantial pause in between.

> Mike.

Best,
-Tim
On Fri, Dec 01, 2006 at 12:57:03PM -0500, Michael Froh wrote:
> This sounds like a B-rate horror movie. Has anyone else seen Zombie
> VMs?

Yes!

> I had a number of VMs running and used the following script to
> destroy them:
>
> for vm in `xm list | awk '{print $1}' | grep -v Name | grep -v
> Domain-0`; do xm destroy $vm; done
>
> This destroyed all of the para-virtualized domains running (4 of
> them) but turned all the HVM VMs into Zombies as shown here:

You need to kill the qemu-dm process in dom0 as well, and then these domains will disappear.

--
Nick Craig-Wood <nick@craig-wood.com> -- http://www.craig-wood.com/nick
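[Editor's note: Nick doesn't spell out how to find the right qemu-dm process. One way, sketched below, is to pick it out of ps output; it assumes — as 3.0-era xend arranges — that qemu-dm carries "-d <domid>" on its command line. The PIDs and paths in the canned sample are invented for the demo.]

```shell
#!/bin/sh
# Sketch: extract the PID of the qemu-dm instance serving a given
# domain ID from "ps -eo pid,args"-style output. Assumes the device
# model is started with "-d <domid>" among its arguments.
find_qemu_pid() {
    awk -v id="$1" '/qemu-dm/ {
        for (i = 2; i < NF; i++)
            if ($i == "-d" && $(i + 1) == id) print $1
    }'
}

# Canned ps output, so the parsing can be checked without a live host:
sample=' 1234 /usr/lib/xen/bin/qemu-dm -d 25 -m 256
 1250 /usr/lib/xen/bin/qemu-dm -d 26 -m 256'

printf '%s\n' "$sample" | find_qemu_pid 25
```

On a real host this would be `pid=$(ps -eo pid,args | find_qemu_pid 25)` followed by `kill "$pid"`.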
On 2 Dec 2006 at 2:25, Tim Post wrote:
> if [ "$counter" -ge 20 ]; then
>     xm pause [domname]
>     xm sysrq [domname] S
>     sleep 5
>     xm destroy [domname]
> fi

Someone had pointed out earlier that after a "pause", a "sysrq" won't actually do much...

Ulrich
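[Editor's note: the reordering Ulrich implies — sync first, pause only afterwards — would look like the sketch below. It is hypothetical; "xm" is mocked as a shell function here so the call order can be inspected without a hypervisor.]

```shell
#!/bin/sh
# Mock xm so the command sequence is visible; on a real host, delete
# the function and the calls go to the actual toolstack.
xm() { echo "xm $*"; }

force_stop() {
    dom="$1"
    xm sysrq "$dom" S   # ask the still-running guest to sync its disks
    xm pause "$dom"     # only now freeze it (a real script would sleep first)
    xm destroy "$dom"
}

force_stop dsl0
```

Which prints the three xm invocations in sync / pause / destroy order, the point being that the sysrq reaches the guest while it can still act on it.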
> -----Original Message-----
> From: xen-users-bounces@lists.xensource.com
> [mailto:xen-users-bounces@lists.xensource.com] On Behalf Of Tim Post
> Sent: 01 December 2006 20:48
> To: Michael Froh
> Cc: xen-users@lists.xensource.com
> Subject: Re: [Xen-users] Zombie VMs cannot be destroyed
>
[snip]
> It's hard to tell what was causing them to zombie. If it's just the
> HVM guests that were hanging, it tells me their BIOS had some
> unfinished business.

HVM domains will not react (correctly) to "xm shutdown" as far as I know - at least this was the case a while back. I presume it's possible to simulate that the power button was pressed through ACPI handling, but I have no idea if this is implemented or not. If it's not, then "xm shutdown" will not shut down the domain (correctly), as the xm commands can't really talk to the guest OS itself. In PV domains, there's a signal from Xen to the guest domain to say "go shut yourself down, pretty please".

[snip]

--
Mats
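[Editor's note: Mats' distinction suggests choosing the stop strategy per guest type. Below is a purely illustrative policy sketch — the "hvm" flag is assumed to be known from the guest's config file, xm is not consulted, and the commands are echoed rather than executed so the logic is checkable.]

```shell
#!/bin/sh
# Illustrative policy: PV guests get an orderly "xm shutdown" (Xen can
# signal them to shut themselves down); HVM guests without ACPI
# power-button emulation may need "xm destroy" followed by cleaning up
# their qemu-dm process in dom0, per Nick's note earlier in the thread.
stop_domain() {
    dom="$1"; hvm="$2"   # hvm: yes/no, assumed known from the guest's config
    if [ "$hvm" = yes ]; then
        echo "xm destroy $dom (then kill its qemu-dm in dom0)"
    else
        echo "xm shutdown $dom"
    fi
}

stop_domain centos0 no
stop_domain dsl0 yes
```

The guest names and the yes/no flag are placeholders; real detection of HVM-ness would have to inspect the domain's configuration.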