Sorry for the noise if this isn't the appropriate venue for this. I
posted this last month to xen-devel:

http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html

I can reliably cause a paravirt_ops Xen guest to hang during intensive
IO. My current recipe is an untar/tar loop, without compression, of a
kernel tree. For example:

    wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
    bzip2 -d linux-2.6.23.tar.bz2

    while true; do
        echo `date`
        tar xf linux-2.6.23.tar
        tar cf linux-2.6.23.tar linux-2.6.23
    done

After a few loops, anything that touches the xvd device that hung will
get stuck in D state.

This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
from 3.1.2. In all cases, the host continues to run fine, nothing out
of the ordinary is logged on the dom0 side, and xenstore reports the
status of the devices is fine.

Can anyone reproduce this problem, or let me know what else I can
provide to help track this down?

Thanks,
-Chris
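For anyone trying to reproduce this, a minimal sketch of one way to see
which tasks are stuck in D state inside the guest once the hang
occurs. The function names are made up for illustration, and it
assumes a procps-style ps that accepts the pid, stat, wchan and comm
output fields; it is not part of the original report.

```shell
# Keep only rows whose second field (process state) starts with "D",
# i.e. uninterruptible sleep. Expects "PID STAT WCHAN COMMAND" rows on
# stdin; the header row is dropped because "STAT" does not start with D.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# List D-state processes on the running system, together with the
# kernel function each one is sleeping in (wchan).
list_d_state() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}
```

Running list_d_state in the hung guest from a shell that does not
itself touch the affected xvd device should show the blocked tar
processes and where in the kernel they are waiting.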
Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
>     wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
>     bzip2 -d linux-2.6.23.tar.bz2
>
>     while true; do
>         echo `date`
>         tar xf linux-2.6.23.tar
>         tar cf linux-2.6.23.tar linux-2.6.23
>     done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.
>
> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, and xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?

Hi,

I'll try to track this down asap. Have you tried any other kernel
versions? In other words, did it just start happening, or has it
always done it?

Also, could you try 2.6.24-rc6, just to make sure it hasn't already
been fixed (which is possible if it's something that happened in a
higher layer or something).

Thanks,
J
On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> xming wrote:
> > But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
> > not boot any more and 2.6.24 does boot but will hang after cpufreq changes
> > the frequency.
>
> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
> domU to hang?
>
> J

Yes, a dom0 frequency change while the domU is doing something will
trigger this. Using "on demand" triggers it very easily.

This is from xm top when a domU hangs:

    test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223

So it appears to be running (eating CPU); sometimes the state is "r",
sometimes "-", but both console and network are dead.
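To check which cpufreq governor dom0 is using when the hang occurs,
something like the following sketch may help. It assumes the dom0
kernel exposes the standard Linux cpufreq sysfs files under
/sys/devices/system/cpu; the function name and the optional root
argument are illustrative only and do not come from the thread.

```shell
# Print the active cpufreq governor for every CPU that exposes one.
# The sysfs root can be overridden as the first argument; it defaults
# to the usual /sys/devices/system/cpu location.
show_governors() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip CPUs without cpufreq support (or an unmatched glob).
        [ -r "$f" ] || continue
        cpu=${f#"$root"/}   # strip the root prefix, e.g. cpu0/cpufreq/...
        cpu=${cpu%%/*}      # keep only the leading cpuN component
        printf '%s: %s\n' "$cpu" "$(cat "$f")"
    done
}
```

If this reports ondemand, writing a fixed governor such as performance
to the same scaling_governor files (as root, and assuming that governor
is available in the dom0 kernel) would be one way to test whether the
hang still happens without frequency changes.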
Christopher S. Aker wrote:
> Sorry for the noise if this isn't the appropriate venue for this. I
> posted this last month to xen-devel:
>
> http://lists.xensource.com/archives/html/xen-devel/2007-11/msg00777.html
>
> I can reliably cause a paravirt_ops Xen guest to hang during intensive
> IO. My current recipe is an untar/tar loop, without compression, of a
> kernel tree. For example:
>
>     wget http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.23.tar.bz2
>     bzip2 -d linux-2.6.23.tar.bz2
>
>     while true; do
>         echo `date`
>         tar xf linux-2.6.23.tar
>         tar cf linux-2.6.23.tar linux-2.6.23
>     done
>
> After a few loops, anything that touches the xvd device that hung will
> get stuck in D state.

I've been running this all night without seeing any problem. I'm using
current x86.git#testing with a few local patches, but nothing
especially relevant-looking.

Could you try the attached patch to see if it makes any difference?

J

> This happens on both a 2.6.16 and 2.6.18 dom0 (3.1.2 tools). Paravirt
> guests I've tried that exhibit the problem: 2.6.23.8, 2.6.23.12, and
> 2.6.24-rc6. It does *not* occur using the Xensource 2.6.18 domU tree
> from 3.1.2. In all cases, the host continues to run fine, nothing out
> of the ordinary is logged on the dom0 side, and xenstore reports the
> status of the devices is fine.
>
> Can anyone reproduce this problem, or let me know what else I can
> provide to help track this down?
>
> Thanks,
> -Chris

-------------- next part --------------
A non-text attachment was scrubbed...
Name: xen-indirect-iret.patch
Type: text/x-patch
Size: 2429 bytes
Desc: not available
Url : http://lists.linux-foundation.org/pipermail/virtualization/attachments/20080228/8a84a8a1/attachment.bin
xming wrote:
> On Feb 7, 2008 5:09 AM, Jeremy Fitzhardinge <jeremy at goop.org> wrote:
>> xming wrote:
>>> But I do have one problem after I upgraded to xen 3.2, the 2.6.23.x domU do
>>> not boot any more and 2.6.24 does boot but will hang after cpufreq changes
>>> the frequency.
>>
>> Interesting. Do you mean dom0 cpufreq frequency changes will cause the
>> domU to hang?
>>
>> J
>
> Yes, a dom0 frequency change while the domU is doing something will
> trigger this. Using "on demand" triggers it very easily.
>
> This is from xm top when a domU hangs:
>
>     test32 ------ 4018 98.8 131072 6.4 131072 6.4 1 1 4516 50087 1 0 433908 300403 3084907223
>
> So it appears to be running (eating CPU); sometimes the state is "r",
> sometimes "-", but both console and network are dead.

Which version of Xen did you try this on? Some versions of xen-unstable
had horribly broken cpufreq support, in which Xen failed to keep track
of CPU speed changes. Current versions should be OK.

J