Hi all,

I saw reproducible hangs in dom0 when the system is under heavy load.

Testbed settings:
Four dom0s share an NFS server for domU images, with a total of 24 domUs (6 domUs on each dom0). When the system was under heavy load, busy processing e-commerce requests, one or two of the dom0s hung. No input was accepted and a reboot was necessary.

Has anyone had the same experience? The causes I can come up with are the following:

1. NFS is not configured properly. But before I upgraded to Xen 4, Xen 3 worked pretty well.
2. The domUs are using tap2 disks. Any similar problems in testing tap2?
3. Or is the problem from the new pvops kernel? All the domUs are CPU intensive and do not generate a lot of I/O.

Unfortunately, dom0's dmesg and xm log recorded nothing about the hangs.

FYI:

Xen: 4.0.1-rc3-pre
dom0: CentOS, 2.6.32.1 pvops, 8G, 8 cores
domU: 2.6.18.8 PV kernel, 1G, 4 VCPUs
NFS server: 8G, 8 cores, 4-disk RAID 5, NFS version 3 over TCP, rw size 4K
Interconnect: Gigabit Ethernet

Thanks a lot!

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Pasi Kärkkäinen
2010-Sep-24 10:27 UTC
[Xen-users] Re: [Xen-devel] dom0 hangs in xen 4.0.1-rc3-pre
On Fri, Sep 24, 2010 at 12:22:21AM -0400, Jia Rao wrote:
> Hi all,
> I saw reproducible hangs in dom0 when the system is under heavy load.
> [...]
> FYI:
> Xen: 4.0.1-rc3-pre
> dom0: centos 2.6.32.1 pvops 8G, 8 cores
> domU: 2.6.18.8 PV kernel 1G, 4 VCPU

Well, first of all test with Xen 4.0.1 final. Does that help?

If not, try Xen 4.0.2-rc1-pre, which has some additional IRQ-related fixes.

-- Pasi
Jeremy Fitzhardinge
2010-Sep-24 19:08 UTC
[Xen-users] Re: [Xen-devel] dom0 hangs in xen 4.0.1-rc3-pre
On 09/23/2010 09:22 PM, Jia Rao wrote:
> I saw reproducible hangs in dom0 when the system is under heavy load.
>
> Testbed settings:
> four dom0s share a nfs server for domU images. a total number of 24
> domUs (6 domUs on each dom0). When the system under heavy load, busy
> processing e-commerce requests, one or two of the dom0s hanged. no
> input can be accepted and reboot is necessary.

Is the whole machine locked solid, or does it still, for example, respond to ping on its external interfaces? Does capslock work on the keyboard (if any)? Does the console echo characters?

Does Xen still respond on the console (^A ^A ^A if you have a serial console)?

> [...]
> Xen: 4.0.1-rc3-pre
> dom0: centos 2.6.32.1 pvops 8G, 8 cores

Try disabling irqbalanced, which can cause lost events.

    J
Hi Jeremy,

The whole machine was locked. There was no response to ping or on the local VGA display. I did not try the serial console and will let you know once I try it.

BTW, how do I disable irqbalanced?

Thank you for your reply.

On Fri, Sep 24, 2010 at 3:08 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Is the whole machine locked solid, or does it still, for example,
> respond to ping on its external interfaces, capslock works on the
> keyboard (if any), console echos characters?
>
> Does Xen still respond on the console (^A ^A ^A if you have a serial
> console).
> [...]
> Try disabling irqbalanced, which can cause lost events.
>
> J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
irqbalanced was not turned on when the server hung.

On Fri, Sep 24, 2010 at 4:48 PM, Jia Rao <rickenrao@gmail.com> wrote:
> BTW. How to disable irqbalanced ?
>
> On Fri, Sep 24, 2010 at 3:08 PM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> Try disabling irqbalanced, which can cause lost events.
On 24.09.2010 06:22, Jia Rao wrote:
> I saw reproducible hangs in dom0 when the system is under heavy load.
> four dom0s share a nfs server for domU images. a total number of 24 domUs (6
> domUs on each dom0). When the system under heavy load, busy processing
> e-commerce requests, one or two of the dom0s hanged. no input can be
> accepted and reboot is necessary.
> Anyone had the same experience? The causes I can come up are following:

Please post your hardware (mainboard, chipset, CPU, RAID controller).
I have found a severe problem on Lynnfield systems.

Regards Andreas
Hi Andreas,

FYI:

Server model: Dell PowerEdge 1950 III
Motherboard: do not actually know
CPU: Intel Xeon E5450
Hard drive controller: No RAID. SAS 6/i R integrated controller.

Thanks.

On Sat, Sep 25, 2010 at 8:12 AM, Andreas Kinzler <ml-xen-users@hfp.de> wrote:
> Please post your hardware (mainboard, chipset, CPU, RAID controller).
> I have found a severe problem on Lynnfield systems.
>
> Regards Andreas
Andreas Kinzler
2010-Sep-27 08:17 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On 26.09.2010 03:12, Jia Rao wrote:
>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>> I have found a severe problem on Lynnfield systems.
> Server Model: Dell PowerEdge 1950 III
> Motherboard: do not actually know.
> CPU: Intel Xeon E5450
> Hard drive controller: No RAID. SAS 6/i R integrated controller.

OK. This is not a Nehalem-based system. Are you using C3 anyway? Please post the output of "xenpm start 10".

Regards Andreas
Jia Rao
2010-Sep-27 13:41 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
The following is the output of "xenpm start 10".

CPU0:  Residency(ms)       Avg Res(ms)
  C0     100 (  1.01%)     0.02
  C1      10 (  0.10%)     0.12
  C2    9892 ( 98.89%)     2.43
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      54 (100.00%)
  Avg freq  1980000 KHz

CPU1:  Residency(ms)       Avg Res(ms)
  C0     136 (  1.37%)     0.02
  C1      11 (  0.11%)     0.19
  C2    9855 ( 98.52%)     1.62
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      72 (100.00%)
  Avg freq  1980000 KHz

CPU2:  Residency(ms)       Avg Res(ms)
  C0     153 (  1.53%)     0.02
  C1      58 (  0.59%)     0.35
  C2    9791 ( 97.88%)     1.43
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      85 (100.00%)
  Avg freq  1980000 KHz

CPU3:  Residency(ms)       Avg Res(ms)
  C0     177 (  1.77%)     0.02
  C1      33 (  0.34%)     0.09
  C2    9792 ( 97.89%)     1.05
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      82 (100.00%)
  Avg freq  1980000 KHz

CPU4:  Residency(ms)       Avg Res(ms)
  C0     166 (  1.67%)     0.01
  C1     947 (  9.47%)     0.21
  C2    8889 ( 88.86%)     0.72
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      11 (100.00%)
  Avg freq  1980000 KHz

CPU5:  Residency(ms)       Avg Res(ms)
  C0     722 (  7.23%)     0.04
  C1     181 (  1.81%)     0.09
  C2    9098 ( 90.96%)     0.53
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3     529 (100.00%)
  Avg freq  1980000 KHz

CPU6:  Residency(ms)       Avg Res(ms)
  C0      73 (  0.73%)     0.02
  C1       5 (  0.06%)     0.16
  C2    9923 ( 99.21%)     2.44
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      27 (100.00%)
  Avg freq  1980000 KHz

CPU7:  Residency(ms)       Avg Res(ms)
  C0     135 (  1.35%)     0.02
  C1      68 (  0.68%)     0.27
  C2    9799 ( 97.97%)     1.55
  C3       0 (  0.00%)     0.00
  P0       0 (  0.00%)
  P1       0 (  0.00%)
  P2       0 (  0.00%)
  P3      71 (100.00%)
  Avg freq  1980000 KHz

Thanks

On Mon, Sep 27, 2010 at 4:17 AM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote:
> OK. This is no Nehalem based system. Are you using C3 anyway? Please post
> output of "xenpm start 10".
>
> Regards Andreas
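A minimal sketch, assuming the xenpm output has been saved as plain text, of how the C-state residency figures in the message above could be tallied per CPU. The helper names `parse_cstates` and `deep_share` are hypothetical and are not part of xenpm or any Xen tool; the sample contains only the CPU0 and CPU4 rows from the posted output.

```python
import re

# CPU0 and CPU4 rows copied from the posted xenpm output; the other CPUs use the same layout.
SAMPLE = """\
CPU0: Residency(ms) Avg Res(ms)
C0 100 ( 1.01%) 0.02
C1 10 ( 0.10%) 0.12
C2 9892 (98.89%) 2.43
C3 0 ( 0.00%) 0.00
CPU4: Residency(ms) Avg Res(ms)
C0 166 ( 1.67%) 0.01
C1 947 ( 9.47%) 0.21
C2 8889 (88.86%) 0.72
C3 0 ( 0.00%) 0.00
"""

CPU_RE = re.compile(r"^CPU(\d+):")
# Matches e.g. "C2 9892 (98.89%) 2.43": state number, residency in ms, percentage.
CSTATE_RE = re.compile(r"^C(\d+)\s+(\d+)\s+\(\s*([\d.]+)%\)")


def parse_cstates(text):
    """Return {cpu: {cstate: residency_ms}} from xenpm-style text (hypothetical helper)."""
    result = {}
    cpu = None
    for raw in text.splitlines():
        line = raw.strip()
        m = CPU_RE.match(line)
        if m:
            cpu = int(m.group(1))
            result[cpu] = {}
            continue
        m = CSTATE_RE.match(line)
        if m and cpu is not None:
            result[cpu][int(m.group(1))] = int(m.group(2))
    return result


def deep_share(states, min_state=2):
    """Fraction of the sampled time spent in C-states >= min_state."""
    total = sum(states.values())
    deep = sum(ms for c, ms in states.items() if c >= min_state)
    return deep / total if total else 0.0


states = parse_cstates(SAMPLE)
for cpu_id, per_cpu in sorted(states.items()):
    print(f"CPU{cpu_id}: {deep_share(per_cpu):.1%} of sampled time in C2 or deeper")
```

Computing the share from the raw millisecond counts rather than trusting the printed percentages makes it easy to change the threshold, for example `deep_share(per_cpu, min_state=3)` to look at C3 only.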
On Sat, Sep 25, 2010 at 5:12 AM, Andreas Kinzler <ml-xen-users@hfp.de> wrote:
> On 24.09.2010 06:22, Jia Rao wrote:
>> I saw reproducible hangs in dom0 when the system is under heavy load.
>> [...]
>
> Please post your hardware (mainboard, chipset, CPU, RAID controller).
> I have found a severe problem on Lynnfield systems.

Andreas,

Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel is causing grief for us too. I was wondering if this was related.

-Bruce
Andreas Kinzler
2010-Sep-27 14:22 UTC
[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On 27.09.2010 16:06, Bruce Edge wrote:
>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>> I have found a severe problem on Lynnfield systems.
> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel is
> causing grief for us too. I was wondering if this was related.

I am still researching this. For testing I bought a test system with Westmere-EP (Xeon E5620), which has ARAT. This system ran stably even though Intel still lists it as having the C6 erratum. This leads me to the conclusion that the HPET timer migration code (called HPET broadcast) in Xen is the root cause. This affects all CPUs that use it, but mainly Nehalem because of turbo mode.

Regards Andreas
Bruce Edge
2010-Sep-27 14:32 UTC
[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On Mon, Sep 27, 2010 at 7:22 AM, Andreas Kinzler <ml-xen-users@hfp.de> wrote:
> I am still researching this. For testing I bought a test system with
> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable while
> Intel still lists it as having the C6 errata. This leads me to the
> conclusion that the HPET timer migration code (called HPET broadcast) from
> Xen is the root cause. This affects all CPUs that use it - but mainly
> Nehalem because of turbo mode.
>
> Regards Andreas

Andreas,

Thanks for the info. I'll try disabling turbo mode in the BIOS and see if that helps.
Let me know if there's anything I can run/do/test/etc.

-Bruce
Andreas Kinzler
2010-Sep-27 14:50 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On 27.09.2010 16:32, Bruce Edge wrote:
>> This leads me to the
>> conclusion that the HPET timer migration code (called HPET broadcast) from
>> Xen is the root cause. This affects all CPUs that use it - but mainly
>> Nehalem because of turbo mode.
> Thanks for the info. I'll try disabling turbo mode in the BIOS and see if
> that helps.
> Let me know if there's anything I can run/do/test/etc.

If you want to check the issue I am referring to, then you need to apply my patch from:
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html

Do not modify the BIOS settings in any way.

Regards Andreas
Jiang, Yunhong
2010-Sep-28 01:57 UTC
RE: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
Andreas, a question about your mail at
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html:
does your system have interrupt remapping enabled?

Thanks
--jyh
Andreas Kinzler
2010-Sep-28 08:25 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On 28.09.2010 03:57, Jiang, Yunhong wrote:
> Andres, a question to your
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html mail,
> does your system has interrupt remapping enabled?

If you mean CONFIG_INTR_REMAP, then no.

Regards Andreas
Andreas Kinzler
2010-Sep-28 09:04 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On 27.09.2010 15:41, Jia Rao wrote:
> The following is the output of "xenpm start 10".
> CPU0: Residency(ms) Avg Res(ms)
> C0 100 ( 1.01%) 0.02
> C1 10 ( 0.10%) 0.12
> C2 9892 (98.89%) 2.43

You are using C2 intensively. Without "local_apic_timer_c2_ok" this uses the same C3 HPET migration code. I think it makes sense to try my patch from
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html

Regards Andreas
Bruce Edge
2010-Sep-28 16:04 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On Mon, Sep 27, 2010 at 7:50 AM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote:
> If you want to check the issue I am referring to then you need to apply my
> patch from:
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
>
> Do not modify the BIOS settings in any way.
>
> Regards Andreas

Andreas,

With this patch the dom0 hangs when I start the VM in PV mode. The HVM ISO-based install was OK, but the PV mode runtime hung the dom0 shortly after the boot entry was selected from the VM's grub menu. There was no output on the VM console.

The dom0 console is printing these:

[38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

The Xen console is still responsive. I've attached the '*' output.

And lastly, just for confirmation, this is the patch I applied:

diff -urN xx/xen/arch/x86/hpet.c xen-4.0.1/xen/arch/x86/hpet.c
--- xx/xen/arch/x86/hpet.c 2010-08-25 12:22:11.000000000 +0200
+++ xen-4.0.1/xen/arch/x86/hpet.c 2010-08-30 18:13:34.000000000 +0200
@@ -405,7 +405,7 @@
         /* Only consider HPET timer with MSI support */
         if ( !(cfg & HPET_TN_FSB_CAP) )
             continue;
-
+if (1) continue;
         ch->flags = 0;
         ch->idx = i;
 
@@ -703,8 +703,9 @@
 
 int hpet_broadcast_is_available(void)
 {
-    return (legacy_hpet_event.event_handler == handle_hpet_broadcast
-            || num_hpets_used > 0);
+    /*return (legacy_hpet_event.event_handler == handle_hpet_broadcast
+            || num_hpets_used > 0);*/
+    return 0;
 }
 
 int hpet_legacy_irq_tick(void)

-Bruce
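A small sketch, assuming the dom0 console lines quoted above are available as text, of how the soft lockup reports could be parsed to check which CPU is affected and how far apart the reports are. The helper name `parse_lockups` is hypothetical; the log lines are the four from the message.

```python
import re

# The four soft lockup reports quoted in the message above.
LOG = """\
[38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
"""

# Captures: kernel timestamp (seconds), CPU number, reported stuck duration (seconds).
LOCKUP_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"
)


def parse_lockups(text):
    """Return a list of (timestamp, cpu, stuck_seconds) tuples (hypothetical helper)."""
    events = []
    for line in text.splitlines():
        m = LOCKUP_RE.match(line.strip())
        if m:
            events.append((float(m.group(1)), int(m.group(2)), int(m.group(3))))
    return events


events = parse_lockups(LOG)
cpus = sorted({cpu for _, cpu, _ in events})
# Time between consecutive reports, in seconds.
gaps = [round(later[0] - earlier[0], 2) for earlier, later in zip(events, events[1:])]

print("CPUs reporting lockups:", cpus)
print("Seconds between reports:", gaps)
```

For these four lines, every report comes from CPU#0 and consecutive reports are roughly 65.5 seconds apart. One possible reading, offered only as an assumption, is that this is a single ongoing stall being re-reported at a steady interval rather than four separate incidents.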
Andreas Kinzler
2010-Sep-29 17:46 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On 28.09.2010 18:04, Bruce Edge wrote:
> Andreas,
> With this patch the dom0 hangs when I start the vm in pv mode. The hvm
> ISO based install was OK, but the pv mode runtime hung the dom0
> shortly after the boot entry was selected from the VM's grub menu.
> There was no output on the VM console.
> The dom0 console is printiing these:
> [38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> [38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> [39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> [39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

Please try this dom0 kernel:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=snapshot;h=e6b9b2cbca5093e8e38d3e314e2f6415ad951c60;sf=tgz

I have also attached the kernel config for it that is working for me.

> And lastly, just for confirmation, this it the patch I applied:

Yes. This is the patch I mean.

The kernel mentioned above and my patch give me a system that works quite well on my machines.

Regards Andreas
Bruce Edge
2010-Sep-29 18:01 UTC
Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre
On Wed, Sep 29, 2010 at 10:46 AM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote:
> On 28.09.2010 18:04, Bruce Edge wrote:
>> With this patch the dom0 hangs when I start the vm in pv mode.
>> [...]
>
> Please try this dom0 kernel:
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=snapshot;h=e6b9b2cbca5093e8e38d3e314e2f6415ad951c60;sf=tgz

Andreas,

Is that a 2.6.18 snapshot (as indicated by your .config name)? That one worked for me too; it wasn't until I started using 2.6.21 that I started having these problems. And, while it's tempting to stick with .18, I need to track the active development for a number of reasons.

Also, the hang I mentioned wasn't related to your patch. The .21 hangs without it as well.

Since .23 was just pushed out, I'll retry with that, with and without your patch.

Thanks for the help.

-Bruce

> I have also attached the kernel config for it that is working for me.
>
>> And lastly, just for confirmation, this it the patch I applied:
>
> Yes. This is the patch I mean.
>
> The kernel mentioned above and my patch gives me a system that works quite
> well on my machines.
>
> Regards Andreas