thr3ads.net - Xen devel - [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre [Sep 2010]

Jia Rao

2010-Sep-24 04:22 UTC

[Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

Hi all,

I am seeing reproducible hangs in dom0 when the system is under heavy load.

Testbed settings:
Four dom0s share an NFS server for domU images, with a total of 24 domUs (6
domUs on each dom0). When the system is under heavy load, busy processing
e-commerce requests, one or two of the dom0s hang. No input is accepted
and a reboot is necessary.

Has anyone had the same experience? The possible causes I can come up with
are the following:

1. NFS is not configured properly. But before I upgraded to Xen 4, Xen 3
worked pretty well.

2. The domUs are using tap2 disks. Has anyone seen similar problems when
testing tap2?

3. Or does the problem come from the new pvops kernel? All the domUs are
CPU intensive and do not generate a lot of I/O.

Unfortunately, dom0's dmesg and the xm log recorded nothing about the
hangs.

FYI:

Xen: 4.0.1-rc3-pre
dom0: centos 2.6.32.1 pvops 8G, 8 cores
domU: 2.6.18.8 PV kernel 1G, 4 VCPU
NFS server: 8G, 8 cores, 4-disk RAID 5, NFS version 3 over TCP, rw size 4K
Interconnect: Gigabit Ethernet.

Thanks a lot !


_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users

Pasi Kärkkäinen

2010-Sep-24 10:27 UTC

[Xen-users] Re: [Xen-devel] dom0 hangs in xen 4.0.1-rc3-pre

On Fri, Sep 24, 2010 at 12:22:21AM -0400, Jia Rao wrote:
> Hi all,
> I saw reproducible hangs in dom0 when the system is under heavy load.
> Testbed settings:
> four dom0s share a nfs server for domU images. a total number of 24 domUs
> (6 domUs on each dom0). When the system under heavy load, busy processing
> e-commerce requests, one or two of the dom0s hanged. no input can be
> accepted and reboot is necessary.
> Anyone had the same experience? The causes I can come up are following:
> 1. nfs is not configured properly. But before I upgraded to xen 4, xen 3
> worked pretty well.
> 2. the domU's are using tap2 disk. Any similar problem in testing tap2?
> 3. Or the problem is from the new pvops kernel ? All the domU are cpu
> intensive and not generating a lot of IOs.
> Unfortunately, dom0's dmesg and xm log recorded nothing about the hangs.
> FYI:
> Xen: 4.0.1-rc3-pre
> dom0: centos 2.6.32.1 pvops 8G, 8 cores
> domU: 2.6.18.8 PV kernel 1G, 4 VCPU
> NFS server: 8G, 8 cores, 4-disk RAID 5 nfs version 3 over TCP, rw size 4K
> Interconnect: Gigabyte Ethernet.
> Thanks a lot !

Well first of all test with Xen 4.0.1 final, does that help? If not, try
Xen 4.0.2-rc1-pre, which has some additional IRQ related fixes.

-- Pasi


Jeremy Fitzhardinge

2010-Sep-24 19:08 UTC

[Xen-users] Re: [Xen-devel] dom0 hangs in xen 4.0.1-rc3-pre

On 09/23/2010 09:22 PM, Jia Rao wrote:
> Hi all,
>
> I saw reproducible hangs in dom0 when the system is under heavy load.
>
> Testbed settings:
> four dom0s share a nfs server for domU images. a total number of 24
> domUs (6 domUs on each dom0). When the system under heavy load, busy
> processing e-commerce requests, one or two of the dom0s hanged. no
> input can be accepted and reboot is necessary.

Is the whole machine locked solid, or does it still, for example,
respond to ping on its external interfaces, toggle Caps Lock on the
keyboard (if any), or echo characters on the console?

Does Xen still respond on the console (^A ^A ^A if you have a serial
console)?

>
> Anyone had the same experience? The causes I can come up are following:
>
> 1. nfs is not configured properly. But before I upgraded to xen 4, xen
> 3 worked pretty well.
>
> 2. the domU's are using tap2 disk. Any similar problem in testing tap2?
>
> 3. Or the problem is from the new pvops kernel ? All the domU are cpu
> intensive and not generating a lot of IOs.
>
> Unfortunately, dom0's dmesg and xm log recorded nothing about the hangs.
>
> FYI:
>
> Xen: 4.0.1-rc3-pre
> dom0: centos 2.6.32.1 pvops 8G, 8 cores

Try disabling irqbalanced, which can cause lost events.

    J

Jia Rao

2010-Sep-24 20:48 UTC

Re: [Xen-devel] dom0 hangs in xen 4.0.1-rc3-pre

Hi Jeremy,

The whole machine was locked up: no response to ping and nothing on the local VGA display.
I have not tried the serial console yet and will let you know once I do.

BTW, how do I disable irqbalanced?

Thank you for your reply.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jia Rao

2010-Sep-24 21:01 UTC

Re: [Xen-devel] dom0 hangs in xen 4.0.1-rc3-pre

irqbalanced was not turned on when the server hung.

Andreas Kinzler

2010-Sep-25 12:12 UTC

Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 24.09.2010 06:22, Jia Rao wrote:
> I saw reproducible hangs in dom0 when the system is under heavy load.
> four dom0s share a nfs server for domU images. a total number of 24 domUs (6
> domUs on each dom0). When the system under heavy load, busy processing
> e-commerce requests, one or two of the dom0s hanged. no input can be
> accepted and reboot is necessary.
> Anyone had the same experience? The causes I can come up are following:

Please post your hardware (mainboard, chipset, CPU, RAID controller).
I have found a severe problem on Lynnfield systems.

Regards Andreas

Jia Rao

2010-Sep-26 01:12 UTC

[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

Hi Andreas,

FYI.
Server Model: Dell PowerEdge 1950 III
Motherboard: do not actually know.
CPU: Intel Xeon E5450
Hard drive controller: No RAID. SAS 6/i R integrated controller.

Thanks.

Andreas Kinzler

2010-Sep-27 08:17 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 26.09.2010 03:12, Jia Rao wrote:
>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>> I have found a severe problem on Lynnfield systems.
> Server Model: Dell PowerEdge 1950 III
> Motherboard: do not actually know.
> CPU: Intel Xeon E5450
> Hard drive controller: No RAID. SAS 6/i R integrated controller.

OK. This is not a Nehalem-based system. Are you using C3 anyway? Please
post the output of "xenpm start 10".

Regards Andreas

Jia Rao

2010-Sep-27 13:41 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

The following is the output of "xenpm start 10".

CPU0:   Residency(ms)           Avg Res(ms)
  C0    100     ( 1.01%)        0.02
  C1    10      ( 0.10%)        0.12
  C2    9892    (98.89%)        2.43
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    54      (100.00%)
  Avg freq      1980000 KHz

CPU1:   Residency(ms)           Avg Res(ms)
  C0    136     ( 1.37%)        0.02
  C1    11      ( 0.11%)        0.19
  C2    9855    (98.52%)        1.62
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    72      (100.00%)
  Avg freq      1980000 KHz

CPU2:   Residency(ms)           Avg Res(ms)
  C0    153     ( 1.53%)        0.02
  C1    58      ( 0.59%)        0.35
  C2    9791    (97.88%)        1.43
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    85      (100.00%)
  Avg freq      1980000 KHz

CPU3:   Residency(ms)           Avg Res(ms)
  C0    177     ( 1.77%)        0.02
  C1    33      ( 0.34%)        0.09
  C2    9792    (97.89%)        1.05
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    82      (100.00%)
  Avg freq      1980000 KHz

CPU4:   Residency(ms)           Avg Res(ms)
  C0    166     ( 1.67%)        0.01
  C1    947     ( 9.47%)        0.21
  C2    8889    (88.86%)        0.72
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    11      (100.00%)
  Avg freq      1980000 KHz

CPU5:   Residency(ms)           Avg Res(ms)
  C0    722     ( 7.23%)        0.04
  C1    181     ( 1.81%)        0.09
  C2    9098    (90.96%)        0.53
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    529     (100.00%)
  Avg freq      1980000 KHz

CPU6:   Residency(ms)           Avg Res(ms)
  C0    73      ( 0.73%)        0.02
  C1    5       ( 0.06%)        0.16
  C2    9923    (99.21%)        2.44
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    27      (100.00%)
  Avg freq      1980000 KHz

CPU7:   Residency(ms)           Avg Res(ms)
  C0    135     ( 1.35%)        0.02
  C1    68      ( 0.68%)        0.27
  C2    9799    (97.97%)        1.55
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P1    0       ( 0.00%)
  P2    0       ( 0.00%)
  P3    71      (100.00%)
  Avg freq      1980000 KHz

Thanks

Bruce Edge

2010-Sep-27 14:06 UTC

Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On Sat, Sep 25, 2010 at 5:12 AM, Andreas Kinzler <ml-xen-users@hfp.de> wrote:
> On 24.09.2010 06:22, Jia Rao wrote:
>
>> I saw reproducible hangs in dom0 when the system is under heavy load.
>> four dom0s share a nfs server for domU images. a total number of 24 domUs
>> (6 domUs on each dom0). When the system under heavy load, busy processing
>> e-commerce requests, one or two of the dom0s hanged. no input can be
>> accepted and reboot is necessary.
>> Anyone had the same experience? The causes I can come up are following:
>
> Please post your hardware (mainboard, chipset, CPU, RAID controller).
> I have found a severe problem on Lynnfield systems.

Andreas,

Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel is
causing grief for us too.  I was wondering if this was related.

-Bruce

Andreas Kinzler

2010-Sep-27 14:22 UTC

[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 27.09.2010 16:06, Bruce Edge wrote:
>>> I saw reproducible hangs in dom0 when the system is under heavy load.
>>> four dom0s share a nfs server for domU images. a total number of 24 domUs
>>> (6 domUs on each dom0). When the system under heavy load, busy processing
>>> e-commerce requests, one or two of the dom0s hanged. no input can be
>>> accepted and reboot is necessary.
>>> Anyone had the same experience? The causes I can come up are following:
>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>> I have found a severe problem on Lynnfield systems.
> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel is
> causing grief for us too.  I was wondering if this was related.

I am still researching this. For testing I bought a test system with
Westmere-EP (Xeon E5620), which has ARAT. This system ran stably even
though Intel still lists it as having the C6 erratum. This leads me to the
conclusion that the HPET timer migration code (called HPET broadcast)
in Xen is the root cause. This affects all CPUs that use it, but
mainly Nehalem because of turbo mode.

Regards Andreas

Bruce Edge

2010-Sep-27 14:32 UTC

[Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On Mon, Sep 27, 2010 at 7:22 AM, Andreas Kinzler <ml-xen-users@hfp.de> wrote:
> On 27.09.2010 16:06, Bruce Edge wrote:
>
>>>> I saw reproducible hangs in dom0 when the system is under heavy load.
>>>> four dom0s share a nfs server for domU images. a total number of 24
>>>> domUs (6 domUs on each dom0). When the system under heavy load, busy
>>>> processing e-commerce requests, one or two of the dom0s hanged. no
>>>> input can be accepted and reboot is necessary.
>>>> Anyone had the same experience? The causes I can come up are following:
>>>
>>> Please post your hardware (mainboard, chipset, CPU, RAID controller).
>>> I have found a severe problem on Lynnfield systems.
>>
>> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel
>> is causing grief for us too.  I was wondering if this was related.
>
> I am still researching this. For testing I bought a test system with
> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable while
> Intel still lists it as having the C6 errata. This leads me to the
> conclusion that the HPET timer migration code (called HPET broadcast) from
> Xen is the root cause. This affects all CPUs that use it - but mainly
> Nehalem because of turbo mode.
>
> Regards Andreas

Andreas,
Thanks for the info. I'll try disabling turbo mode in the BIOS and see if
that helps.
Let me know if there's anything I can run/do/test/etc.

-Bruce


Andreas Kinzler

2010-Sep-27 14:50 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 27.09.2010 16:32, Bruce Edge wrote:
>>> Does this affect all Nehalem chips or only the Lynnfields? The .21 kernel
>>> is  causing grief for us too.  I was wondering if this was related.
>> I am still researching this. For testing I bought a test system with
>> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable while
>> Intel still lists it as having the C6 errata. This leads me to the
>> conclusion that the HPET timer migration code (called HPET broadcast) from
>> Xen is the root cause. This affects all CPUs that use it - but mainly
>> Nehalem because of turbo mode.
> Thanks for the info. I'll try disabling turbo mode in the BIOS and see if
> that helps.
> Let me know if there's anything I can run/do/test/etc.

If you want to check the issue I am referring to then you need to apply 
my patch from: 
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html

Do not modify the BIOS settings in any way.

Regards Andreas

Jiang, Yunhong

2010-Sep-28 01:57 UTC

RE: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

Andreas, a question about your mail
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
Does your system have interrupt remapping enabled?

Thanks
--jyh

Andreas Kinzler

2010-Sep-28 08:25 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 28.09.2010 03:57, Jiang, Yunhong wrote:
> Andres, a question to your
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html mail,
> does your system has interrupt remapping enabled?

If you mean CONFIG_INTR_REMAP, then no.

Regards Andreas

Andreas Kinzler

2010-Sep-28 09:04 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 27.09.2010 15:41, Jia Rao wrote:
> The following is the output of "xenpm start 10".
> CPU0:   Residency(ms)           Avg Res(ms)
>    C0    100     ( 1.01%)        0.02
>    C1    10      ( 0.10%)        0.12
>    C2    9892    (98.89%)        2.43

You are using C2 intensively. Without "local_apic_timer_c2_ok", C2 uses
the same HPET migration code as C3. I think it makes sense to try my patch
from
http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html

Regards Andreas
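
To check how heavily a host relies on deep C-states, the per-CPU residency figures printed by "xenpm start 10" can be summarised with a short script. The following is a minimal sketch in Python, assuming the output layout matches the one posted earlier in this thread. The function names (parse_cstates, deep_cstate_share) are made up for illustration and are not part of xenpm.

```python
import re

# Matches a C-state line such as "  C2    9892    (98.89%)        2.43".
CSTATE_LINE = re.compile(r"^\s*(C\d+)\s+(\d+)\s+\(\s*([\d.]+)%\)")
# Matches a CPU header such as "CPU0:   Residency(ms)           Avg Res(ms)".
CPU_LINE = re.compile(r"^CPU(\d+):")


def parse_cstates(text):
    """Return {cpu_id: {"C0": residency_ms, ...}} from xenpm-style output."""
    result = {}
    current = None
    for line in text.splitlines():
        cpu = CPU_LINE.match(line)
        if cpu:
            current = result.setdefault(int(cpu.group(1)), {})
            continue
        state = CSTATE_LINE.match(line)
        if state and current is not None:
            current[state.group(1)] = int(state.group(2))
    return result


def deep_cstate_share(states, shallow=("C0", "C1")):
    """Fraction of the sampled time spent in states deeper than C1."""
    total = sum(states.values())
    if total == 0:
        return 0.0
    deep = sum(ms for name, ms in states.items() if name not in shallow)
    return deep / total


# Sample input: the CPU0 figures from the xenpm output in this thread.
SAMPLE = """\
CPU0:   Residency(ms)           Avg Res(ms)
  C0    100     ( 1.01%)        0.02
  C1    10      ( 0.10%)        0.12
  C2    9892    (98.89%)        2.43
  C3    0       ( 0.00%)        0.00

  P0    0       ( 0.00%)
  P3    54      (100.00%)
"""

if __name__ == "__main__":
    for cpu_id, states in sorted(parse_cstates(SAMPLE).items()):
        share = deep_cstate_share(states)
        print(f"CPU{cpu_id}: {share:.1%} of time in C2 or deeper")
```

In the figures posted in this thread, every CPU spends roughly 89% to 99% of the sampled interval in C2 and none in C3, which is the situation described above.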

Bruce Edge

2010-Sep-28 16:04 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On Mon, Sep 27, 2010 at 7:50 AM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote:
> On 27.09.2010 16:32, Bruce Edge wrote:
>>>>
>>>> Does this affect all Nehalem chips or only the Lynnfields? The .21
>>>> kernel is  causing grief for us too.  I was wondering if this was
>>>> related.
>>>
>>> I am still researching this. For testing I bought a test system with
>>> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable while
>>> Intel still lists it as having the C6 errata. This leads me to the
>>> conclusion that the HPET timer migration code (called HPET broadcast)
>>> from Xen is the root cause. This affects all CPUs that use it - but
>>> mainly Nehalem because of turbo mode.
>>
>> Thanks for the info. I'll try disabling turbo mode in the BIOS and see if
>> that helps.
>> Let me know if there's anything I can run/do/test/etc.
>
> If you want to check the issue I am referring to then you need to apply my
> patch from:
> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
>
> Do not modify the BIOS settings in any way.
>
> Regards Andreas

Andreas,
With this patch the dom0 hangs when I start the VM in PV mode. The HVM
ISO-based install was OK, but the PV-mode runtime hung the dom0
shortly after the boot entry was selected from the VM's grub menu.
There was no output on the VM console.
The dom0 console is printing these:
[38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

The Xen console is still responsive.
I've attached the '*' output.

And lastly, just for confirmation, this is the patch I applied:

diff -urN xx/xen/arch/x86/hpet.c xen-4.0.1/xen/arch/x86/hpet.c
--- xx/xen/arch/x86/hpet.c      2010-08-25 12:22:11.000000000 +0200
+++ xen-4.0.1/xen/arch/x86/hpet.c       2010-08-30 18:13:34.000000000 +0200
@@ -405,7 +405,7 @@
         /* Only consider HPET timer with MSI support */
         if ( !(cfg & HPET_TN_FSB_CAP) )
             continue;
-
+if (1) continue;
         ch->flags = 0;
         ch->idx = i;

@@ -703,8 +703,9 @@

 int hpet_broadcast_is_available(void)
 {
-    return (legacy_hpet_event.event_handler == handle_hpet_broadcast
-            || num_hpets_used > 0);
+    /*return (legacy_hpet_event.event_handler == handle_hpet_broadcast
+            || num_hpets_used > 0);*/
+    return 0;
 }

 int hpet_legacy_irq_tick(void)


-Bruce
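
When comparing runs (for example with and without the patch, or across kernel versions), it can help to extract the soft lockup reports from a saved dom0 console log and see which CPUs are affected and how far apart the reports are. The following is a rough sketch in Python, assuming the message format shown in this mail. The helper names (lockup_events, intervals) are hypothetical.

```python
import re

# Matches dom0 kernel lines such as
# "[38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]".
LOCKUP = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!"
)


def lockup_events(log_text):
    """Return a list of (timestamp_s, cpu, stuck_s) tuples in log order."""
    return [
        (float(ts), int(cpu), int(stuck))
        for ts, cpu, stuck in LOCKUP.findall(log_text)
    ]


def intervals(events):
    """Seconds between consecutive soft lockup reports."""
    times = [ts for ts, _cpu, _stuck in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Sample input: the four console lines quoted in this mail.
LOG = """\
[38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
[39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
"""

if __name__ == "__main__":
    events = lockup_events(LOG)
    print("CPUs affected:", sorted({cpu for _ts, cpu, _stuck in events}))
    for gap in intervals(events):
        print(f"{gap:.2f} s since previous report")
```

For the four lines above, all reports are on CPU#0 and arrive about 65.5 seconds apart.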


Andreas Kinzler

2010-Sep-29 17:46 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On 28.09.2010 18:04, Bruce Edge wrote:
>>>> I am still researching this. For testing I bought a test system with
>>>> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable while
>>>> Intel still lists it as having the C6 errata. This leads me to the
>>>> conclusion that the HPET timer migration code (called HPET broadcast)
>>>> from Xen is the root cause. This affects all CPUs that use it - but
>>>> mainly Nehalem because of turbo mode.
>>> Thanks for the info. I'll try disabling turbo mode in the BIOS and see if
>>> that helps.
>>> Let me know if there's anything I can run/do/test/etc.
>> If you want to check the issue I am referring to then you need to apply my
>> patch from:
>> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
> Andreas,
> With this patch the dom0 hangs when I start the vm in pv mode. The hvm
> ISO based install was OK, but the pv mode runtime hung the dom0
> shortly after the boot entry was selected from the VM's grub menu.
> There was no output on the VM console.
> The dom0 console is printiing these:
> [38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> [38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> [39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
> [39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]

Please try this dom0 kernel: 
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=snapshot;h=e6b9b2cbca5093e8e38d3e314e2f6415ad951c60;sf=tgz

I have also attached the kernel config for it that is working for me.

> And lastly, just for confirmation, this it the patch I applied:

Yes. This is the patch I mean.

The kernel mentioned above and my patch give me a system that works
quite well on my machines.

Regards Andreas


Bruce Edge

2010-Sep-29 18:01 UTC

Re: [Xen-devel] Re: [Xen-users] dom0 hangs in xen 4.0.1-rc3-pre

On Wed, Sep 29, 2010 at 10:46 AM, Andreas Kinzler <ml-xen-devel@hfp.de> wrote:
> On 28.09.2010 18:04, Bruce Edge wrote:
>>>>>
>>>>> I am still researching this. For testing I bought a test system with
>>>>> Westmere-EP (Xeon E5620) which has ARAT. This system worked stable
>>>>> while Intel still lists it as having the C6 errata. This leads me to
>>>>> the conclusion that the HPET timer migration code (called HPET
>>>>> broadcast) from Xen is the root cause. This affects all CPUs that use
>>>>> it - but mainly Nehalem because of turbo mode.
>>>>
>>>> Thanks for the info. I'll try disabling turbo mode in the BIOS and see
>>>> if that helps.
>>>> Let me know if there's anything I can run/do/test/etc.
>>>
>>> If you want to check the issue I am referring to then you need to apply
>>> my patch from:
>>> http://lists.xensource.com/archives/html/xen-devel/2010-09/msg00556.html
>>
>> Andreas,
>> With this patch the dom0 hangs when I start the vm in pv mode. The hvm
>> ISO based install was OK, but the pv mode runtime hung the dom0
>> shortly after the boot entry was selected from the VM's grub menu.
>> There was no output on the VM console.
>> The dom0 console is printiing these:
>> [38927.441493] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>> [38992.941432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>> [39058.441434] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>> [39123.931432] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0]
>
> Please try this dom0 kernel:
> http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=snapshot;h=e6b9b2cbca5093e8e38d3e314e2f6415ad951c60;sf=tgz

Andreas,

Is that a 2.6.18 snapshot (as indicated by your .config name) ?

That one worked for me too, it wasn't until I started using 2.6.21
that I started having these problems. And, while it's tempting to
stick with .18, I need to track the active development for a number of
reasons.

Also, the hang I mentioned wasn't related to your patch. The .21 hangs
without it as well.

Since .23 was just pushed out, I'll retry with that, with and without
your patch.

Thanks for the help.

-Bruce