thr3ads.net - Xen devel - [Xen-devel] Need help in debugging partially blocked hypervisor [Oct 2009]


Dietmar Hahn

2009-Oct-21 13:07 UTC

[Xen-devel] Need help in debugging partially blocked hypervisor

Hi,

I need some help debugging strange hypervisor behavior that occurs when
using fully virtualized performance counters.

For reference, I am using SLES11, which means xen-3.3.1 and
linux-2.6.27.19-5..., on an Intel Nehalem machine.
I also tried the hypervisor from xen-unstable, but the machine didn't boot.

dom0 1 CPU
domU 2 CPUs
3 CPUs paused.

I start the performance counters in the domU, and after some time the domU
CPUs run forever (as seen with xm vcpu-list) and the domU is no longer
accessible. dom0 keeps working as expected.
The serial console doesn't react to 3xCTRL-A, but xm debug-keys prints its
output on the serial console.
When I try to pause the domU (xm pause ...), or use xenctx or any debug key
that requires the domU to be paused, dom0 freezes and only a hard reset
helps. This seems to come from the call to vcpu_sleep_sync().

I ran xentrace while the system was in this state and saw only entries from
CPU0 (the dom0 CPU), which suggests to me that the domU CPUs are stuck
somewhere in the hypervisor.

Attached is the output of "xm debug-keys d". I hope someone has an idea
which direction I should look in more deeply.

Many thanks in advance!
Dietmar.


(XEN) 'd' pressed -> dumping registers
(XEN) *** Dumping CPU0 guest state (d0:v0): ***
(XEN) ----[ Xen-3.3.1  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff8020746a>]
(XEN) RFLAGS: 0000000000000216   EM: 0   CONTEXT: pv guest
(XEN) rax: 0000000000000023   rbx: ffffffff803c7505   rcx: ffffffff8020746a
(XEN) rdx: 00007fd955ef2f8a   rsi: 00007fd95635dc00   rdi: 00007fd946ff9170
(XEN) rbp: ffffffffffffffda   rsp: ffff8800da541dc0   r8:  00007fd956324390
(XEN) r9:  0000000000000002   r10: 0000000000000000   r11: 0000000000000216
(XEN) r12: ffff8800dbd42080   r13: ffff8800db4d5500   r14: 0000000000000000
(XEN) r15: 00007fd946ff9200   cr0: 0000000080050033   cr4: 00000000000026b0
(XEN) cr3: 000000025c880000   cr2: 00007fef4f880ad0
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffff8800da541dc0:
(XEN)    ffffffff80307263 ffffffff80207460 ffffffff803c7593 ffff8800ce4d7720
(XEN)    00000000da4fc067 ffff8800050a7180 ffffffff8028484c ffff8800ce1399c0
(XEN)    0000000000000003 0000000000000000 ffff8800ce1399c0 00007fd946ff9000
(XEN)    0000000000000000 0000000000000023 00007fd946ff9170 00007fd95635dc00
(XEN)    00007fd955ef2f8a 0000000000000000 00007fd956324390 ffff8800d9bc4780
(XEN)    0000000000000001 00007fd946ff9000 ffffffff803c7505 ffff8800dbd42100
(XEN)    ffff8800dbd42080 ffff8800db4d5500 0000000000000000 00007fd946ff9200
(XEN)    ffffffff802e0ae3 0030500046ff9000 ffff8800db4d5500 00007fd946ff9200
(XEN)    0000000000305000 0000000000000006 0000000000000006 00007fd956208608
(XEN)    ffffffff802aa8b5 ffff8800db4d5500 ffff8800db4d5500 00007fd946ff9200
(XEN)    ffffffff802aab22 0000000000001000 ffff8800dbde7520 00007fd946ff9000
(XEN)    0000000000000000 ffff8800db4d5500 00007fd946ff9200 0000000000305000
(XEN)    ffffffff802aab82 0000000000000006 0000000100000001 0000000000000000
(XEN)    0000000001ce0b34 0000000001c8eed0 0000000000000006 0000000000000001
(XEN)    ffffffff8020b3b8 0000000000000246 0000000000000000 0000000000000200
(XEN)    fffffffffffffffd 0000000000000010 ffffffff8020b350 00007fd946ff9200
(XEN)    0000000000305000 0000000000000006 0000000000000010 00007fd95536fb77
(XEN)    000000000000e033 0000000000000246 00007fd946ff9168 000000000000e02b
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 
(XEN) *** Dumping CPU1 host state: ***
(XEN) ----[ Xen-3.3.1  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff828c8013a24b>] default_idle+0x2b/0x40
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000000080   rbx: ffff8300bf5f7f28   rcx: 0000000000000001
(XEN) rdx: ffff828c80276980   rsi: ffff828c8021ad40   rdi: 0000000000002000
(XEN) rbp: ffff8300bf5f7f28   rsp: ffff8300bf5f7f08   r8:  0000000000000002
(XEN) r9:  ffff8300be601e00   r10: 0000000000000000   r11: ffff8300be601e10
(XEN) r12: ffff828c80276980   r13: 00000014ef213474   r14: ffff828c8021a160
(XEN) r15: ffff828c8021a100   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 00000000be864000   cr2: 00007fd946ff3ed0
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8300bf5f7f08:
(XEN)    ffff828c8013e126 0000000000002000 ffff8300be6fc080 ffff8300be61c080
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000246 0000000000007ff0
(XEN)    ffff880080ad1000 ffff8800dd488000 0000000000000000 ffffffff8020730a
(XEN)    0000000000000000 0000000000000001 0000000000000002 0000010000000000
(XEN)    ffffffff8020730a 000000000000e033 0000000000000246 ffff8800dd489f28
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000001 ffff8300be6fc080
(XEN) Xen call trace:
(XEN)    [<ffff828c8013a24b>] default_idle+0x2b/0x40
(XEN)    [<ffff828c8013e126>] idle_loop+0xa6/0xd0
(XEN)    
(XEN) No guest context (CPU1 is idle).
(XEN) 
(XEN) *** Dumping CPU2 host state: ***
(XEN) ----[ Xen-3.3.1  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000000020   rbx: ffff8300be6e0080   rcx: 0000000000000000
(XEN) rdx: ffff828c8021c3a0   rsi: 00000000000003de   rdi: ffff8300be6f7f28
(XEN) rbp: ffff9700ffb80990   rsp: ffff8300be6f7e38   r8:  ffff97600036379c
(XEN) r9:  ffff9700ff428b5b   r10: ffff976000363794   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffff8300be6e0080   r14: ffff8300be6f7f28
(XEN) r15: ffff976000363958   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000033fc01000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300be6f7e38:
(XEN)    0000000000000000 ffff8300be6e0080 ffff8300be6e1858 ffff8300be6e0080
(XEN)    ffff9700ffb80990 ffff828c80187141 000000000000e102 000000000000e102
(XEN)    00000000000000e1 ffff828c8019483d ffff8300be6ee102 ffff828c80137d6e
(XEN)    ffff8300be6f7f28 ffff8300be6e0080 0000000000000000 ffff8300be601f08
(XEN)    00000078be6edeea 0000000000000002 ffff8300be6f7f28 ffff828c8011b87a
(XEN)    ffff828c80276980 0000000000000002 ffff828c80277980 ffff8300be6e0080
(XEN)    ffff9700ffb80990 0000000000000000 ffff976000363958 ffffffffffffffff
(XEN)    ffff976000363958 ffff828c801944c3 ffff976000363958 ffffffffffffffff
(XEN)    ffff976000363958 0000000000000000 ffff9700ffb80990 0000000000000050
(XEN)    0000000000000000 ffff976000363794 ffff9700ff428b5b ffff97600036379c
(XEN)    0000000000000730 ffffb000000b8000 00000000000003de 00000000000003de
(XEN)    ffff9700ffb80990 000000000000000b ffff9700ff025250 0000000000000000
(XEN)    0000000000010097 ffff976000363938 0000000000000000 5555555555555555
(XEN)    5555555555555555 5555555555555555 5555555555555555 5555555500000002
(XEN)    ffff8300be6e0080
(XEN) Xen call trace:
(XEN)    [<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20
(XEN)    [<ffff828c80187141>] hvm_vcpu_has_pending_irq+0x41/0x60
(XEN)    [<ffff828c8019483d>] vmx_intr_assist+0x2bd/0x490
(XEN)    [<ffff828c80137d6e>] reprogram_timer+0x1e/0x90
(XEN)    [<ffff828c8011b87a>] _spin_unlock_irq+0x1a/0x40
(XEN)    [<ffff828c801944c3>] vmx_asm_do_vmentry+0x0/0xbd
(XEN)    
(XEN) *** Dumping CPU2 guest state (d1:v1): ***
(XEN) ----[ Xen-3.3.1  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    2
(XEN) RIP:    0020:[<ffff9700ff025250>]
(XEN) RFLAGS: 0000000000010097   CONTEXT: hvm guest
(XEN) rax: 0000000000000730   rbx: 0000000000000050   rcx: ffffb000000b8000
(XEN) rdx: 00000000000003de   rsi: 00000000000003de   rdi: ffff9700ffb80990
(XEN) rbp: ffff9700ffb80990   rsp: ffff976000363938   r8:  ffff97600036379c
(XEN) r9:  ffff9700ff428b5b   r10: ffff976000363794   r11: 0000000000000000
(XEN) r12: 0000000000000000   r13: ffff976000363958   r14: ffffffffffffffff
(XEN) r15: ffff976000363958   cr0: 0000000080050033   cr4: 00000000000006b0
(XEN) cr3: 0000000001822000   cr2: 0000000000000000
(XEN) ds: 0028   es: 0028   fs: 0028   gs: 0028   ss: 0028   cs: 0020
(XEN) 
(XEN) *** Dumping CPU3 host state: ***
(XEN) ----[ Xen-3.3.1  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20
(XEN) RFLAGS: 0000000000000202   CONTEXT: hypervisor
(XEN) rax: 0000000000000027   rbx: ffff8300be6e4080   rcx: 0000000000000007
(XEN) rdx: ffff828c8021e3a0   rsi: ffff9700fe1a9b70   rdi: ffff8300be91ff28
(XEN) rbp: ffff9700ffb80998   rsp: ffff8300be91fe38   r8:  0000000000000000
(XEN) r9:  ffff9700ff41e074   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: 0000000000000007   r13: ffff8300be6e4080   r14: ffff8300be91ff28
(XEN) r15: ffff9700ff01f9c0   cr0: 000000008005003b   cr4: 00000000000026b0
(XEN) cr3: 000000033fc26000   cr2: 0000000000000000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff8300be91fe38:
(XEN)    ffff828c8021e100 ffff8300be6e4080 ffff8300be6e5858 ffff8300be6e4080
(XEN)    ffff9700ffb80998 ffff828c80187141 000000000000e102 000000000000e102
(XEN)    00000000000000e1 ffff828c8019483d ffff8300be6ee102 ffff828c80137d6e
(XEN)    ffff8300be91ff28 ffff8300be6e4080 0000000000000000 ffff8300be852088
(XEN)    000001f9889c1558 0000000000000003 ffff8300be91ff28 ffff828c8011b87a
(XEN)    ffff828c80276980 0000000000000003 ffff828c80277980 ffff8300be6e4080
(XEN)    ffff9700ffb80998 ffff9700ff0476fc ffff9700ff047700 ffff9700fe000000
(XEN)    ffff9700ff01f9c0 ffff828c801944c3 ffff9700ff01f9c0 ffff9700fe000000
(XEN)    ffff9700ff047700 ffff9700ff0476fc ffff9700ffb80998 00000000c0010001
(XEN)    0000000000000000 0000000000000000 ffff9700ff41e074 0000000000000000
(XEN)    ffff9700ff02e59a 0000000000000043 0000000000000043 ffff9700fe1a9b70
(XEN)    ffff9700ffb80998 000000f100000001 ffff9700ff02e5c9 0000000000000000
(XEN)    0000000000000282 ffff9700fe1a9b60 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000003
(XEN)    ffff8300be6e4080
(XEN) Xen call trace:
(XEN)    [<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20
(XEN)    [<ffff828c80187141>] hvm_vcpu_has_pending_irq+0x41/0x60
(XEN)    [<ffff828c8019483d>] vmx_intr_assist+0x2bd/0x490
(XEN)    [<ffff828c80137d6e>] reprogram_timer+0x1e/0x90
(XEN)    [<ffff828c8011b87a>] _spin_unlock_irq+0x1a/0x40
(XEN)    [<ffff828c801944c3>] vmx_asm_do_vmentry+0x0/0xbd
(XEN)    
(XEN) *** Dumping CPU3 guest state (d1:v0): ***
(XEN) ----[ Xen-3.3.1  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    3
(XEN) RIP:    0020:[<ffff9700ff02e5c9>]
(XEN) RFLAGS: 0000000000000282   CONTEXT: hvm guest
(XEN) rax: ffff9700ff02e59a   rbx: 00000000c0010001   rcx: 0000000000000043
(XEN) rdx: 0000000000000043   rsi: ffff9700fe1a9b70   rdi: ffff9700ffb80998
(XEN) rbp: ffff9700ffb80998   rsp: ffff9700fe1a9b60   r8:  0000000000000000
(XEN) r9:  ffff9700ff41e074   r10: 0000000000000000   r11: 0000000000000000
(XEN) r12: ffff9700ff0476fc   r13: ffff9700ff047700   r14: ffff9700fe000000
(XEN) r15: ffff9700ff01f9c0   cr0: 0000000080050033   cr4: 00000000000006b0
(XEN) cr3: 0000000001423000   cr2: 0000000000000000
(XEN) ds: 0028   es: 0028   fs: 0028   gs: 0028   ss: 0028   cs: 0020
(XEN) 



-- 
Company details: http://ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2009-Oct-21 13:28 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

On 21/10/2009 14:07, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

> I need some help in debugging a strange hypervisor behavior together
> with using fully virtualized performance counters.
>
> For info I use SLES11, means xen-3.3.1 and linux-2.6.27.19-5... on a
> Intel nehalem machine.
> I tried the hypervisor from xen-unstable but the machine didn't boot.

That in itself is frankly more of a concern to me. Probably recent
irq-handling changes, or some other platform change, have broken boot on some
machines. If we don't get reports and testing help with that, it'll end up
broken in the next major stable release too, which we really don't want.

Meanwhile, can you at least boot with 3.4? At least we still maintain that.
And do a debug build (debug=y make ...) so that the backtraces from the 'd'
debug key are more meaningful.

 -- Keir



Dietmar Hahn

2009-Oct-21 13:35 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

> On 21/10/2009 14:07, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:
>
> > I need some help in debugging a strange hypervisor behavior together
> > with using fully virtualized performance counters.
> >
> > For info I use SLES11, means xen-3.3.1 and linux-2.6.27.19-5... on a
> > Intel nehalem machine.
> > I tried the hypervisor from xen-unstable but the machine didn't boot.
>
> That in itself is frankly more of a concern to me. Probably recent
> irq-handling changes, or some other platform change, has broken boot on
> some machines. If we don't get reports and testing help with that, it'll
> end up broken in the next major stable release too, which we really don't
> want.
>
> Meanwhile, can you at least boot with 3.4? At least we still maintain that.
> And do a debug build (debug=y make ...) so that the backtraces from the 'd'
> debug key are more meaningful.
>
>  -- Keir

Yes, you are right, I'll try 3.4.
Thanks.
Dietmar.

-- 
Dietmar Hahn
TSP ES&S SWE OS                   Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions      Email: dietmar.hahn@ts.fujitsu.com
Otto-Hahn-Ring 6                  Internet: http://ts.fujitsu.com
D-81739 München                   Company details: ts.fujitsu.com/imprint.html

Keir Fraser

2009-Oct-21 13:53 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

On 21/10/2009 14:35, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

>> That in itself is frankly more of a concern to me. Probably recent
>> irq-handling changes, or some other platform change, has broken boot on
>> some machines. If we don't get reports and testing help with that, it'll
>> end up broken in the next major stable release too, which we really don't
>> want.
>>
>> Meanwhile, can you at least boot with 3.4? At least we still maintain that.
>> And do a debug build (debug=y make ...) so that the backtraces from the 'd'
>> debug key are more meaningful.
>>
>>  -- Keir
>
> Yes, you are right, I'll try 3.4.

Thanks. DomU guests taking out the host is an embarrassing class of bug. It
would be good to get this sorted for 3.4.2 if the bug still exists. Or worst
case we could make this perfctr stuff a default-off config option. ;-)

 -- Keir



Dietmar Hahn

2009-Oct-22 06:23 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

On 21.10.2009, Keir Fraser wrote:
> On 21/10/2009 14:07, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:
>
> > I need some help in debugging a strange hypervisor behavior together
> > with using fully virtualized performance counters.
> >
> > For info I use SLES11, means xen-3.3.1 and linux-2.6.27.19-5... on a
> > Intel nehalem machine.
> > I tried the hypervisor from xen-unstable but the machine didn't boot.
>
> That in itself is frankly more of a concern to me. Probably recent
> irq-handling changes, or some other platform change, has broken boot on
> some machines. If we don't get reports and testing help with that, it'll
> end up broken in the next major stable release too, which we really don't
> want.
>
> Meanwhile, can you at least boot with 3.4? At least we still maintain that.
> And do a debug build (debug=y make ...) so that the backtraces from the 'd'
> debug key are more meaningful.
>
>  -- Keir

OK, I tried xen-3.4-testing.hg and the system booted fine ;-)
Then I did a fresh hg pull from xen-unstable, and the boot stopped in
the Linux kernel.
Attached are the serial console logs for both hypervisors.
The tests with the performance counters need more time for some preparation.
Thanks.
Dietmar.

-- 
Company details: http://ts.fujitsu.com/imprint.html



Keir Fraser

2009-Oct-22 06:39 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

On 22/10/2009 07:23, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

> OK, I tried xen-3.4-testing.hg and the system booted fine ;-)
> Then I did a fresh hg pull from xen-unstable and the boot stopped in
> the linux kernel.
> Attached are the loggings from the serial console for both hypervisors.

Okay, so output just dies early during dom0 boot. I guess if you try the 'd'
debug key you get no output from that either (CTRL-a three times
followed by d)?

Does xen-unstable work on other machines with that dom0 kernel, do you know?
It's not clear at this point whether the issue is related to the hardware or
the particular dom0 kernel.

If you haven't seen that dom0 kernel work with xen-unstable on any system,
can I get that dom0 kernel from somewhere to give it a go? Perhaps your
exact dom0 kernel binary to start with, to make things as close as possible
to your setup?

 Thanks,
 Keir

Dietmar Hahn

2009-Oct-22 07:21 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

> On 22/10/2009 07:23, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:
>
> > OK, I tried xen-3.4-testing.hg and the system booted fine ;-)
> > Then I did a fresh hg pull from xen-unstable and the boot stopped in
> > the linux kernel.
> > Attached are the loggings from the serial console for both hypervisors.
>
> Okay, so output just dies early during dom0 boot. I guess if you try the 'd'
> debug key that you get no output from that either (CTRL-a three times
> followed by d)?

Sorry, CTRL-a doesn't work.

> Does xen-unstable work on other machines with that dom0 kernel, do you know?
> It's not at this point clear whether the issue is related to the hardware or
> the particular dom0 kernel.

Yes, it works on older machines. I can send you the log.

> If you haven't seen that dom0 kernel work with xen-unstable on any system,
> can I get that dom0 kernel from somewhere to give it a go? Perhaps your
> exact dom0 kernel binary to start with, to make things as close as possible
> to your setup?

If needed, I can put the kernel on an outgoing FTP server.

Dietmar.

-- 
Company details: http://ts.fujitsu.com/imprint.html

Dietmar Hahn

2009-Oct-30 12:20 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

Hi,

> I need some help in debugging a strange hypervisor behavior together
> with using fully virtualized performance counters.

I added my own trace points to xentrace to find out what the CPU is doing.
Now I can see that in the failing case the CPU does nothing but handle an
endless stream of performance counter NMIs within the hypervisor.

pmu_apic_interrupt
  smp_pmu_apic_interrupt
    vmx_do_pmu_interrupt
      vpmu_do_interrupt

In the normal case, core2_vpmu_do_interrupt does the following:
    1. Read the cause of the NMI
        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
        ...
    2. Save the value for the domU
        ...
    3. Reset the cause
        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
    4. Inject the NMI into the domU

This works very well for a short time.
Then the hypervisor falls into the endless NMI loop. The cause seems to be
that "3. Reset the cause" no longer works: writing to
MSR_CORE_PERF_GLOBAL_OVF_CTRL doesn't reset MSR_CORE_PERF_GLOBAL_STATUS,
which immediately leads to the next NMI.
I found this by adding another trace point that reads
MSR_CORE_PERF_GLOBAL_STATUS once again after writing
MSR_CORE_PERF_GLOBAL_OVF_CTRL.
In the normal case it now contains 0; in the failing case the value is
unchanged!
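
The handler steps and the extra read-back described above can be modelled
outside the hypervisor. What follows is only a sketch under stated
assumptions, not the Xen code: it is plain user-space C, the struct and
function names are hypothetical, and the sticky_bits field is an invented
stand-in for whatever keeps a status bit from clearing on the affected
machine. The only hardware behaviour it relies on is that writing a 1 bit to
the overflow control register clears the matching bit in the status register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the PMU state; on real hardware these are MSRs. */
struct pmu_model {
    uint64_t global_status;  /* models MSR_CORE_PERF_GLOBAL_STATUS */
    uint64_t sticky_bits;    /* hypothetical: status bits that refuse to clear */
};

/* Models a write to MSR_CORE_PERF_GLOBAL_OVF_CTRL: every 1 bit written
 * clears the matching status bit, except bits marked sticky. */
static void model_write_ovf_ctrl(struct pmu_model *pmu, uint64_t val)
{
    pmu->global_status &= ~(val & ~pmu->sticky_bits);
}

/* Steps 1 to 3 from the message, followed by the extra read-back.
 * Returns true if the status register is clean afterwards. */
static bool ack_overflow_and_verify(struct pmu_model *pmu,
                                    uint64_t clear_mask,
                                    uint64_t *saved_for_guest)
{
    uint64_t status = pmu->global_status;   /* 1. read the cause */

    if (status == 0)
        return true;                        /* nothing pending */
    *saved_for_guest |= status;             /* 2. save it for the domU */
    model_write_ovf_ctrl(pmu, clear_mask);  /* 3. reset the cause */
    return pmu->global_status == 0;         /* extra read-back check */
}
```

A false return from ack_overflow_and_verify() corresponds to the failing case
in the message: the status bit survives the reset, so the next interrupt
would fire immediately.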

I searched the Intel processor spec but couldn't find anything helpful.
So my question is: what is wrong here?
Can anybody with more knowledge point me in the right direction? What else
can I do to find the real cause of this?

Many thanks in advance!
Dietmar.

-- 
Company details: http://ts.fujitsu.com/imprint.html

Keir Fraser

2009-Oct-30 13:06 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

> I searched the intel processor spec but couldn't find any help.
> So my questions is, what is wrong here?
> Can anybody with more knowledge point me in the right direction, what can I
> still do to find the real cause of this?

You should probably Cc one of the Intel guys who implemented this stuff --
I've added Haitao Shan.

Meanwhile I'd be interested to know whether things work okay for you, minus
performance counters and the hypervisor hang, if you return immediately from
vpmu_initialise(). Really at minimum we need such a fix, perhaps with a boot
parameter to re-enable the feature, for the 3.4.2 release; allowing guests to
hose the hypervisor like this is of course not on.
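
The "return immediately unless a boot parameter re-enables it" idea can be
sketched as follows. This is a minimal user-space illustration, not a patch:
the flag name "vpmu", the helper cmdline_has_flag() and the function
vpmu_initialise_sketch() are all hypothetical, and real hypervisor code would
use its own command-line parsing rather than strstr().

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `flag` appears in `cmdline` as a whole
 * space-separated token. */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = cmdline;

    if (len == 0)
        return false;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');

        if (starts && ends)
            return true;
        p += len;  /* keep searching after this partial match */
    }
    return false;
}

/* Hypothetical stand-in for the early return suggested above.
 * Returns true only if the feature was explicitly requested. */
static bool vpmu_initialise_sketch(const char *cmdline)
{
    if (!cmdline_has_flag(cmdline, "vpmu"))
        return false;  /* default: feature stays off, guests cannot reach it */
    /* ... the real per-vcpu PMU setup would go here ... */
    return true;
}
```

With this shape the risky path is opt-in: a host booted without the flag
never sets up the virtual PMU at all.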

 -- Keir

Haitao Shan

2009-Nov-02 01:12 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

May I ask how you enabled vPMU on Nehalem? This is not supported in
current Xen.

Concerning vpmu support, I totally agree that we can disable this
feature by default. Anyone who really wants to use it can turn it on
with a boot option. I am preparing a patch for that, and I will
send a patch to enable NHM vpmu along with it.

As for the problem Dietmar ran into, I think I have seen it before. Can
you add some code in vpmu_do_interrupt that sets the counter you are
using to a value other than zero? Please let me know whether that helps.

Best Regards
Shan Haitao


Dietmar Hahn

2009-Nov-02 09:11 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

Hi Haitao,

> Can I know how you enabled vPMU on Nehalem? This is not supported in
> current Xen.

http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html

> Concerning vpmu support, I totally agree that we can disable this
> feature by default. If anyone really wants to use it, he can use boot
> options to turn it on.

Yes, that's OK for me.

> I am preparing a patch for that. And I will
> send a patch to enable NHM vpmu together.
>
> For the problem that Dietmar met, I think I once met this before. Can
> you add some code in vpmu_do_interrupt that sets the counter you are
> using to a value other than zero? Please let me know if that can help.

I don't set the counter to zero. I use 0-val to set the counter.
Actually, I tested on Nehalem with
- General perf counter #2 (0xc3) with CPU_CLK_UNHALTED and val=1100000
- Fixed counter #1 (0x30a) and val=1100000
The thing is that in the normal case the overflows of both counters occur
at nearly the same time.
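
Setting a counter to 0-val means loading it with the two's complement of
val, so that it wraps to zero (and raises the overflow) after val events.
The arithmetic can be sketched as below. The helper names are hypothetical,
and the 48-bit counter width used in the usage note is an assumption about
the hardware, not something stated in this thread.

```c
#include <stdint.h>

/* Value to load so that a `width`-bit counter overflows after `events`
 * increments. Valid for 1 <= width <= 63 and 0 < events < 2^width. */
static uint64_t counter_preload(uint64_t events, unsigned width)
{
    uint64_t mask = (1ULL << width) - 1;

    return ((uint64_t)0 - events) & mask;
}

/* Increments left before a non-zero `value` wraps to zero. */
static uint64_t events_until_overflow(uint64_t value, unsigned width)
{
    uint64_t mask = (1ULL << width) - 1;

    return ((uint64_t)0 - value) & mask;
}
```

For val=1100000 and an assumed width of 48 bits, counter_preload() yields
0xFFFFFFEF3720. Both counters above are loaded with the same val, and both
count unhalted clock cycles if the fixed counter is the one Intel assigns to
CPU_CLK_UNHALTED, which would be consistent with the two overflows arriving
almost together.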
As described, I added some extra trace points for xentrace in
core2_vpmu_do_interrupt(), so the code looks like this:

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 1. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      /* 2. Step */
    }
    if ( !msr_content )
        return 0;
    core2_vpmu_cxt->global_ovf_status |= msr_content;
    msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
    wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   /* 3. Step */

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     /* 4. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    /* 5. Step */

        rdmsrl(0xc3, msr_content);      /* 6. Step: general counter #2 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
        rdmsrl(0x30a, msr_content);     /* 7. Step: fixed counter #1 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
    }

With these trace points I got the following output:

Last good NMI:
Both counters cause the NMI. Resetting works OK.
The counters themselves keep running.
2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0000, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low =  0x03c4 ]  rdmsrl(0xc3)  -> general counter #2
7. Step: par1 = 0x30a, high = 0x0000, low =  0x02da ]  rdmsrl(0x30a) -> fixed counter #1

NMI where things go wrong:
Both counters cause the NMI. Resetting does NOT work correctly; it works only
for the general counter!
The general counter (which caused the NMI) seems to have stopped!
2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]  rdmsrl(0xc3)  -> general counter #2
7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]  rdmsrl(0x30a) -> fixed counter #1

Wrong NMI:
Only the fixed counter causes the NMI (it was not reset during the NMI
handling above!)
Both counters seem to have stopped!
2. Step: par1 = 0x01,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]  rdmsrl(0xc3)  -> general counter #2
7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]  rdmsrl(0x30a) -> fixed counter #1

And this state remains forever!
I hope my explanations are understandable ;-)

So far I have seen this behavior only on a Nehalem processor.
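
The high/low pairs in the trace can be decoded mechanically. In the global
status register, bit n flags general-purpose counter n and bits 32 to 34
flag fixed counters 0 to 2, so high = 0x0002 with low = 0x0004 means fixed
counter #1 plus general counter #2. The sketch below uses hypothetical
helper names, and the count of four general-purpose counters in the usage
note is an assumption about Nehalem, not a value taken from the trace.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rebuild the 64-bit MSR value from the two 32-bit halves in the trace. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Bit idx of the status word flags overflow of general counter idx. */
static bool pmc_overflowed(uint64_t status, unsigned idx)
{
    return (status >> idx) & 1;
}

/* Bits 32..34 of the status word flag overflow of fixed counters 0..2. */
static bool fixed_overflowed(uint64_t status, unsigned idx)
{
    return (status >> (32 + idx)) & 1;
}

/* Same shape as the mask built in the handler: the two top bits, the
 * three fixed-counter bits, and one bit per general-purpose counter. */
static uint64_t ovf_ctrl_mask(unsigned pmc_count)
{
    return 0xC000000700000000ULL | ((1ULL << pmc_count) - 1);
}
```

With ovf_ctrl_mask(4) the mask is 0xC00000070000000F, which includes bit 33.
So the write in step 3 does ask the hardware to clear the fixed counter #1
overflow; the trace shows that in the failing case the bit stays set anyway.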

Thanks.
Dietmar
-- 
Company details: http://ts.fujitsu.com/imprint.html

Shan, Haitao

2009-Nov-02 09:49 UTC

RE: [Xen-devel] Need help in debugging partially blocked hypervisor

Very detailed explanation indeed. What you describe is the same as what I
saw months ago.
Unfortunately, I do not know the root cause yet. It seems to me that
unmasking the PMI in the local APIC immediately generates a new NMI if one
of the enabled counters is zero at that time.
That is why I asked whether you could try setting that counter (in your
case, Fixed Counter 1, 0x30a) to some value other than zero (for example,
0x1) before unmasking the PMI in vpmu_do_interrupt, to see whether it helps.

When I met this problem, I remember trying two approaches:
1> Set the counter to non-zero before unmasking the PMI in vpmu_do_interrupt;
2> Remove the PMI unmasking from vpmu_do_interrupt and unmask the *physical
PMI* when the guest vcpu unmasks the virtual PMI.
I remember that approach 2 fixed the issue, but I do not remember the result
of approach 1, since I met this about one year ago.
It is my understanding that approach 2 is much the same as approach 1, since
the guest will normally set the counter to some negative value (for example,
-100000) before unmasking the virtual PMI.
However, approach 2 looks cleaner and more reasonable.
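
The control flow of approach 2 can be pictured with a toy model. This is a
sketch of the idea only: the struct and both functions are hypothetical, the
two booleans stand in for the mask bits of the physical and virtual
performance-counter interrupt entries, and the model assumes that taking the
interrupt leaves both entries masked.

```c
#include <stdbool.h>

/* Toy state: mask bit of the physical entry and of the guest's
 * virtual copy. */
struct pmi_mask_model {
    bool phys_masked;
    bool virt_masked;
};

/* Handler side of approach 2: the interrupt is taken and forwarded to
 * the guest, but the physical entry is NOT unmasked here. */
static void on_pmu_interrupt(struct pmi_mask_model *m)
{
    m->phys_masked = true;  /* assumed: delivery leaves the entry masked */
    m->virt_masked = true;  /* assumed: guest sees its entry masked too */
}

/* Guest side of approach 2: the physical entry follows the guest's own
 * unmask, which normally happens after it has reloaded the counter. */
static void on_guest_lvt_write(struct pmi_mask_model *m, bool masked)
{
    m->virt_masked = masked;
    if (!masked)
        m->phys_masked = false;
}
```

The point of the ordering is that the physical unmask cannot happen while
the counter still holds the value it had at overflow time, because the guest
reprograms the counter before it unmasks.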

Can you give it a try and let me know the result? If neither works, there
might be some problem that I have not met before.

BTW: Sorry, I had not seen your patch to enable NHM vpmu before. So there is
no need for me to work on that now. :)

Haitao


Dietmar Hahn wrote:
> Hi Haitao,
> 
>> Can I know how you enabled vPMU on Nehalem? This is not supported in
>> current Xen.
> 
> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> 
>> 
>> Concerning vpmu support, I totally agree that we can disable this
>> feature by default. If anyone really wants to use it, he can use boot
>> options to turn it on.
> 
> Yes, that's OK for me.
> 
>> I am preparing a patch for that. And I will
>> send a patch to enable NHM vpmu together.
>> 
>> For the problem that Dietmar met, I think I once met this before. Can
>> you add some code in vpmu_do_interrupt that sets the counter you are
>> using to a value other than zero? Please let me know if that can
>> help. 
> 
> I don't set the counter to zero. I use 0-val to set the counter.
> Actually I tested on Nehalem with
> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and val=1100000
> - Fixed counter #1 (0x30a) and val=1100000
> In the normal case the overflows of both counters appear
> nearly at the same time.
> As described, I added some extra tracers for xentrace in
> core2_vpmu_do_interrupt(), so the code looks like:
> 
>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 1. Step
>     {
>         uint32_t HAHN_l, HAHN_h;
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);      -> 2. Step
>     }
>     if ( !msr_content )
>         return 0;
>     core2_vpmu_cxt->global_ovf_status |= msr_content;
>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);   -> 3. Step
> 
>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);     -> 4. Step
>     {
>         uint32_t HAHN_l, HAHN_h;
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);    -> 5. Step
> 
>         rdmsrl(0xc3, msr_content);                        -> 6. Step General counter #2
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
>         rdmsrl(0x30a, msr_content);                       -> 7. Step Fixed counter #1
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
>     }
> 
> With these tracers I got the following output:
> 
> Last good NMI:
> Both counters cause the NMI. Resetting works OK.
> The counters themselves kept running.
> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 5. Step: par1 = 0x0a,  high = 0x0000, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x03c4 ]  rdmsrl(0xc3)  -> #2 general counter
> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x02da ]  rdmsrl(0x30a) -> #1 fixed counter
> 
> NMI where things go wrong:
> Both counters cause the NMI. Resetting does NOT work correctly, only for the
> general counter!
> The general counter (which caused the NMI) seems to be stopped!
> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0004 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]  rdmsrl(0xc3)  -> #2 general counter
> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]  rdmsrl(0x30a) -> #1 fixed counter
> 
> Wrong NMI:
> Only the fixed counter causes the NMI (it was not reset during the
> NMI handling above!). Both counters seem to be stopped!
> 2. Step: par1 = 0x01,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 5. Step: par1 = 0x0a,  high = 0x0002, low =  0x0000 ]  rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 6. Step: par1 = 0xc3,  high = 0x0000, low =  0x00ec ]  rdmsrl(0xc3)  -> #2 general counter
> 7. Step: par1 = 0x30a, high = 0x0000, low =  0x0000 ]  rdmsrl(0x30a) -> #1 fixed counter
> 
> And this state remains forever!
> I hope my explanations are understandable ;-)
> 
> So far I have seen this behavior only on a Nehalem processor.
> 
> Thanks.
> Dietmar
> 
>> 
>> Best Regards
>> Shan Haitao
>> 
>> 2009/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:
>>> On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com>
>>> wrote:
>>> 
>>>> I searched the Intel processor spec but couldn't find any help.
>>>> So my question is, what is wrong here?
>>>> Can anybody with more knowledge point me in the right direction,
>>>> what can I still do to find the real cause of this?
>>> 
>>> You should probably Cc one of the Intel guys who implemented this
>>> stuff -- I've added Haitao Shan.
>>> 
>>> Meanwhile I'd be interested to know whether things work okay for
>>> you, minus performance counters and the hypervisor hang, if you
>>> return immediately from vpmu_initialise(). Really at minimum we
>>> need such a fix, perhaps with a boot parameter to re-enable the
>>> feature, for 3.4.2 release; allowing guests to hose the hypervisor
>>> like this is of course not on.
>>> 
>>>  -- Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
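
For reading the trace values quoted above: the high/low pairs are the two
halves of MSR_CORE_PERF_GLOBAL_STATUS. In the architectural layout, bit n flags
an overflow of general counter n and bit 32+n an overflow of fixed counter n;
bits 62 and 63 are the OvfBuffer and CondChgd flags. The following sketch
decodes such a pair and rebuilds the mask written in step 3. The function names
are hypothetical, and the count of 4 general counters used in the note below is
an assumption about Nehalem.

```c
#include <assert.h>
#include <stdint.h>

#define GLOBAL_STATUS_FIXED_SHIFT 32                  /* fixed counter n -> bit 32 + n */
#define GLOBAL_STATUS_OVF_BUFFER  (UINT64_C(1) << 62)
#define GLOBAL_STATUS_COND_CHGD   (UINT64_C(1) << 63)

/* Rebuild the 64-bit MSR value from the two 32-bit halves in the trace. */
static uint64_t status_from_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Non-zero if general-purpose counter n has its overflow flag set. */
static int general_counter_overflowed(uint64_t status, unsigned int n)
{
    assert(n < 32);
    return (int)((status >> n) & 1);
}

/* Non-zero if fixed-function counter n has its overflow flag set. */
static int fixed_counter_overflowed(uint64_t status, unsigned int n)
{
    assert(n < 3);
    return (int)((status >> (GLOBAL_STATUS_FIXED_SHIFT + n)) & 1);
}

/* Mask that clears every overflow flag when written to GLOBAL_OVF_CTRL. */
static uint64_t ovf_ctrl_clear_mask(unsigned int num_general)
{
    assert(num_general > 0 && num_general < 32);
    return GLOBAL_STATUS_COND_CHGD | GLOBAL_STATUS_OVF_BUFFER |
           (UINT64_C(7) << GLOBAL_STATUS_FIXED_SHIFT) |
           ((UINT64_C(1) << num_general) - 1);
}
```

high = 0x0002, low = 0x0004 decodes to an overflow of general counter 2
(MSR 0xc3) and fixed counter 1 (MSR 0x30a), which matches the "both counters"
case. high = 0x0002, low = 0x0000 leaves only the fixed counter flag set. With
4 general counters the clear mask is 0xC00000070000000F.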

Dietmar Hahn

2009-Nov-02 10:30 UTC


Re: [Xen-devel] Need help in debugging partially blocked hypervisor

> A very detailed explanation indeed. What you describe is the same thing I
> saw months ago.
> Unfortunately, I do not know the root cause yet. It seems to me that
> unmasking the PMI in the local APIC immediately generates a new NMI if one
> of the enabled counters is zero at that time.
> That is why I asked whether you could set that counter (in your case Fixed
> Counter 1, MSR 0x30a) to some value other than zero (for example, 0x1)
> before unmasking the PMI in vpmu_do_interrupt, and see whether it helps.

OK, I will try setting the counter to 1 after reading the value 0.
But some things remain completely unclear ...

Dietmar.
-- 
Dietmar Hahn
TSP ES&S SWE OS                   Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions      Email: dietmar.hahn@ts.fujitsu.com
Otto-Hahn-Ring 6                  Internet: http://ts.fujitsu.com
D-81739 München                   Company details: ts.fujitsu.com/imprint.html


Dietmar Hahn

2009-Nov-03 06:53 UTC


Re: [Xen-devel] Need help in debugging partially blocked hypervisor

> > Very detailed explanation indeed. What you described is the same as I
> > saw months ago.
> > But unluckily, I do not know the root cause yet. It seems to me that
> > unmasking of PMI in local APIC will immediately generate a new NMI in the
> > system if one of the enabled counter is zero at that time.
> > That is why I was asking you whether you could try to set that counter
> > to some value other than zero (for example, 0x1) before unmasking (in your
> > case, it is Fixed Counter 1 0x30a) PMI in vpmu_do_interrupt and see whether
> > it helped.
> 
> OK I will try to set the counter after reading the 0 value to 1.
> But some things remain fully unclear ...

Hi Haitao,

> 1> Setting the counter to non-zero before unmasking PMI in
> vpmu_do_interrupt;

I tried your first approach.

1. I added

   rdmsrl(CounterX, msr_content);
   if (msr_content == 0)
   {
       HVMTRACE_3D(HAHN_TR2, ...);     // A tracer to see this.
       wrmsrl(CounterX, 0x1);
   }

   directly after the line that reads MSR_CORE_PERF_GLOBAL_STATUS.
   In the xentrace output I found some tracers where counters were zero,
   but I couldn't reproduce the hang!

   The interesting thing here was that MSR_CORE_PERF_GLOBAL_STATUS
   always contained zero (4. Step) after resetting it by writing
   MSR_CORE_PERF_GLOBAL_OVF_CTRL (3. Step).
   This differs from what I saw in my first mail!

2. I added the code above after the second (test) read of
   MSR_CORE_PERF_GLOBAL_STATUS (around 6. and 7. Step).
   Now I could see some of these tracers but no hang!
   In this case MSR_CORE_PERF_GLOBAL_STATUS behaved the same way
   as in my first mail.

The conclusion is that this seems to be a workaround for the endless
NMI loop. PMIs are very rare events, so this should not cause a
performance problem.

I didn't try your second approach

> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical PMI*
> when guest vcpu unmasks virtual PMI.

but I have some questions.

- What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and a
  watchdog NMI occurs before the domU unmasks it?
- Is it possible that after handling the NMI (and not unmasking) another
  domU gets to run on this CPU and PMIs therefore get lost?

But the real cause of the problem is unknown. As I said, I have seen this only
on Nehalem. Maybe there is a problem with the hardware? Perhaps your
hardware colleagues know something more ;-)

Thanks
Dietmar

Shan, Haitao

2009-Nov-03 07:32 UTC


RE: [Xen-devel] Need help in debugging partially blocked hypervisor

See my comments embedded. :)

Haitao


Dietmar Hahn wrote:
> The conclusion is that this seems to be a workaround for the endless
> NMI loop. PMIs are very rare events, so this should not cause a
> performance problem.

I totally agree that approach 1 is only a workaround.

> I didn't try your second approach
>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
>> PMI* when guest vcpu unmasks virtual PMI.
> but I have some questions.
> 
> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and
>   a watchdog NMI occurs before the domU unmasks it?

I think the second NMI will be lost.

> - Is it possible that after handling the NMI (and not unmasking)
>   another domU gets to run on this CPU and PMIs therefore get lost?

The LVTPC entry in the physical local APIC is saved and restored by Xen on VCPU
switches, so whether one vcpu unmasks the PMI (or not) should have no impact on
another vcpu. When developing vPMU, I treated both the PMU MSRs and the LVTPC
entry in the local APIC as vPMU context. The vPMU context is saved/restored on
the physical HW when a vcpu is scheduled, either actively or lazily (depending
on the PMU usage at the time of the switch).
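
A minimal sketch of that idea, treating the PMU MSRs and the LVTPC entry as one
per-vcpu context. The hardware is modelled by a plain struct and only the
active save/restore path is shown; all names (fake_pmu_hw, vpmu_ctx,
vcpu_switch) and the number of MSRs are made up for illustration and are not
taken from the Xen source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_PMU_MSRS 4                      /* hypothetical count for the sketch */
#define LVT_MASKED   (UINT32_C(1) << 16)    /* mask bit of a local APIC LVT entry */

/* Stand-in for the physical PMU registers and the LAPIC LVTPC entry. */
struct fake_pmu_hw {
    uint64_t msr[NUM_PMU_MSRS];
    uint32_t lvtpc;
};

/* Per-vcpu vPMU context: PMU MSRs plus the LVTPC entry. */
struct vpmu_ctx {
    uint64_t msr[NUM_PMU_MSRS];
    uint32_t lvtpc;
};

/* Copy the current hardware state into the context of the outgoing vcpu. */
static void vpmu_ctx_save(struct vpmu_ctx *ctx, const struct fake_pmu_hw *hw)
{
    memcpy(ctx->msr, hw->msr, sizeof(ctx->msr));
    ctx->lvtpc = hw->lvtpc;
}

/* Load the context of the incoming vcpu onto the hardware. */
static void vpmu_ctx_load(const struct vpmu_ctx *ctx, struct fake_pmu_hw *hw)
{
    memcpy(hw->msr, ctx->msr, sizeof(hw->msr));
    hw->lvtpc = ctx->lvtpc;
}

/* Active save/restore on a vcpu switch; the lazy variant is not modelled. */
static void vcpu_switch(struct vpmu_ctx *prev, const struct vpmu_ctx *next,
                        struct fake_pmu_hw *hw)
{
    assert(prev && next && hw && prev != next);
    vpmu_ctx_save(prev, hw);
    vpmu_ctx_load(next, hw);
}
```

Because the LVTPC value travels with the context, a vcpu that was switched out
with its PMI masked gets the masked entry back when it runs again, and the
vcpu in between sees only its own setting.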
> 
> But the real cause of the problem is unknown. As I said, I have seen this
> only on Nehalem. Maybe there is a problem with the hardware? Perhaps
> your hardware colleagues know something more ;-)

When I found this problem, I thought it might be a corner case that only
happens on my box (of course, I have only seen this on NHM, too).
I will try to ping the HW guys to see if there is any explanation, since it has
now proven to be a general problem on NHM.

But until everything is clear, I think approach 2 is the better solution for now.
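
Roughly, approach 2 could look like the following. This is only a sketch
against a mocked LAPIC, with hypothetical names, assuming the physical LVTPC
entry is left masked when the PMI handler returns; the only fixed fact used is
that bit 16 of an LVT entry is its mask bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LVT_MASKED (UINT32_C(1) << 16)   /* mask bit of a local APIC LVT entry */

struct mock_lapic  { uint32_t lvtpc; };  /* stands in for the physical LAPIC */
struct mock_vlapic { uint32_t lvtpc; };  /* the guest's virtual LAPIC */

/* End of the PMI handler under approach 2: the entry stays masked. */
static void pmi_handler_finish(struct mock_lapic *phys)
{
    assert(phys != NULL);
    phys->lvtpc |= LVT_MASKED;
}

/* Guest writes its virtual LVTPC: mirror only the mask bit to the hardware. */
static void guest_write_lvtpc(struct mock_vlapic *virt,
                              struct mock_lapic *phys, uint32_t val)
{
    assert(virt != NULL && phys != NULL);
    virt->lvtpc = val;
    if (val & LVT_MASKED)
        phys->lvtpc |= LVT_MASKED;
    else
        phys->lvtpc &= ~LVT_MASKED;
}
```

Nothing here checks the counter value, so the sketch says nothing about what
happens if a guest unmasks while a counter still reads zero.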
> 
> Thanks
> Dietmar

Dietmar Hahn

2009-Nov-03 07:52 UTC


Re: [Xen-devel] Need help in debugging partially blocked hypervisor

Please see below.

> See my comments embedded. :)
> 
> Haitao
> 
> 
> Dietmar Hahn wrote:
> > The conclusion is that this seems to be a workaround for the endless
> > NMI loop. PMIs are very rare events, so this should not cause a
> > performance problem.
> I totally agree that this is only a workaround for approach 1.
> 
> > 
> > I didn't try your second approach
> >> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
> >> PMI* when guest vcpu unmasks virtual PMI.
> > but I have some questions.
> > 
> > - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and
> >   a watchdog NMI occurs before the domU unmasks it?
> I think the second NMI will be lost.
> 
> > - Is it possible that after handling the NMI (and not unmasking)
> >   another domU gets to run on this CPU and therefore PMIs get lost?
> The LVTPC entry in the physical local APIC is saved/restored by Xen on
> VCPU switches, so unmasking (or not) the PMI of one vcpu should have no
> impact on another vcpu. When developing vPMU, I treated both the PMU MSRs
> and the LVTPC entry in the local APIC as vPMU context. The vPMU context is
> saved/restored on the physical HW when a vcpu is scheduled, either in an
> active save/restore manner or a lazy one (depending on the PMU usage at
> the time of the switch).
> 
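The save/restore idea described above can be sketched with a toy model. This is not Xen code: the `toy_*` names, the four-entry MSR array and the eager (non-lazy) switch are assumptions made only for illustration. The one hardware fact used is that bit 16 of a local APIC LVT entry is its mask bit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_LVT_MASKED (UINT32_C(1) << 16) /* mask bit of an APIC LVT entry */
#define TOY_NUM_MSRS   4                   /* arbitrary size for this sketch */

/* Hypothetical stand-in for the PMU-related state of one physical CPU. */
struct toy_cpu {
    uint64_t pmu_msr[TOY_NUM_MSRS];
    uint32_t lvtpc;
};

/* Hypothetical per-vcpu vPMU context: PMU MSRs plus the LVTPC entry. */
struct toy_vpmu_ctx {
    uint64_t pmu_msr[TOY_NUM_MSRS];
    uint32_t lvtpc;
};

/* Copy the physical state into the context of the vcpu being descheduled. */
static void toy_vpmu_save(const struct toy_cpu *cpu, struct toy_vpmu_ctx *ctx)
{
    memcpy(ctx->pmu_msr, cpu->pmu_msr, sizeof(ctx->pmu_msr));
    ctx->lvtpc = cpu->lvtpc;
}

/* Load the context of the vcpu being scheduled onto the physical CPU. */
static void toy_vpmu_load(struct toy_cpu *cpu, const struct toy_vpmu_ctx *ctx)
{
    memcpy(cpu->pmu_msr, ctx->pmu_msr, sizeof(cpu->pmu_msr));
    cpu->lvtpc = ctx->lvtpc;
}

/* Eager switch: always save prev, then load next (no lazy path modelled). */
static void toy_context_switch(struct toy_cpu *cpu,
                               struct toy_vpmu_ctx *prev,
                               const struct toy_vpmu_ctx *next)
{
    assert(cpu && prev && next);
    toy_vpmu_save(cpu, prev);
    toy_vpmu_load(cpu, next);
}
```

Because the mask bit travels with the vcpu in this model, a vcpu that was descheduled with its PMI still masked does not leave the next vcpu on that CPU masked as well.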
> > 
> > But the real cause of the problem is unknown. As said, I saw this only
> > on Nehalem. Maybe there is a problem with the hardware? Perhaps your
> > hardware colleagues know something more ;-)
> When I found this problem, I just thought it might be a corner case that
> only happens on my box (of course, I only see this on NHM, too).
> I will try to ping a HW guy to see if there is any explanation, since it
> is proven to be a general problem on NHM.
> 
> But before everything is clear, I think approach 2 is the better solution
> for now.

What would be the effect if the guest unmasks the PMI (which leads to
unmasking the 'physical PMI') but doesn't reset the counter to a value != 0?
Is the guest able to produce the endless NMI loop?

Dietmar.
> 
> > 
> > Thanks
> > Dietmar
> > 
> >> 
> >>> 
> >>> When I met this problem, I remember that I tried two approaches:
> >>> 1> Setting the counter to non-zero before unmasking PMI in
> >>>    vpmu_do_interrupt;
> >>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
> >>>    *physical PMI* when guest vcpu unmasks virtual PMI.
> >>> I remember that approach 2 can fix this issue. But I do not
> >>> remember the result of approach 1, since I met this about one year
> >>> ago.
> >>> It is my understanding that approach 2 is much the same as approach 1,
> >>> since normally the guest will set the counter to some negative value
> >>> (for example, -100000) before unmasking virtual PMI.
> >>> However, approach 2 looks cleaner and more reasonable.
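The "negative value" preload mentioned above (and the "0-val" used later in the thread) can be worked through with a little arithmetic. The sketch below is illustrative only: the helper names are made up, and the 48-bit counter width is an assumption that should be checked against what the CPU actually reports.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Value to write into a counter of `width` bits so that it overflows
 * after `period` events, i.e. (0 - period) truncated to the counter width.
 */
static uint64_t toy_preload(uint64_t period, unsigned width)
{
    assert(width > 0 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    return (0 - period) & mask;
}

/*
 * Number of increments until a counter holding `value` wraps to zero.
 * A counter left at zero needs a full 2^width events to overflow again.
 */
static uint64_t toy_events_until_overflow(uint64_t value, unsigned width)
{
    assert(width > 0 && width < 64);
    uint64_t mask = (UINT64_C(1) << width) - 1;
    return (mask - (value & mask)) + 1;
}
```

With a period of 1100000 and an assumed width of 48 bits, `toy_preload` yields 0xFFFFFFEF3720, and `toy_events_until_overflow` on that value gives back 1100000.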
> >>> 
> >>> Can you have a try and let me know the result? If neither works,
> >>> there might be some problems that I have not met before.
> >>> 
> >>> BTW: Sorry, I did not see your patch to enable NHM vpmu before. So,
> >>> there is no need for me to work on that now. :)
> >>> 
> >>> Haitao
> >>> 
> >>> 
> >>> Dietmar Hahn wrote:
> >>>> Hi Haitao,
> >>>> 
> >>>>> Can I know how you enabled vPMU on Nehalem? This is not supported
> >>>>> in current Xen.
> >>>> 
> >>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>> 
> >>>>> 
> >>>>> Concerning vpmu support, I totally agree that we can disable this
> >>>>> feature by default. If anyone really wants to use it, he can use
> >>>>> boot options to turn it on.
> >>>> 
> >>>> Yes, that's OK for me.
> >>>> 
> >>>>> I am preparing a patch for that. And I will
> >>>>> send a patch to enable NHM vpmu together.
> >>>>> 
> >>>>> For the problem that Dietmar met, I think I once met this before.
> >>>>> Can you add some code in vpmu_do_interrupt that sets the counter
> >>>>> you are using to a value other than zero? Please let me know if
> >>>>> that can help.
> >>>> 
> >>>> I don't set the counter to zero. I use 0-val to set the counter.
> >>>> Actually I tested on Nehalem with
> >>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
> >>>>   val=1100000
> >>>> - Fixed counter #1 (0x30a) and val=1100000
> >>>> The thing is that in the normal case the overflows of both counters
> >>>> appear at nearly the same time. As described, I added some extra
> >>>> tracers for xentrace in core2_vpmu_do_interrupt() so the code looks
> >>>> like:
> >>>> 
> >>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 1. Step */
> >>>>     {
> >>>>         uint32_t HAHN_l, HAHN_h;
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);       /* 2. Step */
> >>>>     }
> >>>>     if ( !msr_content )
> >>>>         return 0;
> >>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
> >>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);    /* 3. Step */
> >>>> 
> >>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 4. Step */
> >>>>     {
> >>>>         uint32_t HAHN_l, HAHN_h;
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);     /* 5. Step */
> >>>> 
> >>>>         rdmsrl(0xc3, msr_content);         /* 6. Step: general counter #2 */
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>         rdmsrl(0x30a, msr_content);        /* 7. Step: fixed counter #1 */
> >>>>         HAHN_l = (uint32_t) msr_content;
> >>>>         HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>>     }
> >>>> 
> >>>> With these tracers I got the following output:
> >>>> 
> >>>> Last good NMI:
> >>>> Both counters cause the NMI. Resetting works OK.
> >>>> The counters themselves keep running.
> >>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004   rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000   rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4   rdmsrl(0xc3) -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da   rdmsrl(0x30a) -> #1 fixed counter
> >>>> 
> >>>> NMI where things go wrong:
> >>>> Both counters cause the NMI. Resetting does NOT work correctly, only
> >>>> for the general counter! The general counter (which caused the NMI)
> >>>> seems to be stopped!
> >>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004   rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000   rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec   rdmsrl(0xc3) -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000   rdmsrl(0x30a) -> #1 fixed counter
> >>>> 
> >>>> Wrong NMI:
> >>>> Only the fixed counter causes the NMI (it was not reset during the
> >>>> NMI handling above!). Both counters seem to be stopped!
> >>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000   rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000   rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec   rdmsrl(0xc3) -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000   rdmsrl(0x30a) -> #1 fixed counter
> >>>> 
> >>>> And this state remains forever!
> >>>> I hope my explanations are understandable ;-)
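To read the high/low pairs in the trace above, it helps to decode them against the architectural layout of the global status register: bit n flags general counter n, bit 32+n flags fixed counter n, and bits 62 and 63 are the two status flags covered by the 0xC000... part of the clear mask. The helpers below are a hypothetical sketch, not part of Xen, and the count of four general counters used in the note after the code is an assumption for Nehalem.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 64-bit status value from the two 32-bit halves in the trace. */
static uint64_t toy_status_from_trace(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Overflow flag of general-purpose counter n (bit n). */
static int toy_general_overflowed(uint64_t status, unsigned n)
{
    assert(n < 32);
    return (int)((status >> n) & 1);
}

/* Overflow flag of fixed-function counter n (bit 32 + n). */
static int toy_fixed_overflowed(uint64_t status, unsigned n)
{
    assert(n < 3);
    return (int)((status >> (32 + n)) & 1);
}

/*
 * Mask written to the overflow-control register in the traced code:
 * bits 63 and 62, the three fixed-counter bits, and one bit per
 * general-purpose counter.
 */
static uint64_t toy_ovf_ctrl_clear_all(unsigned pmc_count)
{
    assert(pmc_count < 32);
    return UINT64_C(0xC000000700000000) | ((UINT64_C(1) << pmc_count) - 1);
}
```

Read this way, high = 0x0002, low = 0x0004 means fixed counter #1 and general counter #2 both overflowed. In the failing case, step 5 still shows high = 0x0002, so the fixed counter's overflow bit survived the write, even though the mask (0xC00000070000000F for four general counters) includes bit 33.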
> >>>> 
> >>>> Until now I can see this behavior only on a Nehalem processor.
> >>>> 
> >>>> Thanks.
> >>>> Dietmar
> >>>> 
> >>>>> 
> >>>>> Best Regards
> >>>>> Shan Haitao
> >>>>> 
> >>>>> 2009/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:
> >>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>> <dietmar.hahn@ts.fujitsu.com> wrote:
> >>>>>> 
> >>>>>>> I searched the Intel processor spec but couldn't find any help.
> >>>>>>> So my question is, what is wrong here?
> >>>>>>> Can anybody with more knowledge point me in the right direction,
> >>>>>>> what can I still do to find the real cause of this?
> >>>>>> 
> >>>>>> You should probably Cc one of the Intel guys who implemented this
> >>>>>> stuff -- I've added Haitao Shan.
> >>>>>> 
> >>>>>> Meanwhile I'd be interested to know whether things work okay for
> >>>>>> you, minus performance counters and the hypervisor hang, if you
> >>>>>> return immediately from vpmu_initialise(). Really at minimum we
> >>>>>> need such a fix, perhaps with a boot parameter to re-enable the
> >>>>>> feature, for 3.4.2 release; allowing guests to hose the
> >>>>>> hypervisor like this is of course not on.
> >>>>>> 
> >>>>>>  -- Keir

Shan, Haitao

2009-Nov-03 08:02 UTC

RE: [Xen-devel] Need help in debugging partially blocked hypervisor

I suspect the guest will reproduce this PMI loop if it behaves as you describe
in this email. But as far as I know, VTune and oprofile do not behave like that.
Of course, approach 2 is still a workaround (unless I get confirmation that the
HW requires it). It is preferable because it does not change the contents of
MSRs. Thus, we have no impact on guest software that does rely on reading the
correct value from HW. Approach 1 existed just because we knew that in
event-based sampling, the counter value on receiving a PMI was not used by
OProfile/VTune at all, so it was safe to set the counter to some non-zero value.

Haitao
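
The difference between the two approaches, as described in this thread, can be illustrated with a toy model. It is not the Xen handler and it does not model whatever Nehalem actually does in the failing case, which is still unknown here. All `toy_*` names are hypothetical. The model assumes the LVTPC entry is masked when a PMI is delivered and must be unmasked by software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of one counter plus the LVTPC mask. */
struct toy_pmu {
    uint64_t counter;      /* counter value as the guest would read it */
    bool     overflow;     /* overflow status bit for this counter */
    bool     lvtpc_masked; /* mask bit of the physical LVTPC entry */
};

/* Counter overflowed: status bit set, PMI delivered, LVTPC now masked. */
static void toy_deliver_pmi(struct toy_pmu *p)
{
    p->overflow = true;
    p->lvtpc_masked = true;
}

/* Approach 1: force the counter non-zero, clear status, unmask at once. */
static void toy_handler_approach1(struct toy_pmu *p, uint64_t rearm)
{
    assert(rearm != 0);
    p->counter = rearm;       /* the guest-visible counter value changes */
    p->overflow = false;
    p->lvtpc_masked = false;
}

/* Approach 2: clear status only; the counter and the mask stay as they are. */
static void toy_handler_approach2(struct toy_pmu *p)
{
    p->overflow = false;
}

/* Approach 2, second half: the physical unmask follows the guest's unmask. */
static void toy_guest_unmasks_virtual_pmi(struct toy_pmu *p)
{
    p->lvtpc_masked = false;
}
```

In this model approach 1 overwrites the value the guest would read back, while approach 2 leaves it alone and only defers the unmask. Whether a guest that never re-arms its counter can still trigger the loop under approach 2 is exactly the open question raised above.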



Dietmar Hahn

2009-Nov-03 08:24 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

> I suspect the guest will reproduce this PMI loop if it behaves as you
> describe in this email. But as far as I know, VTune and oprofile do not
> behave like that.
> Of course, approach 2 is still a workaround (unless I get confirmation
> that the HW requires it). It is preferable because it does not change the
> contents of MSRs. Thus, we have no impact on guest software that does rely
> on reading the correct value from HW. Approach 1 existed just because we
> knew that in event-based sampling, the counter value on receiving a PMI was
> not used by OProfile/VTune at all, so it was safe to set the counter to
> some non-zero value.
> 
> Haitao
> 
OK, then will you send a patch?
Dietmar.

-- 
Dietmar Hahn
TSP ES&S SWE OS                  Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions     Email: dietmar.hahn@ts.fujitsu.com
Otto-Hahn-Ring 6                 Internet: http://ts.fujitsu.com
D-81739 München                  Company details: ts.fujitsu.com/imprint.html


Shan, Haitao

2009-Nov-03 08:43 UTC

RE: [Xen-devel] Need help in debugging partially blocked hypervisor

No problem.
Can you help test it? I have no test box at hand right now, which might cause
a delay.

Haitao


Dietmar Hahn wrote:
>> I suspect the guest will reproduce this PMI loop if it behaves as
>> you describe in this email. But as far as I know, VTune and oprofile
>> do not behave like that.
>> Of course, approach 2 is still a workaround (unless I get
>> confirmation that the HW requires it). It is preferable
>> because it does not change the contents of MSRs. Thus, we have no
>> impact on guest software that does rely on reading the correct value
>> from HW. Approach 1 existed just because we knew that in event-based
>> sampling, the counter value on receiving a PMI was not used by
>> OProfile/VTune at all, so it was safe to set the counter to some
>> non-zero value.
>> 
>> Haitao
>> 
> 
> OK, then will you send a patch?
> Dietmar.
> 
>> 
>> Dietmar Hahn wrote:
>>> Please see below.
>>> 
>>>> See my comments embedded. :)
>>>> 
>>>> Haitao
>>>> 
>>>> 
>>>> Dietmar Hahn wrote:
>>>>> The conclusion is, that this seems to be a workaround for
the
>>>>> endless NMI loop. PMI''s are a very rarely event
and this should
>>>>> not raise a performance problem.
>>>> I totally agree that this is only a workaround for approach 1.
>>>> 
>>>>> 
>>>>> I didn''t try your second approach
>>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and
unmask
>>>>>> *physical PMI* when guest vcpu unmasks virtual PMI. but
I have
>>>>>> some question. 
>>>>> 
>>>>> - What if the ''physical PMI'' is not
unmasked in vpmu_do_interrupt
>>>>>   and a watchdog NMI would occur before the domU unmasks
it?
>>>> I think the second NMI will be lost.
>>>> 
>>>>> - Is it possible that after handling the NMI (and not
unmasking)
>>>>>   another domU got running on this CPU and therefore
PMI''s got
>>>>> lost? 
>>>> LVTPC entry in physical local APIC is save/restored by Xen on
VCPU
>>>> switches. So unmasking (or not) of PMI of one vcpu should have
no
>>>> impact on another vcpu. When developing vPMU, I treated as vPMU
>>>> context both PMU MSRs and LVTPC entry in local APIC. vPMU
context
>>>> is save/restored on physical HW when vcpus is scheduled, either
in
>>>> an active save/restore manner or a lazy one (depending on the
PMU
>>>> usage at the time of switch). 
>>>> 
>>>>> 
>>>>> But the real cause of the problem is unknown. As said I saw
this
>>>>> only on Nehalem. Maybe there is a problem together with the
>>>>> hardware? Perhaps your hardware colleagues know something
more ;-)
>>>> When I found this problem, I just thought it might be a corner
case
>>>> that only happens on my box (of course, I only see this in NHM,
>>>> too). I will try to pin HW guy to see if any explanation, since
it
>>>> is proven to be a general problem on NHM.
>>>> 
>>>> But before everything is clear, I think approach 2 is a better
>>>> solution now.
>>> 
>>> What would be the effect if the guest unmasks the PMI (which leads
>>> to unmasking the 'physical PMI') but doesn't reset the counter to a
>>> value != 0? Is the guest able to produce the endless NMI loop?
>>> 
>>> Dietmar.
>>> 
>>>> 
>>>>> 
>>>>> Thanks
>>>>> Dietmar
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> When I met this problem, I remember that I tried two approaches:
>>>>>>> 1> Setting the counter to non-zero before unmasking PMI in
>>>>>>>    vpmu_do_interrupt;
>>>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
>>>>>>>    *physical PMI* when guest vcpu unmasks virtual PMI.
>>>>>>> I remember that approach 2 can fix this issue. But I do not
>>>>>>> remember the result of approach 1, since I met this about one
>>>>>>> year ago. It is my understanding that approach 2 is much the same
>>>>>>> as approach 1, since normally the guest will set the counter to some
>>>>>>> negative value (for example, -100000) before unmasking the virtual
>>>>>>> PMI. However, approach 2 looks cleaner and more reasonable.
>>>>>>> 
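As a rough illustration of approach 2, the following toy model keeps the physical LVTPC entry masked after a PMI and only unmasks it when the guest unmasks its virtual entry. It is a sketch under stated assumptions, not the Xen implementation: the struct and all toy_* names are hypothetical, the 48-bit counter width is an assumption, and the model only relies on two hardware facts, namely that bit 16 is the mask bit of an LVT entry and that the local APIC sets that bit when a PMI is delivered. The toy_preload() helper shows the "negative value" idiom mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED    (1u << 16)   /* mask bit of a local APIC LVT entry */
#define TOY_CTR_WIDTH 48           /* assumed counter width in bits */
#define TOY_CTR_MASK  ((UINT64_C(1) << TOY_CTR_WIDTH) - 1)

/* Hypothetical state of one CPU running one guest vcpu. */
struct toy_cpu {
    uint32_t phys_lvtpc;   /* physical LVTPC entry */
    uint32_t virt_lvtpc;   /* LVTPC entry as the guest sees it */
    uint64_t counter;      /* one performance counter */
};

/* Counter value that overflows after 'period' more events ("0 - val"). */
static uint64_t toy_preload(uint64_t period)
{
    assert(period != 0 && period <= TOY_CTR_MASK);
    return (0 - period) & TOY_CTR_MASK;
}

/* Hypervisor PMI handler under approach 2: both entries stay masked. */
static void toy_pmi_handler(struct toy_cpu *c)
{
    c->phys_lvtpc |= LVT_MASKED;   /* set by hardware on PMI delivery */
    c->virt_lvtpc |= LVT_MASKED;   /* the guest sees a masked entry too */
    /* Approach 1 would clear the physical mask bit here instead. */
}

/* Guest write to its virtual LVTPC; the mask bit is mirrored to hardware. */
static void toy_guest_write_lvtpc(struct toy_cpu *c, uint32_t val)
{
    c->virt_lvtpc = val;
    if (val & LVT_MASKED)
        c->phys_lvtpc |= LVT_MASKED;
    else
        c->phys_lvtpc &= ~LVT_MASKED;
}

/* True if a counter overflow could currently raise a physical PMI. */
static bool toy_pmi_can_fire(const struct toy_cpu *c)
{
    return !(c->phys_lvtpc & LVT_MASKED);
}
```

In this model no further physical PMI can arrive between the handler and the guest's own unmask, and a well-behaved guest reloads the counter (for example with toy_preload(100000)) before it unmasks, which is why the two approaches end up looking similar for profilers that follow that pattern.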
>>>>>>> Can you have a try and let me know the result? If neither
>>>>>>> works, there might be some problems that I have not met before.
>>>>>>> 
>>>>>>> BTW: Sorry, I did not see your patch to enable NHM vpmu before.
>>>>>>> So, there is no need for me to work on that now. :)
>>>>>>> 
>>>>>>> Haitao
>>>>>>> 
>>>>>>> 
>>>>>>> Dietmar Hahn wrote:
>>>>>>>> Hi Haitao,
>>>>>>>> 
>>>>>>>>> Can I know how you enabled vPMU on Nehalem? This is not
>>>>>>>>> supported in current Xen.
>>>>>>>> 
>>>>>>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Concerning vpmu support, I totally agree that we can disable
>>>>>>>>> this feature by default. If anyone really wants to use it, he
>>>>>>>>> can use boot options to turn it on.
>>>>>>>> 
>>>>>>>> Yes, that's OK for me.
>>>>>>>> 
>>>>>>>>> I am preparing a patch for that. And I will
>>>>>>>>> send a patch to enable NHM vpmu together.
>>>>>>>>> 
>>>>>>>>> For the problem that Dietmar met, I think I once met this
>>>>>>>>> before. Can you add some code in vpmu_do_interrupt that sets
>>>>>>>>> the counter you are using to a value other than zero? Please
>>>>>>>>> let me know if that can help.
>>>>>>>> 
>>>>>>>> I don't set the counter to zero. I use 0-val to set the
>>>>>>>> counter. Actually I tested on Nehalem with
>>>>>>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
>>>>>>>>   val=1100000
>>>>>>>> - Fixed counter #1 (0x30a) and val=1100000
>>>>>>>> The thing is that in the normal case the overflows of both counters
>>>>>>>> appear at nearly the same time. As described, I added some extra
>>>>>>>> tracers for xentrace in core2_vpmu_do_interrupt(), so the code
>>>>>>>> looks like:
>>>>>>>> 
>>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 1. Step */
>>>>>>>>     {
>>>>>>>>         uint32_t HAHN_l, HAHN_h;
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);       /* 2. Step */
>>>>>>>>     }
>>>>>>>>     if ( !msr_content )
>>>>>>>>         return 0;
>>>>>>>>     core2_vpmu_cxt->global_ovf_status |= msr_content;
>>>>>>>>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
>>>>>>>>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);    /* 3. Step */
>>>>>>>> 
>>>>>>>>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 4. Step */
>>>>>>>>     {
>>>>>>>>         uint32_t HAHN_l, HAHN_h;
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);     /* 5. Step */
>>>>>>>> 
>>>>>>>>         rdmsrl(0xc3, msr_content);     /* 6. Step: General counter #2 */
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
>>>>>>>>         rdmsrl(0x30a, msr_content);    /* 7. Step: Fixed counter #1 */
>>>>>>>>         HAHN_l = (uint32_t) msr_content;
>>>>>>>>         HAHN_h = (uint32_t) (msr_content >> 32);
>>>>>>>>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>> With these tracers I got the following output:
>>>>>>>> 
>>>>>>>> Last good NMI:
>>>>>>>> Both counters cause the NMI. Resetting works OK.
>>>>>>>> The counters themselves kept running.
>>>>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4 ] rdmsrl(0xc3) -> #2 general counter
>>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ] rdmsrl(0x30a) -> #1 fixed counter
>>>>>>>> 
>>>>>>>> NMI where things go wrong:
>>>>>>>> Both counters cause the NMI. Resetting does NOT work correctly, only
>>>>>>>> for the general counter! The general counter (which caused the NMI)
>>>>>>>> seems to be stopped!
>>>>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
>>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
>>>>>>>> 
>>>>>>>> Wrong NMI:
>>>>>>>> Only the fixed counter causes the NMI (it was not reset
>>>>>>>> during the NMI handling above!). Both counters seem to be stopped!
>>>>>>>> 2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
>>>>>>>> 6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
>>>>>>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
>>>>>>>> 
>>>>>>>> And this state remains forever!
>>>>>>>> I hope my explanations are understandable ;-)
>>>>>>>> 
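The (high, low) pairs in the traces above are the two 32-bit halves of the global status MSR. A minimal sketch for decoding them follows, assuming the architectural layout in which bit n flags an overflow of general-purpose counter n and bit 32+n flags an overflow of fixed counter n. The toy_* helper names are made up for illustration, and the count of 4 general-purpose counters used in the note below is an assumption about what core2_get_pmc_count() returns on Nehalem; the thread does not state it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_FIXED_SHIFT 32  /* fixed-counter overflow bits start at bit 32 */

/* Rebuild the 64-bit MSR value from the traced (high, low) pair. */
static uint64_t toy_join(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Overflow flag of general-purpose counter 'n' in the status value. */
static int toy_gp_overflowed(uint64_t status, unsigned int n)
{
    assert(n < 32);
    return (int)((status >> n) & 1);
}

/* Overflow flag of fixed counter 'n' in the status value. */
static int toy_fixed_overflowed(uint64_t status, unsigned int n)
{
    assert(n < 3);
    return (int)((status >> (TOY_FIXED_SHIFT + n)) & 1);
}

/* Same mask expression as in the quoted handler, in 64-bit arithmetic. */
static uint64_t toy_ovf_ctrl_mask(unsigned int gp_count)
{
    assert(gp_count < 32);
    return UINT64_C(0xC000000700000000) | ((UINT64_C(1) << gp_count) - 1);
}
```

Read this way, high = 0x0002 with low = 0x0004 means that fixed counter #1 and general counter #2 both overflowed, and high = 0x0002 with low = 0x0000 after the write means that only the fixed counter's flag is still set. With 4 general-purpose counters the mask evaluates to 0xC00000070000000F, which covers that flag, so the trace is consistent with the reset simply not taking effect for the fixed counter.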
>>>>>>>> Until now I can see this behavior only on a Nehalem processor.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> Dietmar
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Best Regards
>>>>>>>>> Shan Haitao
>>>>>>>>> 
>>>>>>>>> 2009/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:
>>>>>>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
>>>>>>>>>> <dietmar.hahn@ts.fujitsu.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> I searched the Intel processor spec but couldn't find any
>>>>>>>>>>> help. So my question is, what is wrong here?
>>>>>>>>>>> Can anybody with more knowledge point me in the right
>>>>>>>>>>> direction? What can I still do to find the real cause of
>>>>>>>>>>> this?
>>>>>>>>>> 
>>>>>>>>>> You should probably Cc one of the Intel guys who implemented
>>>>>>>>>> this stuff -- I've added Haitao Shan.
>>>>>>>>>> 
>>>>>>>>>> Meanwhile I'd be interested to know whether things work okay
>>>>>>>>>> for you, minus performance counters and the hypervisor hang,
>>>>>>>>>> if you return immediately from vpmu_initialise(). Really at
>>>>>>>>>> minimum we need such a fix, perhaps with a boot parameter to
>>>>>>>>>> re-enable the feature, for 3.4.2 release; allowing guests to
>>>>>>>>>> hose the hypervisor like this is of course not on.
>>>>>>>>>> 
>>>>>>>>>>  -- Keir
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Shan, Haitao

2009-Nov-03 09:00 UTC

RE: [Xen-devel] Need help in debugging partially blocked hypervisor

Hi, Dietmar,

Please review the attached patch. Any comments?

Haitao


Dietmar Hahn wrote:
>> I suspect the guest will reproduce this PMI loop if the guest behaves as
>> you said in this email. But as far as I know, VTune and oprofile do
>> not behave like that.
>> Of course, this approach is still a workaround (unless I get
>> confirmation that the HW requires it). This approach is preferable
>> because it does not change the contents of MSRs. Thus, we have no
>> impact on guest software that does rely on reading the correct value
>> from HW. Approach 1 existed just because we knew that in event-based
>> sampling, the counter value on receiving a PMI was not used by
>> OProfile/VTune at all and it was safe to set the counter to some
>> non-zero value.
>> 
>> Haitao
>> 
> 
> OK, then will you send a patch?
> Dietmar.

Dietmar Hahn

2009-Nov-03 09:03 UTC

Re: [Xen-devel] Need help in debugging partially blocked hypervisor

> No problem.
> Can you help to test? I have no test box at hand now, which might cause
> delay.
> 
Sure :-)
Dietmar.
> Haitao
Company details: http://ts.fujitsu.com/imprint.html
