Dietmar Hahn
2009-Oct-21 13:07 UTC
[Xen-devel] Need help in debugging partially blocked hypervisor
Hi,

I need some help in debugging strange hypervisor behavior that occurs when using fully virtualized performance counters.

For info: I use SLES11, which means xen-3.3.1 and linux-2.6.27.19-5..., on an Intel Nehalem machine. I tried the hypervisor from xen-unstable, but the machine didn't boot.

dom0 1 cpu, domU 2 cpus, 3 cpus paused.

I start the performance counters in the domU, and after some time the domU cpus are running forever (as seen with xm vcpu-list) and the domU is not accessible. dom0 still works as expected. The serial console doesn't react to 3xCTRL-A, but xm debug-keys prints its output on the serial console.

When I try to pause the domU (xm pause ...), or use xenctx or some debug keys for which the domU must get paused, the dom0 freezes and only a hard reset helps. This seems to come from the call of vcpu_sleep_sync().

I tried xentrace while in the strange state and saw only log entries from CPU0 (the dom0 cpu), which means to me that the domU CPUs are somewhere in the hypervisor.

Attached is the output of "xm debug-keys d". I hope someone has an idea about the direction in which I have to look deeper.

Many thanks in advance!
Dietmar.
(XEN) ''d'' pressed -> dumping registers (XEN) *** Dumping CPU0 guest state (d0:v0): *** (XEN) ----[ Xen-3.3.1 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff8020746a>] (XEN) RFLAGS: 0000000000000216 EM: 0 CONTEXT: pv guest (XEN) rax: 0000000000000023 rbx: ffffffff803c7505 rcx: ffffffff8020746a (XEN) rdx: 00007fd955ef2f8a rsi: 00007fd95635dc00 rdi: 00007fd946ff9170 (XEN) rbp: ffffffffffffffda rsp: ffff8800da541dc0 r8: 00007fd956324390 (XEN) r9: 0000000000000002 r10: 0000000000000000 r11: 0000000000000216 (XEN) r12: ffff8800dbd42080 r13: ffff8800db4d5500 r14: 0000000000000000 (XEN) r15: 00007fd946ff9200 cr0: 0000000080050033 cr4: 00000000000026b0 (XEN) cr3: 000000025c880000 cr2: 00007fef4f880ad0 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffff8800da541dc0: (XEN) ffffffff80307263 ffffffff80207460 ffffffff803c7593 ffff8800ce4d7720 (XEN) 00000000da4fc067 ffff8800050a7180 ffffffff8028484c ffff8800ce1399c0 (XEN) 0000000000000003 0000000000000000 ffff8800ce1399c0 00007fd946ff9000 (XEN) 0000000000000000 0000000000000023 00007fd946ff9170 00007fd95635dc00 (XEN) 00007fd955ef2f8a 0000000000000000 00007fd956324390 ffff8800d9bc4780 (XEN) 0000000000000001 00007fd946ff9000 ffffffff803c7505 ffff8800dbd42100 (XEN) ffff8800dbd42080 ffff8800db4d5500 0000000000000000 00007fd946ff9200 (XEN) ffffffff802e0ae3 0030500046ff9000 ffff8800db4d5500 00007fd946ff9200 (XEN) 0000000000305000 0000000000000006 0000000000000006 00007fd956208608 (XEN) ffffffff802aa8b5 ffff8800db4d5500 ffff8800db4d5500 00007fd946ff9200 (XEN) ffffffff802aab22 0000000000001000 ffff8800dbde7520 00007fd946ff9000 (XEN) 0000000000000000 ffff8800db4d5500 00007fd946ff9200 0000000000305000 (XEN) ffffffff802aab82 0000000000000006 0000000100000001 0000000000000000 (XEN) 0000000001ce0b34 0000000001c8eed0 0000000000000006 0000000000000001 (XEN) ffffffff8020b3b8 0000000000000246 0000000000000000 0000000000000200 (XEN) fffffffffffffffd 0000000000000010 
ffffffff8020b350 00007fd946ff9200 (XEN) 0000000000305000 0000000000000006 0000000000000010 00007fd95536fb77 (XEN) 000000000000e033 0000000000000246 00007fd946ff9168 000000000000e02b (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) (XEN) *** Dumping CPU1 host state: *** (XEN) ----[ Xen-3.3.1 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 1 (XEN) RIP: e008:[<ffff828c8013a24b>] default_idle+0x2b/0x40 (XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor (XEN) rax: 0000000000000080 rbx: ffff8300bf5f7f28 rcx: 0000000000000001 (XEN) rdx: ffff828c80276980 rsi: ffff828c8021ad40 rdi: 0000000000002000 (XEN) rbp: ffff8300bf5f7f28 rsp: ffff8300bf5f7f08 r8: 0000000000000002 (XEN) r9: ffff8300be601e00 r10: 0000000000000000 r11: ffff8300be601e10 (XEN) r12: ffff828c80276980 r13: 00000014ef213474 r14: ffff828c8021a160 (XEN) r15: ffff828c8021a100 cr0: 000000008005003b cr4: 00000000000026b0 (XEN) cr3: 00000000be864000 cr2: 00007fd946ff3ed0 (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008 (XEN) Xen stack trace from rsp=ffff8300bf5f7f08: (XEN) ffff828c8013e126 0000000000002000 ffff8300be6fc080 ffff8300be61c080 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000246 0000000000007ff0 (XEN) ffff880080ad1000 ffff8800dd488000 0000000000000000 ffffffff8020730a (XEN) 0000000000000000 0000000000000001 0000000000000002 0000010000000000 (XEN) ffffffff8020730a 000000000000e033 0000000000000246 ffff8800dd489f28 (XEN) 000000000000e02b 0000000000000000 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000001 ffff8300be6fc080 (XEN) Xen call trace: (XEN) [<ffff828c8013a24b>] default_idle+0x2b/0x40 (XEN) [<ffff828c8013e126>] idle_loop+0xa6/0xd0 (XEN) (XEN) No guest context (CPU1 is idle). 
(XEN) (XEN) *** Dumping CPU2 host state: *** (XEN) ----[ Xen-3.3.1 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 2 (XEN) RIP: e008:[<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20 (XEN) RFLAGS: 0000000000000246 CONTEXT: hypervisor (XEN) rax: 0000000000000020 rbx: ffff8300be6e0080 rcx: 0000000000000000 (XEN) rdx: ffff828c8021c3a0 rsi: 00000000000003de rdi: ffff8300be6f7f28 (XEN) rbp: ffff9700ffb80990 rsp: ffff8300be6f7e38 r8: ffff97600036379c (XEN) r9: ffff9700ff428b5b r10: ffff976000363794 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: ffff8300be6e0080 r14: ffff8300be6f7f28 (XEN) r15: ffff976000363958 cr0: 000000008005003b cr4: 00000000000026b0 (XEN) cr3: 000000033fc01000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen stack trace from rsp=ffff8300be6f7e38: (XEN) 0000000000000000 ffff8300be6e0080 ffff8300be6e1858 ffff8300be6e0080 (XEN) ffff9700ffb80990 ffff828c80187141 000000000000e102 000000000000e102 (XEN) 00000000000000e1 ffff828c8019483d ffff8300be6ee102 ffff828c80137d6e (XEN) ffff8300be6f7f28 ffff8300be6e0080 0000000000000000 ffff8300be601f08 (XEN) 00000078be6edeea 0000000000000002 ffff8300be6f7f28 ffff828c8011b87a (XEN) ffff828c80276980 0000000000000002 ffff828c80277980 ffff8300be6e0080 (XEN) ffff9700ffb80990 0000000000000000 ffff976000363958 ffffffffffffffff (XEN) ffff976000363958 ffff828c801944c3 ffff976000363958 ffffffffffffffff (XEN) ffff976000363958 0000000000000000 ffff9700ffb80990 0000000000000050 (XEN) 0000000000000000 ffff976000363794 ffff9700ff428b5b ffff97600036379c (XEN) 0000000000000730 ffffb000000b8000 00000000000003de 00000000000003de (XEN) ffff9700ffb80990 000000000000000b ffff9700ff025250 0000000000000000 (XEN) 0000000000010097 ffff976000363938 0000000000000000 5555555555555555 (XEN) 5555555555555555 5555555555555555 5555555555555555 5555555500000002 (XEN) ffff8300be6e0080 (XEN) Xen call trace: (XEN) [<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20 (XEN) [<ffff828c80187141>] 
hvm_vcpu_has_pending_irq+0x41/0x60 (XEN) [<ffff828c8019483d>] vmx_intr_assist+0x2bd/0x490 (XEN) [<ffff828c80137d6e>] reprogram_timer+0x1e/0x90 (XEN) [<ffff828c8011b87a>] _spin_unlock_irq+0x1a/0x40 (XEN) [<ffff828c801944c3>] vmx_asm_do_vmentry+0x0/0xbd (XEN) (XEN) *** Dumping CPU2 guest state (d1:v1): *** (XEN) ----[ Xen-3.3.1 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 2 (XEN) RIP: 0020:[<ffff9700ff025250>] (XEN) RFLAGS: 0000000000010097 CONTEXT: hvm guest (XEN) rax: 0000000000000730 rbx: 0000000000000050 rcx: ffffb000000b8000 (XEN) rdx: 00000000000003de rsi: 00000000000003de rdi: ffff9700ffb80990 (XEN) rbp: ffff9700ffb80990 rsp: ffff976000363938 r8: ffff97600036379c (XEN) r9: ffff9700ff428b5b r10: ffff976000363794 r11: 0000000000000000 (XEN) r12: 0000000000000000 r13: ffff976000363958 r14: ffffffffffffffff (XEN) r15: ffff976000363958 cr0: 0000000080050033 cr4: 00000000000006b0 (XEN) cr3: 0000000001822000 cr2: 0000000000000000 (XEN) ds: 0028 es: 0028 fs: 0028 gs: 0028 ss: 0028 cs: 0020 (XEN) (XEN) *** Dumping CPU3 host state: *** (XEN) ----[ Xen-3.3.1 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 3 (XEN) RIP: e008:[<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20 (XEN) RFLAGS: 0000000000000202 CONTEXT: hypervisor (XEN) rax: 0000000000000027 rbx: ffff8300be6e4080 rcx: 0000000000000007 (XEN) rdx: ffff828c8021e3a0 rsi: ffff9700fe1a9b70 rdi: ffff8300be91ff28 (XEN) rbp: ffff9700ffb80998 rsp: ffff8300be91fe38 r8: 0000000000000000 (XEN) r9: ffff9700ff41e074 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: 0000000000000007 r13: ffff8300be6e4080 r14: ffff8300be91ff28 (XEN) r15: ffff9700ff01f9c0 cr0: 000000008005003b cr4: 00000000000026b0 (XEN) cr3: 000000033fc26000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen stack trace from rsp=ffff8300be91fe38: (XEN) ffff828c8021e100 ffff8300be6e4080 ffff8300be6e5858 ffff8300be6e4080 (XEN) ffff9700ffb80998 ffff828c80187141 000000000000e102 000000000000e102 (XEN) 00000000000000e1 
ffff828c8019483d ffff8300be6ee102 ffff828c80137d6e (XEN) ffff8300be91ff28 ffff8300be6e4080 0000000000000000 ffff8300be852088 (XEN) 000001f9889c1558 0000000000000003 ffff8300be91ff28 ffff828c8011b87a (XEN) ffff828c80276980 0000000000000003 ffff828c80277980 ffff8300be6e4080 (XEN) ffff9700ffb80998 ffff9700ff0476fc ffff9700ff047700 ffff9700fe000000 (XEN) ffff9700ff01f9c0 ffff828c801944c3 ffff9700ff01f9c0 ffff9700fe000000 (XEN) ffff9700ff047700 ffff9700ff0476fc ffff9700ffb80998 00000000c0010001 (XEN) 0000000000000000 0000000000000000 ffff9700ff41e074 0000000000000000 (XEN) ffff9700ff02e59a 0000000000000043 0000000000000043 ffff9700fe1a9b70 (XEN) ffff9700ffb80998 000000f100000001 ffff9700ff02e5c9 0000000000000000 (XEN) 0000000000000282 ffff9700fe1a9b60 0000000000000000 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000003 (XEN) ffff8300be6e4080 (XEN) Xen call trace: (XEN) [<ffff828c8019a45c>] vmx_vmexit_handler+0x2ec/0x1b20 (XEN) [<ffff828c80187141>] hvm_vcpu_has_pending_irq+0x41/0x60 (XEN) [<ffff828c8019483d>] vmx_intr_assist+0x2bd/0x490 (XEN) [<ffff828c80137d6e>] reprogram_timer+0x1e/0x90 (XEN) [<ffff828c8011b87a>] _spin_unlock_irq+0x1a/0x40 (XEN) [<ffff828c801944c3>] vmx_asm_do_vmentry+0x0/0xbd (XEN) (XEN) *** Dumping CPU3 guest state (d1:v0): *** (XEN) ----[ Xen-3.3.1 x86_64 debug=n Tainted: C ]---- (XEN) CPU: 3 (XEN) RIP: 0020:[<ffff9700ff02e5c9>] (XEN) RFLAGS: 0000000000000282 CONTEXT: hvm guest (XEN) rax: ffff9700ff02e59a rbx: 00000000c0010001 rcx: 0000000000000043 (XEN) rdx: 0000000000000043 rsi: ffff9700fe1a9b70 rdi: ffff9700ffb80998 (XEN) rbp: ffff9700ffb80998 rsp: ffff9700fe1a9b60 r8: 0000000000000000 (XEN) r9: ffff9700ff41e074 r10: 0000000000000000 r11: 0000000000000000 (XEN) r12: ffff9700ff0476fc r13: ffff9700ff047700 r14: ffff9700fe000000 (XEN) r15: ffff9700ff01f9c0 cr0: 0000000080050033 cr4: 00000000000006b0 (XEN) cr3: 0000000001423000 cr2: 0000000000000000 (XEN) ds: 0028 es: 0028 fs: 0028 gs: 0028 ss: 0028 cs: 0020 
(XEN) -- Company details: http://ts.fujitsu.com/imprint.html _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Oct-21 13:28 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
On 21/10/2009 14:07, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

> I need some help in debugging a strange hypervisor behavior together
> with using fully virtualized performance counters.
>
> For info I use SLES11, means xen-3.3.1 and linux-2.6.27.19-5... on a
> Intel nehalem machine.
> I tried the hypervisor from xen-unstable but the machine didn't boot.

That in itself is frankly more of a concern to me. Probably recent irq-handling changes, or some other platform change, has broken boot on some machines. If we don't get reports and testing help with that, it'll end up broken in the next major stable release too, which we really don't want.

Meanwhile, can you at least boot with 3.4? At least we still maintain that. And do a debug build (debug=y make ...) so that the backtraces from the 'd' debug key are more meaningful.

 -- Keir
Dietmar Hahn
2009-Oct-21 13:35 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
> On 21/10/2009 14:07, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:
>
> > I need some help in debugging a strange hypervisor behavior together
> > with using fully virtualized performance counters.
> >
> > For info I use SLES11, means xen-3.3.1 and linux-2.6.27.19-5... on a
> > Intel nehalem machine.
> > I tried the hypervisor from xen-unstable but the machine didn't boot.
>
> That in itself is frankly more of a concern to me. Probably recent
> irq-handling changes, or some other platform change, has broken boot on some
> machines. If we don't get reports and testing help with that, it'll end up
> broken in the next major stable release too, which we really don't want.
>
> Meanwhile, can you at least boot with 3.4? At least we still maintain that.
> And do a debug build (debug=y make ...) so that the backtraces from the 'd'
> debug key are more meaningful.
>
> -- Keir

Yes, you are right, I'll try 3.4.
Thanks.

Dietmar.

-- 
Dietmar Hahn
TSP ES&S SWE OS                  Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions     Email: dietmar.hahn@ts.fujitsu.com
Otto-Hahn-Ring 6                 Internet: http://ts.fujitsu.com
D-81739 München                  Company details: ts.fujitsu.com/imprint.html
Keir Fraser
2009-Oct-21 13:53 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
On 21/10/2009 14:35, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

>> That in itself is frankly more of a concern to me. Probably recent
>> irq-handling changes, or some other platform change, has broken boot on some
>> machines. If we don't get reports and testing help with that, it'll end up
>> broken in the next major stable release too, which we really don't want.
>>
>> Meanwhile, can you at least boot with 3.4? At least we still maintain that.
>> And do a debug build (debug=y make ...) so that the backtraces from the 'd'
>> debug key are more meaningful.
>>
>> -- Keir
>
> Yes, you are right, I'll try 3.4.

Thanks. DomU guests taking out the host is an embarrassing class of bug. It would be good to get this sorted for 3.4.2 if the bug still exists. Or worst case we could make this perfctr stuff a default-off config option. ;-)

 -- Keir
Dietmar Hahn
2009-Oct-22 06:23 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
On 21.10.2009, Keir Fraser wrote:
> On 21/10/2009 14:07, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:
>
> > I need some help in debugging a strange hypervisor behavior together
> > with using fully virtualized performance counters.
> >
> > For info I use SLES11, means xen-3.3.1 and linux-2.6.27.19-5... on a
> > Intel nehalem machine.
> > I tried the hypervisor from xen-unstable but the machine didn't boot.
>
> That in itself is frankly more of a concern to me. Probably recent
> irq-handling changes, or some other platform change, has broken boot on some
> machines. If we don't get reports and testing help with that, it'll end up
> broken in the next major stable release too, which we really don't want.
>
> Meanwhile, can you at least boot with 3.4? At least we still maintain that.
> And do a debug build (debug=y make ...) so that the backtraces from the 'd'
> debug key are more meaningful.
>
> -- Keir

OK, I tried xen-3.4-testing.hg and the system booted fine ;-)
Then I did a fresh hg pull from xen-unstable and the boot stopped in the Linux kernel.
Attached are the logs from the serial console for both hypervisors.

The tests with the performance counters need more time for some preparations.

Thanks.
Dietmar.
Keir Fraser
2009-Oct-22 06:39 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
On 22/10/2009 07:23, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

> OK, I tried xen-3.4-testing.hg and the system booted fine ;-)
> Then I did a fresh hg pull from xen-unstable and the boot stopped in
> the linux kernel.
> Attached are the loggings from the serial console for both hypervisors.

Okay, so output just dies early during dom0 boot. I guess if you try the 'd' debug key that you get no output from that either (CTRL-a three times followed by d)?

Does xen-unstable work on other machines with that dom0 kernel, do you know? It's not at this point clear whether the issue is related to the hardware or the particular dom0 kernel.

If you haven't seen that dom0 kernel work with xen-unstable on any system, can I get that dom0 kernel from somewhere to give it a go? Perhaps your exact dom0 kernel binary to start with, to make things as close as possible to your setup?

 Thanks,
 Keir
Dietmar Hahn
2009-Oct-22 07:21 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
> On 22/10/2009 07:23, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:
>
> > OK, I tried xen-3.4-testing.hg and the system booted fine ;-)
> > Then I did a fresh hg pull from xen-unstable and the boot stopped in
> > the linux kernel.
> > Attached are the loggings from the serial console for both hypervisors.
>
> Okay, so output just dies early during dom0 boot. I guess if you try the 'd'
> debug key that you get no output from that either (CTRL-a three times
> followed by d)?

Sorry, CTRL-a doesn't work.

> Does xen-unstable work on other machines with that dom0 kernel, do you know?
> It's not at this point clear whether the issue is related to the hardware or
> the particular dom0 kernel.

Yes, it works on older machines; I can send you the log.

> If you haven't seen that dom0 kernel work with xen-unstable on any system,
> can I get that dom0 kernel from somewhere to give it a go? Perhaps your
> exact dom0 kernel binary to start with, to make things as close as possible
> to your setup?

If needed I can put the kernel on an outgoing ftp server.

Dietmar.
Dietmar Hahn
2009-Oct-30 12:20 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
Hi,

> I need some help in debugging a strange hypervisor behavior together
> with using fully virtualized performance counters.

I added a tracer of my own to xentrace to find out what the CPU is doing. Now I can see that in the strange case the CPU is handling performance counter NMIs endlessly (and nothing else!) within the hypervisor:

pmu_apic_interrupt
smp_pmu_apic_interrupt
vmx_do_pmu_interrupt
vpmu_do_interrupt

In the normal case, in core2_vpmu_do_interrupt:
1. Read the cause of the NMI
     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);
     ...
2. Save the value for the domU
     ...
3. Reset the cause
     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
4. Inject the NMI into the domU

This works very well for a short time. Then the hypervisor falls into the endless NMI loop. The cause seems to be that "3. Reset the cause" doesn't work anymore: writing to MSR_CORE_PERF_GLOBAL_OVF_CTRL no longer resets MSR_CORE_PERF_GLOBAL_STATUS, which leads to the next NMI immediately.
I found this by adding another tracer which reads MSR_CORE_PERF_GLOBAL_STATUS once again after writing MSR_CORE_PERF_GLOBAL_OVF_CTRL. In the normal case it now contains 0; in the strange case the value is unchanged!

I searched the Intel processor spec but couldn't find any help. So my question is: what is wrong here? Can anybody with more knowledge point me in the right direction? What can I still do to find the real cause of this?

Many thanks in advance!
Dietmar.
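The four handler steps and the extra read-back check described in this message can be sketched as a small user-space simulation. This is only an illustration under stated assumptions: the simulated status variable, the clear_works switch and the function names are hypothetical stand-ins for the real rdmsrl()/wrmsrl() accesses, and the sketch writes back just the pending bits rather than the full mask used in Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated register state; stands in for MSR_CORE_PERF_GLOBAL_STATUS. */
static uint64_t global_status;
/* Hypothetical switch: 0 models the failing case where the clear has no effect. */
static int clear_works = 1;

static uint64_t sim_read_status(void)
{
    return global_status;
}

/* Models a write to the overflow-control register: 1 bits clear status bits. */
static void sim_write_ovf_ctrl(uint64_t mask)
{
    if (clear_works)
        global_status &= ~mask;
}

/*
 * Returns 0 if no overflow was pending, 1 if the overflow was saved and
 * cleared, and -1 if the status was still set after the clear (the
 * condition that would retrigger the NMI immediately).
 */
static int handle_pmi(uint64_t *saved_for_guest)
{
    uint64_t status = sim_read_status();   /* 1. read the cause */

    if (!status)
        return 0;
    *saved_for_guest |= status;            /* 2. save the value for the domU */
    sim_write_ovf_ctrl(status);            /* 3. reset the cause */
    if (sim_read_status() != 0)            /* read-back check from the report */
        return -1;
    return 1;                              /* 4. caller would inject the NMI */
}
```

In the failing case reported here, the read-back check is the part that would return -1 on every invocation.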
Keir Fraser
2009-Oct-30 13:06 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com> wrote:

> I searched the intel processor spec but couldn't find any help.
> So my questions is, what is wrong here?
> Can anybody with more knowledge point me in the right direction, what can I
> still
> do to find the real cause of this?

You should probably Cc one of the Intel guys who implemented this stuff -- I've added Haitao Shan.

Meanwhile I'd be interested to know whether things work okay for you, minus performance counters and the hypervisor hang, if you return immediately from vpmu_initialise(). Really at minimum we need such a fix, perhaps with a boot parameter to re-enable the feature, for 3.4.2 release; allowing guests to hose the hypervisor like this is of course not on.

 -- Keir
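The suggestion above (return immediately from vpmu_initialise() unless a boot parameter re-enables the feature) could look roughly like the following user-space sketch. Everything here is an assumption for illustration: the option token "vpmu", the flag name, the parser and the vcpu_sketch structure are hypothetical and are not taken from the Xen source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical switch: the feature stays off unless the command line enables it. */
static bool opt_vpmu_enabled = false;

/* Scan a space-separated command line for the hypothetical token "vpmu" or "vpmu=1". */
static void parse_cmdline(const char *cmdline)
{
    char buf[256];

    strncpy(buf, cmdline, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " "))
        if (!strcmp(tok, "vpmu") || !strcmp(tok, "vpmu=1"))
            opt_vpmu_enabled = true;
}

/* Hypothetical per-vcpu state, reduced to the single field the sketch needs. */
struct vcpu_sketch {
    bool vpmu_ready;
};

static void vpmu_initialise_sketch(struct vcpu_sketch *v)
{
    v->vpmu_ready = false;
    if (!opt_vpmu_enabled)
        return;               /* default: the guest gets no virtual PMU */
    v->vpmu_ready = true;     /* real code would set up counter state here */
}
```

With this shape, a guest on a default boot can never reach the interrupt path at all, which is the property asked for in the message.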
Haitao Shan
2009-Nov-02 01:12 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
Can I know how you enabled vPMU on Nehalem? This is not supported in current Xen.

Concerning vpmu support, I totally agree that we can disable this feature by default. If anyone really wants to use it, he can use boot options to turn it on. I am preparing a patch for that. And I will send a patch to enable NHM vpmu together.

For the problem that Dietmar met, I think I once met this before. Can you add some code in vpmu_do_interrupt that sets the counter you are using to a value other than zero? Please let me know if that can help.

Best Regards
Shan Haitao
Dietmar Hahn
2009-Nov-02 09:11 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
Hi Haitao,

> Can I know how you enabled vPMU on Nehalem? This is not supported in
> current Xen.

http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html

> Concerning vpmu support, I totally agree that we can disable this
> feature by default. If anyone really wants to use it, he can use boot
> options to turn it on.

Yes, that's OK for me.

> I am preparing a patch for that. And I will
> send a patch to enable NHM vpmu together.
>
> For the problem that Dietmar met, I think I once met this before. Can
> you add some code in vpmu_do_interrupt that sets the counter you are
> using to a value other than zero? Please let me know if that can help.

I don't set the counter to zero. I use 0-val to set the counter.
Actually I tested on Nehalem with
- General perf counter #2 (0xc3) with CPU_CLK_UNHALTED and val=1100000
- Fixed counter #1 (0x30a) and val=1100000
The thing is that in the normal case the overflows of both counters appear nearly at the same time.
As described, I added some extra tracers for xentrace in core2_vpmu_do_interrupt(), so the code looks like:

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);       /* 1. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);        /* 2. Step */
    }
    if ( !msr_content )
        return 0;
    core2_vpmu_cxt->global_ovf_status |= msr_content;
    msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
    wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);     /* 3. Step */

    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);       /* 4. Step */
    {
        uint32_t HAHN_l, HAHN_h;
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);      /* 5. Step */

        rdmsrl(0xc3, msr_content);                          /* 6. Step: general counter #2 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
        rdmsrl(0x30a, msr_content);                         /* 7. Step: fixed counter #1 */
        HAHN_l = (uint32_t) msr_content;
        HAHN_h = (uint32_t) (msr_content >> 32);
        HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
    }

With these tracers I got the following output:

Last good NMI:
Both counters cause the NMI. Resetting works OK.
The counters themselves kept running.
  2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
  5. Step: par1 = 0x0a,  high = 0x0000, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
  6. Step: par1 = 0xc3,  high = 0x0000, low = 0x03c4 ] rdmsrl(0xc3)  -> #2 general counter
  7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ] rdmsrl(0x30a) -> #1 fixed counter

NMI where things go wrong:
Both counters cause the NMI. Resetting does NOT work correctly; it works only for the general counter!
The general counter (which caused the NMI) seems to be stopped!
  2. Step: par1 = 0x01,  high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
  5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
  6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3)  -> #2 general counter
  7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter

Wrong NMI:
Only the fixed counter causes the NMI (it was not reset during the NMI handling above!).
Both counters seem to be stopped!
  2. Step: par1 = 0x01,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
  5. Step: par1 = 0x0a,  high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
  6. Step: par1 = 0xc3,  high = 0x0000, low = 0x00ec ] rdmsrl(0xc3)  -> #2 general counter
  7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter

And this state remains forever!
I hope my explanations are understandable ;-)

Until now I can see this behavior only on a Nehalem processor.

Thanks.
Dietmar
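To read the high/low pairs in the trace output above, it helps to decode them as overflow bits. The sketch below assumes the architectural layout of the global status register (general-purpose counter overflow bits starting at bit 0, fixed-counter overflow bits starting at bit 32) and, as a further assumption, four general-purpose and three fixed counters on this processor. The function names are hypothetical helpers, not Xen code.

```c
#include <assert.h>
#include <stdint.h>

#define FIXED_OVF_SHIFT 32   /* fixed-counter overflow bits start at bit 32 */
#define FIXED_OVF_MASK  0x7  /* assumption: three fixed counters */

/* Rebuild the 64-bit status value from the two traced 32-bit halves. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Bitmask of overflowed general-purpose counters (bit n = counter n). */
static uint32_t general_overflows(uint64_t status, unsigned pmc_count)
{
    return (uint32_t)(status & ((1ULL << pmc_count) - 1));
}

/* Bitmask of overflowed fixed counters (bit n = fixed counter n). */
static uint32_t fixed_overflows(uint64_t status)
{
    return (uint32_t)((status >> FIXED_OVF_SHIFT) & FIXED_OVF_MASK);
}
```

Read this way, high = 0x0002, low = 0x0004 means general counter #2 and fixed counter #1 both overflowed, and the stuck value high = 0x0002, low = 0x0000 means only the fixed counter #1 overflow bit remains set, which matches the description in the message.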
Shan, Haitao
2009-Nov-02 09:49 UTC
RE: [Xen-devel] Need help in debugging partially blocked hypervisor
Very detailed explanation indeed. What you described is the same as what I saw months ago.
Unfortunately, I do not know the root cause yet. It seems to me that unmasking the PMI in the local APIC will immediately generate a new NMI in the system if one of the enabled counters is zero at that time.
That is why I was asking whether you could try to set that counter (in your case, Fixed Counter 1, 0x30a) to some value other than zero (for example, 0x1) before unmasking the PMI in vpmu_do_interrupt, and see whether it helps.

When I met this problem, I remember that I tried two approaches:
1> Setting the counter to non-zero before unmasking PMI in vpmu_do_interrupt;
2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical PMI* when guest vcpu unmasks virtual PMI.
I remember that approach 2 can fix this issue. But I do not remember the result of approach 1, since I met this about one year ago.
It is my understanding that approach 2 is much the same as approach 1, since the guest will normally set the counter to some negative value (for example, -100000) before unmasking the virtual PMI.
However, approach 2 looks cleaner and more reasonable.

Can you give it a try and let me know the result? If neither works, there might be some problem that I have not met before.

BTW: Sorry, I did not see your patch to enable NHM vpmu before. So, there is no need for me to work on that now. :)

Haitao

Dietmar Hahn wrote:
> Hi Haitao,
>
>> Can I know how you enabled vPMU on Nehalem? This is not supported in
>> current Xen.
>
> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
>
>>
>> Concerning vpmu support, I totally agree that we can disable this
>> feature by default. If anyone really wants to use it, he can use boot
>> options to turn it on.
>
> Yes, that's OK for me.
>
>> I am preparing a patch for that. And I will
>> send a patch to enable NHM vpmu together.
>>
>> For the problem that Dietmar met, I think I once met this before. Can
>> you add some code in vpmu_do_interrupt that sets the counter you are
>> using to a value other than zero? Please let me know if that can
>> help.
>
> I don't set the counter to zero. I use 0-val to set the counter.
> Actually I tested on Nehalem with
> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and val=1100000
> - Fixed counter #1 (0x30a) and val=1100000
> The thing is that in normal case the overflows of both counters appear
> nearly at the same time.
> As described I added some extra tracer for xentrace in
> core2_vpmu_do_interrupt() so the code looks like:
>
>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);          -> 1. Step
>     {
>         uint32_t HAHN_l, HAHN_h;
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);           -> 2. Step
>     }
>     if ( !msr_content )
>         return 0;
>     core2_vpmu_cxt->global_ovf_status |= msr_content;
>     msr_content = 0xC000000700000000 | ((1 << core2_get_pmc_count()) - 1);
>     wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);        -> 3. Step
>
>     rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);          -> 4. Step
>     {
>         uint32_t HAHN_l, HAHN_h;
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);         -> 5. Step
>
>         rdmsrl(0xc3, msr_content);                             -> 6. Step General counter #2
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
>         rdmsrl(0x30a, msr_content);                            -> 7. Step Fixed counter #1
>         HAHN_l = (uint32_t) msr_content;
>         HAHN_h = (uint32_t) (msr_content >> 32);
>         HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
>     }
>
> With these tracers I got the following output:
>
> Last good NMI:
> Both counter cause the NMI. Resetting works OK.
> The counter itself were running further.
> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 5. Step: par1 = 0x0a, high = 0x0000, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x03c4 ] rdmsrl(0xc3) -> #2 general counter
> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ] rdmsrl(0x30a) -> #1 fixed counter
>
> NMI from where things goes wrong:
> Both counter cause the NMI. Resetting works NOT correct, only for the
> general counter!
> The general counter (caused the NMI) seems to be stopped!
> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
>
> Wrong NMI:
> Only the fixed counter causes the NMI (which was not resetted during
> NMI handling above!) Both counter seems to be stopped!
> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ] rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ] rdmsrl(0xc3) -> #2 general counter
> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ] rdmsrl(0x30a) -> #1 fixed counter
>
> And this state remains forever!
> I hope my explanations are understandable ;-)
>
> Until now I can see this behavior only on a Nehalem processor.
>
> Thanks.
> Dietmar
>
>>
>> Best Regards
>> Shan Haitao
>>
>> 2009/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:
>>> On 30/10/2009 12:20, "Dietmar Hahn" <dietmar.hahn@ts.fujitsu.com>
>>> wrote:
>>>
>>>> I searched the intel processor spec but couldn't find any help.
>>>> So my questions is, what is wrong here?
>>>> Can anybody with more knowledge point me in the right direction,
>>>> what can I still do to find the real cause of this?
>>>
>>> You should probably Cc one of the Intel guys who implemented this
>>> stuff -- I've added Haitao Shan.
>>>
>>> Meanwhile I'd be interested to know whether things work okay for
>>> you, minus performance counters and the hypervisor hang, if you
>>> return immediately from vpmu_initialise(). Really at minimum we
>>> need such a fix, perhaps with a boot parameter to re-enable the
>>> feature, for 3.4.2 release; allowing guests to hose the hypervisor
>>> like this is of course not on.
>>>
>>> -- Keir
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
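The trace values in the quoted message can be read more easily once the two 32-bit halves are put back together. The following is only a sketch, assuming the architectural layout of the global status MSR (general-purpose counter overflow bits from bit 0, fixed-counter overflow bits from bit 32); the helper names are made up for illustration and are not part of Xen. With this reading, high = 0x0002 is the overflow bit of fixed counter #1 (0x30a) and low = 0x0004 is the overflow bit of general counter #2 (0xc3). The last helper reproduces the value written to MSR_CORE_PERF_GLOBAL_OVF_CTRL in the quoted handler, with the counter count passed in as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed-counter overflow bits start at bit 32 of the status MSR. */
#define STATUS_FIXED_SHIFT 32

/* Rebuild the 64-bit MSR value from the two halves written to the trace. */
uint64_t status_from_trace(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Nonzero if general-purpose counter idx has its overflow bit set. */
int gp_overflowed(uint64_t status, unsigned idx)
{
    return (int)((status >> idx) & 1);
}

/* Nonzero if fixed counter idx has its overflow bit set. */
int fixed_overflowed(uint64_t status, unsigned idx)
{
    return (int)((status >> (STATUS_FIXED_SHIFT + idx)) & 1);
}

/*
 * Same expression as in the quoted handler: the two top bits, the three
 * fixed-counter bits, and one bit per general-purpose counter.
 * gp_count stands in for core2_get_pmc_count() and must be below 64.
 */
uint64_t ovf_ctrl_clear_mask(unsigned gp_count)
{
    return 0xC000000700000000ULL | ((1ULL << gp_count) - 1);
}
```

Read this way, the "5. Step" value of the bad NMI (high = 0x0002, low = 0x0000) says that the fixed counter #1 overflow bit was still set after the write to the overflow control MSR.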
Dietmar Hahn
2009-Nov-02 10:30 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
> Very detailed explanation indeed. What you described is the same as I saw months ago.
> But unluckily, I do not know the root cause yet. It seems to me that unmasking of PMI in local APIC will immediately generate a new NMI in the system if one of the enabled counter is zero at that time.
> That is why I was asking you whether you could try to set that counter to some value other than zero (for example, 0x1) before unmasking(in your case, it is Fixed Counter 1 0x30a) PMI in vpmu_do_interrupt and see whether it helped.

OK I will try to set the counter after reading the 0 value to 1.
But some things remain fully unclear ...

Dietmar.

--
Dietmar Hahn
TSP ES&S SWE OS                Telephone: +49 (0) 89 636 40274
Fujitsu Technology Solutions   Email: dietmar.hahn@ts.fujitsu.com
Otto-Hahn-Ring 6               Internet: http://ts.fujitsu.com
D-81739 München                Company details: ts.fujitsu.com/imprint.html
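The thread mentions arming a counter with "0 - val" (for example val = 1100000, or -100000 in Haitao's example). As a rough illustration of what that means, the sketch below computes the armed value and the number of events left until the counter wraps. It assumes a 48-bit counter width, which is not stated anywhere in the thread, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: counters are 48 bits wide; the thread does not say so. */
#define COUNTER_WIDTH 48
#define COUNTER_MASK  ((1ULL << COUNTER_WIDTH) - 1)

/* Value to program so that the counter overflows after `period` events. */
uint64_t arm_value(uint64_t period)
{
    return ((uint64_t)0 - period) & COUNTER_MASK;
}

/* Events still needed before the counter wraps from all-ones to zero. */
uint64_t events_until_overflow(uint64_t counter)
{
    return (COUNTER_MASK - (counter & COUNTER_MASK)) + 1;
}
```

Under these assumptions a counter that reads exactly 0 has just wrapped and would need a full 2^48 events before overflowing again, while small readings such as 0x03c4 or 0x02da are simply the events counted since the wrap.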
Dietmar Hahn
2009-Nov-03 06:53 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
> > Very detailed explanation indeed. What you described is the same as I saw months ago.
> > But unluckily, I do not know the root cause yet. It seems to me that unmasking of PMI in local APIC will immediately generate a new NMI in the system if one of the enabled counter is zero at that time.
> > That is why I was asking you whether you could try to set that counter to some value other than zero (for example, 0x1) before unmasking(in your case, it is Fixed Counter 1 0x30a) PMI in vpmu_do_interrupt and see whether it helped.
>
> OK I will try to set the counter after reading the 0 value to 1.
> But some things remain fully unclear ...

Hi Haitao,

> 1> Setting the counter to non-zero before unmasking PMI in vpmu_do_interrupt;

I tried your first approach.

1. I added
       rdmsrl(CounterX, msr_content);
       if (msr_content == 0) {
           HVMTRACE_3D(HAHN_TR2, ...);   // A tracer to see this.
           wrmsrl(CounterX, 0x1);
       }
   directly behind the line that reads MSR_CORE_PERF_GLOBAL_STATUS.
   In the xentrace output I found some tracers where counters were zero, but
   I couldn't reproduce the hanging behavior!
   The interesting thing here was that MSR_CORE_PERF_GLOBAL_STATUS always
   contained zero (4. Step) after resetting it by writing
   MSR_CORE_PERF_GLOBAL_OVF_CTRL (3. Step). This differs from what I saw in
   my first mail!

2. I added the code above behind the second (test) read of
   MSR_CORE_PERF_GLOBAL_STATUS (around 6. and 7. Step).
   Now I could see some of these tracers but no hanging behavior!
   In this case I could see the same behavior of MSR_CORE_PERF_GLOBAL_STATUS
   as in my first mail.

The conclusion is that this seems to be a workaround for the endless NMI loop.
PMIs are very rare events, so this should not cause a performance problem.

I didn't try your second approach

> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical PMI* when guest vcpu unmasks virtual PMI.

but I have some questions.

- What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and a
  watchdog NMI occurs before the domU unmasks it?
- Is it possible that after handling the NMI (and not unmasking) another domU
  gets to run on this CPU and therefore PMIs get lost?

But the real cause of the problem is unknown. As said, I saw this only on
Nehalem. Maybe there is a problem with the hardware? Perhaps your hardware
colleagues know something more ;-)

Thanks
Dietmar
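The "approach 1" workaround tested in this message can be sketched in a form that runs outside the hypervisor. The real code uses rdmsrl()/wrmsrl() on the physical MSRs, which cannot be executed from user space, so the version below replaces them with a small in-memory table. The table, fake_rdmsr(), fake_wrmsr() and nudge_if_zero() are all hypothetical names for this illustration, and the initial values are taken from the "bad NMI" trace (0xc3 = 0xec, 0x30a = 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_PMC2       0xc3u    /* general counter #2, as used in the thread */
#define MSR_FIXED_CTR1 0x30au   /* fixed counter #1, as used in the thread */

/* In-memory stand-in for the two counter MSRs (not real hardware access). */
struct fake_msr { uint32_t addr; uint64_t val; };
static struct fake_msr fake_msrs[] = {
    { MSR_PMC2,       0xec },   /* value seen in the "bad NMI" trace */
    { MSR_FIXED_CTR1, 0x0  },   /* stuck at zero in the "bad NMI" trace */
};

static uint64_t *fake_msr_slot(uint32_t addr)
{
    for (size_t i = 0; i < sizeof(fake_msrs) / sizeof(fake_msrs[0]); i++)
        if (fake_msrs[i].addr == addr)
            return &fake_msrs[i].val;
    return NULL;
}

uint64_t fake_rdmsr(uint32_t addr)
{
    uint64_t *slot = fake_msr_slot(addr);
    return slot ? *slot : 0;
}

void fake_wrmsr(uint32_t addr, uint64_t val)
{
    uint64_t *slot = fake_msr_slot(addr);
    if (slot)
        *slot = val;
}

/*
 * Workaround from the thread: if a counter reads exactly zero, write 0x1 to
 * it before the PMI is unmasked. Returns 1 if the counter was changed.
 */
int nudge_if_zero(uint32_t addr)
{
    if (fake_msr_slot(addr) == NULL || fake_rdmsr(addr) != 0)
        return 0;
    fake_wrmsr(addr, 0x1);
    return 1;
}
```

In the handler this check would run once per enabled counter before the LVTPC entry is unmasked. The cost is one lost event count per adjustment, which fits the observation above that PMIs are rare.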
Shan, Haitao
2009-Nov-03 07:32 UTC
RE: [Xen-devel] Need help in debugging partially blocked hypervisor
See my comments embedded. :)

Haitao

Dietmar Hahn wrote:
> The conclusion is, that this seems to be a workaround for the endless
> NMI loop. PMI's are a very rarely event and this should not raise a
> performance problem.

I totally agree that approach 1 is only a workaround.

> I didn't try your second approach
>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
>> PMI* when guest vcpu unmasks virtual PMI.
> but I have some question.
>
> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt and
>   a watchdog NMI would occur before the domU unmasks it?

I think the second NMI will be lost.

> - Is it possible that after handling the NMI (and not unmasking)
>   another domU got running on this CPU and therefore PMI's got lost?

The LVTPC entry in the physical local APIC is saved/restored by Xen on VCPU switches. So unmasking (or not) the PMI for one vcpu should have no impact on another vcpu. When developing vPMU, I treated both the PMU MSRs and the LVTPC entry in the local APIC as vPMU context. The vPMU context is saved/restored on the physical HW when vcpus are scheduled, either in an active save/restore manner or a lazy one (depending on the PMU usage at the time of the switch).

> But the real cause of the problem is unknown. As said I saw this only
> on Nehalem. Maybe there is a problem together with the hardware? Perhaps
> your hardware colleagues know something more ;-)

When I found this problem, I just thought it might be a corner case that only happens on my box (of course, I have only seen this on NHM, too).
I will try to ping a HW guy to see if there is any explanation, since it has proven to be a general problem on NHM.

But until everything is clear, I think approach 2 is the better solution for now.
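Haitao's point that the LVTPC entry travels with the vcpu can be illustrated with a small model. This is only a sketch of the idea and not Xen's actual vPMU code: the structures and function names are invented, only one counter is modelled, and it shows the eager save/restore variant, not the lazy one. The two constants follow the usual local APIC LVT layout (mask bit 16, delivery mode NMI = 0x400).

```c
#include <assert.h>
#include <stdint.h>

#define LVT_MASKED (1u << 16)   /* mask bit of a local APIC LVT entry */
#define LVT_DM_NMI 0x400u       /* delivery mode field set to NMI */

/* Stand-in for the physical PMU state of one CPU (hypothetical). */
struct pmu_hw   { uint64_t fixed_ctr1; uint32_t lvtpc; };
/* Per-vcpu copy of the same state (hypothetical). */
struct vpmu_ctx { uint64_t fixed_ctr1; uint32_t lvtpc; };

void vpmu_ctx_save(struct vpmu_ctx *ctx, const struct pmu_hw *hw)
{
    ctx->fixed_ctr1 = hw->fixed_ctr1;
    ctx->lvtpc = hw->lvtpc;
}

void vpmu_ctx_load(const struct vpmu_ctx *ctx, struct pmu_hw *hw)
{
    hw->fixed_ctr1 = ctx->fixed_ctr1;
    hw->lvtpc = ctx->lvtpc;
}

/* Eager switch: save the outgoing vcpu's state, load the incoming one's. */
void vcpu_switch(struct vpmu_ctx *prev, const struct vpmu_ctx *next,
                 struct pmu_hw *hw)
{
    vpmu_ctx_save(prev, hw);
    vpmu_ctx_load(next, hw);
}

struct scenario_result { uint32_t lvtpc_while_b; uint32_t lvtpc_back_on_a; };

/*
 * vcpu A is running with its PMI still masked (approach 2, guest has not
 * unmasked yet); vcpu B has an armed counter and an unmasked PMI.
 */
struct scenario_result run_switch_scenario(void)
{
    struct pmu_hw hw = { .fixed_ctr1 = 0, .lvtpc = LVT_DM_NMI | LVT_MASKED };
    struct vpmu_ctx a = { 0, 0 };
    struct vpmu_ctx b = { .fixed_ctr1 = (uint64_t)0 - 1100000,
                          .lvtpc = LVT_DM_NMI };
    struct scenario_result r;

    vcpu_switch(&a, &b, &hw);       /* A -> B */
    r.lvtpc_while_b = hw.lvtpc;
    vcpu_switch(&b, &a, &hw);       /* B -> A */
    r.lvtpc_back_on_a = hw.lvtpc;
    return r;
}
```

In this model B runs with an unmasked LVTPC even though A left its PMI masked, and A gets its masked entry back when it is scheduled again, which is the behaviour described in the reply.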
Dietmar Hahn
2009-Nov-03 07:52 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
Please see below.> See my comments embedded. :) > > Haitao > > > Dietmar Hahn wrote: > > The conclusion is, that this seems to be a workaround for the endless > > NMI loop. PMI''s are a very rarely event and this should not raise a > > performance > > problem. > I totally agree that this is only a workaround for approach 1. > > > > > I didn''t try your second approach > >> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical > >> PMI* when guest vcpu unmasks virtual PMI. but I have some question. > > > > - What if the ''physical PMI'' is not unmasked in vpmu_do_interrupt and > > a watchdog NMI would occur before the domU unmasks it? > I think the second NMI will be lost. > > > - Is it possible that after handling the NMI (and not unmasking) > > another domU got running on this CPU and therefore PMI''s got lost? > LVTPC entry in physical local APIC is save/restored by Xen on VCPU switches. So unmasking (or not) of PMI of one vcpu should have no impact on another vcpu. When developing vPMU, I treated as vPMU context both PMU MSRs and LVTPC entry in local APIC. vPMU context is save/restored on physical HW when vcpus is scheduled, either in an active save/restore manner or a lazy one (depending on the PMU usage at the time of switch). > > > > > But the real cause of the problem is unknown. As said I saw this only > > on > > Nehalem. Maybe there is a problem together with the hardware? Perhaps > > your > > hardware colleagues know something more ;-) > When I found this problem, I just thought it might be a corner case that only happens on my box (of course, I only see this in NHM, too). > I will try to pin HW guy to see if any explanation, since it is proven to be a general problem on NHM. > > But before everything is clear, I think approach 2 is a better solution now.What would be the effect if the guest unmasks the PMI (which leads to unmasking the ''physical PMI'') but doesn''t reset the counter to a value != 0? 
Is the guest able to produce the nmi endless loop?

Dietmar.

> > Thanks
> > Dietmar
> >
> >>> When I met this problem, I remember that I tried two approaches:
> >>> 1> Setting the counter to non-zero before unmasking PMI in
> >>>    vpmu_do_interrupt;
> >>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
> >>>    *physical PMI* when guest vcpu unmasks virtual PMI.
> >>> I remember that approach 2 can fix this issue. But I do not
> >>> remember the result of approach 1, since I met this about one year
> >>> ago.
> >>> It is my understanding that approach 2 is quite same as approach 1,
> >>> since normally guest will set the counter to some negative value
> >>> (for example, -100000) before unmasking virtual PMI.
> >>> However, approach 2 looks cleaner and more reasonable.
> >>>
> >>> Can you have a try and let me know the result? If both can not
> >>> work, there might be some problems that I have not met before.
> >>>
> >>> BTW: Sorry, I did not see your patch to enable NHM vpmu before. So,
> >>> there is no need for me to work on that now. :)
> >>>
> >>> Haitao
> >>>
> >>> Dietmar Hahn wrote:
> >>>> Hi Haitao,
> >>>>
> >>>>> Can I know how you enabled vPMU on Nehalem? This is not supported
> >>>>> in current Xen.
> >>>>
> >>>> http://lists.xensource.com/archives/html/xen-devel/2009-09/msg00829.html
> >>>>
> >>>>> Concerning vpmu support, I totally agree that we can disable this
> >>>>> feature by default. If anyone really wants to use it, he can use
> >>>>> boot options to turn it on.
> >>>>
> >>>> Yes, that's OK for me.
> >>>>
> >>>>> I am preparing a patch for that. And I will
> >>>>> send a patch to enable NHM vpmu together.
> >>>>>
> >>>>> For the problem that Dietmar met, I think I once met this before.
> >>>>> Can you add some code in vpmu_do_interrupt that sets the counter
> >>>>> you are using to a value other than zero? Please let me know if
> >>>>> that can help.
> >>>>
> >>>> I don't set the counter to zero. I use 0-val to set the counter.
> >>>> Actually I tested on Nehalem with
> >>>> - General Perf-counter #2 (0xc3) with CPU_CLK_UNHALTED and
> >>>>   val=1100000
> >>>> - Fixed counter #1 (0x30a) and val=1100000
> >>>> The thing is that in normal case the overflows of both counters
> >>>> appear nearly at the same time. As described I added some extra
> >>>> tracer for xentrace in core2_vpmu_do_interrupt() so the code looks
> >>>> like:
> >>>>
> >>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 1. Step */
> >>>> { uint32_t HAHN_l, HAHN_h;
> >>>>   HAHN_l = (uint32_t) msr_content;
> >>>>   HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>   HVMTRACE_3D(HAHN_TR2, v, 1, HAHN_h, HAHN_l);         /* 2. Step */
> >>>> }
> >>>> if ( !msr_content )
> >>>>     return 0;
> >>>> core2_vpmu_cxt->global_ovf_status |= msr_content;
> >>>> msr_content = 0xC000000700000000 |
> >>>>               ((1 << core2_get_pmc_count()) - 1);
> >>>> wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);    /* 3. Step */
> >>>>
> >>>> rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, msr_content);      /* 4. Step */
> >>>> { uint32_t HAHN_l, HAHN_h;
> >>>>   HAHN_l = (uint32_t) msr_content;
> >>>>   HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>   HVMTRACE_3D(HAHN_TR2, v, 0xa, HAHN_h, HAHN_l);       /* 5. Step */
> >>>>
> >>>>   rdmsrl(0xc3, msr_content);     /* 6. Step, General counter #2 */
> >>>>   HAHN_l = (uint32_t) msr_content;
> >>>>   HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>   HVMTRACE_3D(HAHN_TR2, v, 0xc3, HAHN_h, HAHN_l);
> >>>>   rdmsrl(0x30a, msr_content);    /* 7. Step, Fixed counter #1 */
> >>>>   HAHN_l = (uint32_t) msr_content;
> >>>>   HAHN_h = (uint32_t) (msr_content >> 32);
> >>>>   HVMTRACE_3D(HAHN_TR2, v, 0x30a, HAHN_h, HAHN_l);
> >>>> }
> >>>>
> >>>> With these tracers I got the following output:
> >>>>
> >>>> Last good NMI:
> >>>> Both counters cause the NMI. Resetting works OK.
> >>>> The counters themselves kept running.
> >>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ]
> >>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a, high = 0x0000, low = 0x0000 ]
> >>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x03c4 ]
> >>>>    rdmsrl(0xc3) -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x02da ]
> >>>>    rdmsrl(0x30a) -> #1 fixed counter
> >>>>
> >>>> NMI from where things go wrong:
> >>>> Both counters cause the NMI. Resetting does NOT work correctly, only
> >>>> for the general counter! The general counter (caused the NMI) seems
> >>>> to be stopped!
> >>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0004 ]
> >>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ]
> >>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ]
> >>>>    rdmsrl(0xc3) -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]
> >>>>    rdmsrl(0x30a) -> #1 fixed counter
> >>>>
> >>>> Wrong NMI:
> >>>> Only the fixed counter causes the NMI (which was not reset
> >>>> during NMI handling above!) Both counters seem to be stopped!
> >>>> 2. Step: par1 = 0x01, high = 0x0002, low = 0x0000 ]
> >>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 5. Step: par1 = 0x0a, high = 0x0002, low = 0x0000 ]
> >>>>    rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS)
> >>>> 6. Step: par1 = 0xc3, high = 0x0000, low = 0x00ec ]
> >>>>    rdmsrl(0xc3) -> #2 general counter
> >>>> 7. Step: par1 = 0x30a, high = 0x0000, low = 0x0000 ]
> >>>>    rdmsrl(0x30a) -> #1 fixed counter
> >>>>
> >>>> And this state remains forever!
> >>>> I hope my explanations are understandable ;-)
> >>>>
> >>>> Until now I can see this behavior only on a Nehalem processor.
> >>>>
> >>>> Thanks.
> >>>> Dietmar
> >>>>
> >>>>> Best Regards
> >>>>> Shan Haitao
> >>>>>
> >>>>> 2009/10/30 Keir Fraser <keir.fraser@eu.citrix.com>:
> >>>>>> On 30/10/2009 12:20, "Dietmar Hahn"
> >>>>>> <dietmar.hahn@ts.fujitsu.com> wrote:
> >>>>>>
> >>>>>>> I searched the intel processor spec but couldn't find any help.
> >>>>>>> So my question is, what is wrong here?
> >>>>>>> Can anybody with more knowledge point me in the right direction,
> >>>>>>> what can I still do to find the real cause of this?
> >>>>>>
> >>>>>> You should probably Cc one of the Intel guys who implemented this
> >>>>>> stuff -- I've added Haitao Shan.
> >>>>>>
> >>>>>> Meanwhile I'd be interested to know whether things work okay for
> >>>>>> you, minus performance counters and the hypervisor hang, if you
> >>>>>> return immediately from vpmu_initialise(). Really at minimum we
> >>>>>> need such a fix, perhaps with a boot parameter to re-enable the
> >>>>>> feature, for 3.4.2 release; allowing guests to hose the
> >>>>>> hypervisor like this is of course not on.
> >>>>>>
> >>>>>> -- Keir
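For readers following the trace values in this message, the status words can be decoded by hand. The sketch below is not code from Xen. It is a small, self-contained C illustration that assumes the architectural layout of the global status register used in the quoted handler (bit n for general counter n, bit 32+n for fixed counter n, bits 62 and 63 for the two flag bits also present in the 0xC000000700000000 mask) and, as a further assumption, a 48-bit counter width for the "0-val" preload. All helper names are made up for this example.

```c
#include <stdint.h>

#define FIXED_OVF_SHIFT 32           /* fixed counter n overflow = bit 32 + n */
#define FIXED_OVF_MASK  0x7ULL       /* three fixed counters assumed */
#define OVF_BUFFER      (1ULL << 62) /* buffer overflow flag */
#define COND_CHGD       (1ULL << 63) /* condition changed flag */

/* Rebuild the 64-bit status value from the two traced 32-bit halves. */
static uint64_t join_halves(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Returns 1 if general-purpose counter idx has its overflow bit set. */
static int gp_overflowed(uint64_t status, unsigned idx)
{
    return (int)((status >> idx) & 1);
}

/* Returns 1 if fixed counter idx has its overflow bit set. */
static int fixed_overflowed(uint64_t status, unsigned idx)
{
    return (int)((status >> (FIXED_OVF_SHIFT + idx)) & 1);
}

/* Value written to the overflow-control register to clear every overflow
 * bit, built the same way as in the quoted handler. */
static uint64_t ovf_clear_mask(unsigned gp_count)
{
    return COND_CHGD | OVF_BUFFER |
           (FIXED_OVF_MASK << FIXED_OVF_SHIFT) |
           ((1ULL << gp_count) - 1);
}

/* Counter start value that overflows after `period` events ("0 - val"),
 * truncated to the assumed counter width. */
static uint64_t sample_preload(uint64_t period, unsigned width_bits)
{
    uint64_t mask = (width_bits >= 64) ? ~0ULL : ((1ULL << width_bits) - 1);
    return (0 - period) & mask;
}
```

With the traced pair high = 0x0002, low = 0x0004, both fixed_overflowed(s, 1) and gp_overflowed(s, 2) are set, which matches "both counters cause the NMI". In the "wrong NMI" case (high = 0x0002, low = 0x0000) only the fixed counter bit is left.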
Shan, Haitao
2009-Nov-03 08:02 UTC
RE: [Xen-devel] Need help in debugging partially blocked hypervisor
I suspect the guest will reproduce this PMI loop if the guest behaves as you describe in this email. But as far as I know, VTune and OProfile do not behave like that.
Of course, approach 2 is still a workaround (unless I get confirmation that the HW requires it). Approach 2 is preferable because it does not change the contents of the MSRs. Thus, we have no impact on guest software that relies on reading the correct value from HW. Approach 1 existed only because we knew that in event-based sampling the counter value on receiving a PMI was not used by OProfile/VTune at all, so it was safe to set the counter to some non-zero value.

Haitao

Dietmar Hahn wrote:
> Please see below.
>
>> See my comments embedded. :)
>>
>> Haitao
>>
>> Dietmar Hahn wrote:
>>> The conclusion is, that this seems to be a workaround for the
>>> endless NMI loop. PMI's are a very rarely event and this should not
>>> raise a performance problem.
>> I totally agree that this is only a workaround for approach 1.
>>
>>> I didn't try your second approach
>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask *physical
>>>> PMI* when guest vcpu unmasks virtual PMI.
>>> but I have some question.
>>>
>>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
>>> and a watchdog NMI would occur before the domU unmasks it?
>> I think the second NMI will be lost.
>>
>>> - Is it possible that after handling the NMI (and not unmasking)
>>> another domU got running on this CPU and therefore PMI's got lost?
>> LVTPC entry in physical local APIC is save/restored by Xen on VCPU
>> switches. So unmasking (or not) of PMI of one vcpu should have no
>> impact on another vcpu. When developing vPMU, I treated as vPMU
>> context both PMU MSRs and LVTPC entry in local APIC. vPMU context is
>> save/restored on physical HW when vcpus is scheduled, either in an
>> active save/restore manner or a lazy one (depending on the PMU usage
>> at the time of switch).
>>
>>> But the real cause of the problem is unknown. As said I saw this
>>> only on Nehalem. Maybe there is a problem together with the
>>> hardware? Perhaps your hardware colleagues know something more ;-)
>> When I found this problem, I just thought it might be a corner case
>> that only happens on my box (of course, I only see this in NHM,
>> too).
>> I will try to pin HW guy to see if any explanation, since it is
>> proven to be a general problem on NHM.
>>
>> But before everything is clear, I think approach 2 is a better
>> solution now.
>
> What would be the effect if the guest unmasks the PMI (which leads to
> unmasking the 'physical PMI') but doesn't reset the counter to a
> value != 0? Is the guest able to produce the nmi endless loop?
>
> Dietmar.
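The masking policy discussed in this message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Xen implementation and not the patch mentioned later in the thread: it models the LVTPC entry as a plain integer whose bit 16 is the mask bit, assumes the local APIC sets that mask bit when a PMI is delivered, and uses made-up names (cpu_model, vcpu_model, pmi_delivered, and so on). It does not model the unexplained Nehalem behaviour itself, only where the unmask happens and how per-vcpu save/restore keeps one vcpu's mask state away from another.

```c
#include <stdint.h>

#define LVT_MASKED (1u << 16)  /* mask bit of a local APIC LVT entry */
#define LVT_DM_NMI (4u << 8)   /* delivery mode field set to NMI */

/* Hypothetical per-vcpu context: the LVTPC value saved for this vcpu. */
struct vcpu_model {
    uint32_t saved_lvtpc;
};

/* Hypothetical per-CPU state: the value currently in the physical APIC. */
struct cpu_model {
    uint32_t phys_lvtpc;
};

static int pmi_masked(const struct cpu_model *cpu)
{
    return (cpu->phys_lvtpc & LVT_MASKED) != 0;
}

/* A PMI arrives. The hardware masks the entry on delivery (assumption
 * stated above). With unmask_in_handler != 0 the hypervisor handler
 * re-enables it immediately; with 0 (approach 2) it stays masked. */
static void pmi_delivered(struct cpu_model *cpu, int unmask_in_handler)
{
    cpu->phys_lvtpc |= LVT_MASKED;
    if (unmask_in_handler)
        cpu->phys_lvtpc &= ~LVT_MASKED;
}

/* Approach 2: the physical entry is unmasked only when the guest writes
 * its virtual LVTPC with the mask bit clear. */
static void guest_writes_virtual_lvtpc(struct cpu_model *cpu, uint32_t val)
{
    if (!(val & LVT_MASKED))
        cpu->phys_lvtpc &= ~LVT_MASKED;
}

/* LVTPC is treated as part of the vcpu's PMU context on a switch. */
static void switch_out(const struct cpu_model *cpu, struct vcpu_model *v)
{
    v->saved_lvtpc = cpu->phys_lvtpc;
}

static void switch_in(struct cpu_model *cpu, const struct vcpu_model *v)
{
    cpu->phys_lvtpc = v->saved_lvtpc;
}
```

In this model a vcpu that has taken a PMI and not yet unmasked carries the masked entry with it across a switch, so another vcpu scheduled on the same CPU still runs with its own unmasked entry.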
Dietmar Hahn
2009-Nov-03 08:24 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
OK, then will you send a patch?

Dietmar.
Shan, Haitao
2009-Nov-03 08:43 UTC
RE: [Xen-devel] Need help in debugging partially blocked hypervisor
No problem. Can you help to test it? I have no test box at hand right now, which might cause a delay.

Haitao

Dietmar Hahn wrote:
> OK, then will you send a patch?
> Dietmar.
Shan, Haitao
2009-Nov-03 09:00 UTC
RE: [Xen-devel] Need help in debugging partially blocked hypervisor
Hi, Dietmar,

Please review the attached patch. Any comments?

Haitao

Dietmar Hahn wrote:
>> I suspect the guest will reproduce this PMI loop if the guest behaves as
>> you said in this email. But as far as I know, VTune and oprofile do
>> not behave like that.
>> Of course, this approach is still a workaround (unless I get
>> confirmation that the HW requires it). This approach is preferable
>> because it does not change the contents of MSRs. Thus, we have no
>> impact on guest software that relies on reading the correct value
>> from HW. Approach 1 existed just because we knew that in event-based
>> sampling, the counter value on receiving a PMI was not used by
>> OProfile/VTune at all and it was safe to set the counter to some
>> non-zero value.
>>
>> Haitao
>>
>
> OK, then will you send a patch?
> Dietmar.
>
>>
>> Dietmar Hahn wrote:
>>> Please see below.
>>>
>>>> See my comments embedded. :)
>>>>
>>>> Haitao
>>>>
>>>>
>>>> Dietmar Hahn wrote:
>>>>> The conclusion is that this seems to be a workaround for the
>>>>> endless NMI loop. PMIs are a very rare event and this should
>>>>> not raise a performance problem.
>>>> I totally agree that this is only a workaround for approach 1.
>>>>
>>>>>
>>>>> I didn't try your second approach
>>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
>>>>>> *physical PMI* when guest vcpu unmasks virtual PMI.
>>>>> but I have some questions.
>>>>>
>>>>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
>>>>> and a watchdog NMI would occur before the domU unmasks it?
>>>> I think the second NMI will be lost.
>>>>
>>>>> - Is it possible that after handling the NMI (and not unmasking)
>>>>> another domU got running on this CPU and therefore PMIs got
>>>>> lost?
>>>> LVTPC entry in physical local APIC is save/restored by Xen on VCPU
>>>> switches. So unmasking (or not) of PMI of one vcpu should have no
>>>> impact on another vcpu.
Dietmar Hahn
2009-Nov-03 09:03 UTC
Re: [Xen-devel] Need help in debugging partially blocked hypervisor
> No problem.
> Can you help to test? I have no test box at hand now, which might cause delay.
>
Sure :-)
Dietmar.

> Haitao
>
>
> Dietmar Hahn wrote:
> >> I suspect the guest will reproduce this PMI loop if the guest behaves as
> >> you said in this email. But as far as I know, VTune and oprofile do
> >> not behave like that.
> >> Of course, this approach is still a workaround (unless I get
> >> confirmation that the HW requires it). This approach is preferable
> >> because it does not change the contents of MSRs. Thus, we have no
> >> impact on guest software that relies on reading the correct value
> >> from HW. Approach 1 existed just because we knew that in event-based
> >> sampling, the counter value on receiving a PMI was not used by
> >> OProfile/VTune at all and it was safe to set the counter to some
> >> non-zero value.
> >>
> >> Haitao
> >>
> >
> > OK, then will you send a patch?
> > Dietmar.
> >
> >>
> >> Dietmar Hahn wrote:
> >>> Please see below.
> >>>
> >>>> See my comments embedded. :)
> >>>>
> >>>> Haitao
> >>>>
> >>>>
> >>>> Dietmar Hahn wrote:
> >>>>> The conclusion is that this seems to be a workaround for the
> >>>>> endless NMI loop. PMIs are a very rare event and this should
> >>>>> not raise a performance problem.
> >>>> I totally agree that this is only a workaround for approach 1.
> >>>>
> >>>>>
> >>>>> I didn't try your second approach
> >>>>>> 2> Remove unmasking PMI from vpmu_do_interrupt and unmask
> >>>>>> *physical PMI* when guest vcpu unmasks virtual PMI.
> >>>>> but I have some questions.
> >>>>>
> >>>>> - What if the 'physical PMI' is not unmasked in vpmu_do_interrupt
> >>>>> and a watchdog NMI would occur before the domU unmasks it?
> >>>> I think the second NMI will be lost.
> >>>>
> >>>>> - Is it possible that after handling the NMI (and not unmasking)
> >>>>> another domU got running on this CPU and therefore PMIs got
> >>>>> lost?
> >>>> LVTPC entry in physical local APIC is save/restored by Xen on VCPU
> >>>> switches.