George Dunlap
2012-Aug-17 11:17 UTC
Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
I just tried to install Xen-4.2.0-rc2 on a Debian wheezy system, but couldn't boot under Xen 4.2. The box is an 8-core AMD, I think Barcelona. The wheezy kernel is 3.2.21-3, 32-bit version.

The problems seem to have started here:

-- snip --
[ 0.060280] ACPI: Core revision 20110623
[ 0.072384] Performance Events: Broken BIOS detected, complain to your hardware vendor.
[ 0.076014] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
[ 0.080007] AMD PMU driver.
[ 0.082864] ------------[ cut here ]------------
[ 0.084018] WARNING: at /build/buildd-linux_3.2.21-3-i386-vEohn4/linux-3.2.21/arch/x86/xen/enlighten.c:738 perf_events_lapic_init+0x28/0x29()
[ 0.088009] Hardware name: empty
[ 0.091299] Modules linked in:
[ 0.092275] Pid: 1, comm: swapper/0 Not tainted 3.2.0-3-686-pae #1
[ 0.096008] Call Trace:
[ 0.098527] [<c1037fcc>] ? warn_slowpath_common+0x68/0x79
[ 0.100019] [<c10150d2>] ? perf_events_lapic_init+0x28/0x29
[ 0.104010] [<c1037fea>] ? warn_slowpath_null+0xd/0x10
[ 0.108011] [<c10150d2>] ? perf_events_lapic_init+0x28/0x29
[ 0.112015] [<c141c97e>] ? init_hw_perf_events+0x223/0x3b1
[ 0.116012] [<c141c75b>] ? check_bugs+0x1d9/0x1d9
[ 0.120012] [<c1003074>] ? do_one_initcall+0x66/0x10e
[ 0.124012] [<c1415770>] ? kernel_init+0x6d/0x125
[ 0.128012] [<c1415703>] ? start_kernel+0x325/0x325
[ 0.132015] [<c12c463e>] ? kernel_thread_helper+0x6/0x10
[ 0.136019] ---[ end trace a7919e7f17c0a725 ]---
-- snip --

And pretty soon degenerated into log message spamming of this sort:

-- snip --
(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 from 0x0000000000530076 to 0x0000000000130076.
(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 from 0x0000000000530076 to 0x0000000000130076.
(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 from 0x0000000000530076 to 0x0000000000130076.
(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 from 0x0000000000530076 to 0x0000000000130076.
(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 from 0x0000000000530076 to 0x0000000000130076.
-- snip --

The serial log is attached ("exile.log").

An earlier kernel I had lying around, 2.6.32.25 (perhaps one of Jeremy's?) boots fine; the serial log is also attached ("exile-good.log"). It also seems to have the WARN above, so maybe that's not actually the issue.

Any ideas?

-George
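For anyone triaging a serial log like the one above, a small script can collapse the repeated hypervisor messages into counts. This is only a sketch: the regular expression is written against the message format quoted in this thread, `summarize_wrmsr` is a made-up helper name, and other Xen versions may word the message differently.

```python
import re
from collections import Counter

# Pattern written against the "(XEN) traps.c:..." lines quoted in this thread.
WRMSR_RE = re.compile(
    r"\(XEN\)\s+traps\.c:\d+:d(?P<dom>\d+)\s+Domain attempted WRMSR\s+"
    r"(?P<msr>[0-9a-f]+)\s+from\s+(?P<old>0x[0-9a-f]+)\s+to\s+(?P<new>0x[0-9a-f]+)"
)

def summarize_wrmsr(log_text):
    """Count identical (domain, msr, old value, new value) WRMSR attempts."""
    counts = Counter()
    for m in WRMSR_RE.finditer(log_text):
        key = (int(m["dom"]), int(m["msr"], 16),
               int(m["old"], 16), int(m["new"], 16))
        counts[key] += 1
    return counts

# Five copies of the line from the log excerpt above.
sample = (
    "(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 "
    "from 0x0000000000530076 to 0x0000000000130076.\n"
) * 5

for (dom, msr, old, new), n in summarize_wrmsr(sample).items():
    print(f"d{dom} MSR {msr:#x}: {old:#x} -> {new:#x} (seen {n} times)")
```

Running it over the full serial log instead of `sample` would show whether the spam is one write repeated or several distinct MSRs.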
Konrad Rzeszutek Wilk
2012-Aug-17 13:07 UTC
Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
On Fri, Aug 17, 2012 at 12:17:15PM +0100, George Dunlap wrote:
> An earlier kernel I had lying around, 2.6.32.25 (perhaps one of
> Jeremy's?) boots fine; the serial log is also attached
> ("exile-good.log"). It also seems to have the WARN above, so maybe
> that's not actually the issue.
>
> Any ideas?

Implement the perf framework to work with Xen's oprofile, or make a new set of hypercalls for it.

The WARN can go away - it's there to remind us to get it done at some point :-(
George Dunlap
2012-Aug-17 13:18 UTC
Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
On 17/08/12 14:07, Konrad Rzeszutek Wilk wrote:
> On Fri, Aug 17, 2012 at 12:17:15PM +0100, George Dunlap wrote:
>> Any ideas?
> Implement the perf framework to work with Xen's oprofile, or make a new
> set of hypercalls for it.
>
> The WARN can go away - it's there to remind us to get it done at some point :-(

OK, but is there a way I can actually get it to boot? I think the WRMSR is probably the real problem.

-George
Konrad Rzeszutek Wilk
2012-Aug-17 13:58 UTC
Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
On Fri, Aug 17, 2012 at 02:18:19PM +0100, George Dunlap wrote:
> OK, but is there a way I can actually get it to boot? I think the
> WRMSR is probably the real problem.

It should have no trouble booting? The WRMSRs are the perf counters that are being tested (I think).

Oh, maybe not. I wonder if those are the APERF? The scheduler has some code to probe the MSRs, and this git commit: d95a8d4b876b60ce8497fc3216d06823c492bba6 takes care of that. But that should be in the 3.2 kernel? Not there?
Konrad Rzeszutek Wilk
2012-Aug-17 14:00 UTC
Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
On Fri, Aug 17, 2012 at 02:18:19PM +0100, George Dunlap wrote:
> On 17/08/12 14:07, Konrad Rzeszutek Wilk wrote:
> >On Fri, Aug 17, 2012 at 12:17:15PM +0100, George Dunlap wrote:
> >>And pretty soon degenerated into log message spamming of this sort:
> >>
> >>-- snip --
> >>(XEN) traps.c:2584:d0 Domain attempted WRMSR 00000000c0010000 from
> >>0x0000000000530076 to 0x0000000000130076.

So that translates to MSR_K7_EVNTSEL0.

And that should only be shown once. Is perf trying to load the module over and over?
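To make the two values in that WRMSR message easier to read, here is a minimal sketch that splits an event-select value into its fields. The bit positions are the usual x86 event-select layout (event in bits 0-7, unit mask in bits 8-15, USR/OS/EDGE/INT/EN/INV flags above that); `decode_evntsel` is a hypothetical helper, not a kernel or Xen function.

```python
# Flag bits in an x86 performance event-select register.
FLAGS = {16: "USR", 17: "OS", 18: "EDGE", 20: "INT", 22: "EN", 23: "INV"}

def decode_evntsel(value):
    """Split an event-select value into event code, unit mask and flag names."""
    return {
        "event": value & 0xFF,
        "unit_mask": (value >> 8) & 0xFF,
        "flags": [name for bit, name in sorted(FLAGS.items())
                  if value & (1 << bit)],
    }

old_value = 0x530076  # value the MSR currently holds, per the Xen message
new_value = 0x130076  # value dom0 tried to write

print(decode_evntsel(old_value))
print(decode_evntsel(new_value))
print(hex(old_value ^ new_value))  # the bits that differ between the two
```

The two values differ only in the EN bit, so the write looks like dom0 trying to disable a counter that is already enabled, which would fit the "BIOS has corrupted hw-PMU resources" complaint earlier in the log.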
George Dunlap
2013-Mar-14 10:07 UTC
Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
On Fri, Aug 17, 2012 at 3:00 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> So that translates to MSR_K7_EVNTSEL0.
>
> And that should only be shown once. Is the perf trying to load
> the module over and over?

So I've just tested this again with the latest wheezy kernel (3.2.0-4), and this time, taking a closer look, I see this near the first instance:

[ 0.072397] Performance Events: Broken BIOS detected, complain to your hardware vendor.
[ 0.076015] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
[ 0.080007] AMD PMU driver.
[ 0.082861] ------------[ cut here ]------------
[ 0.084019] WARNING: at /build/buildd-linux_3.2.39-2-i386-4VFKqr/linux-3.2.39/arch/x86/xen/enlighten.c:738 perf_events_lapic_init+0x28/0x29()
[ 0.088009] Hardware name: empty
[ 0.091294] Modules linked in:
[ 0.092268] Pid: 1, comm: swapper/0 Not tainted 3.2.0-4-686-pae #1 Debian 3.2.39-2
[ 0.096009] Call Trace:
[ 0.098531] [<c10383c4>] ? warn_slowpath_common+0x68/0x79
[ 0.100011] [<c101536e>] ? perf_events_lapic_init+0x28/0x29
[ 0.104012] [<c10383e2>] ? warn_slowpath_null+0xd/0x10
[ 0.108011] [<c101536e>] ? perf_events_lapic_init+0x28/0x29
[ 0.112016] [<c1421ac7>] ? init_hw_perf_events+0x223/0x3b1
[ 0.116012] [<c14218a4>] ? check_bugs+0x1d9/0x1d9
[ 0.120014] [<c1003074>] ? do_one_initcall+0x66/0x10e
[ 0.124012] [<c141a781>] ? kernel_init+0x79/0x131
[ 0.128012] [<c141a708>] ? start_kernel+0x32a/0x32a
[ 0.132013] [<c12c727e>] ? kernel_thread_helper+0x6/0x10
[ 0.136020] ---[ end trace b828488e55b27a3e ]---
[ 0.140015] ... version: 0
[ 0.144011] ... bit width: 48
[ 0.148012] ... generic registers: 4
[ 0.152011] ... value mask: 0000ffffffffffff
[ 0.156013] ... max period: 00007fffffffffff
[ 0.160012] ... fixed-purpose events: 0
[ 0.164013] ... event mask: 000000000000000f
[ 0.168276] NMI watchdog enabled, takes one hw-pmu counter.
(XEN) traps.c:2495:d0 Domain attempted WRMSR 00000000c0010004 from 0x0000ffff9af0c3ec to 0x0000fffb5adce6f0.

So relating this back to the discussion about vpmu for guests, it looks like maybe it's testing the performance counters, detecting that they're broken, but for some reason not actually disabling the NMI watchdog, and it keeps on using them?

-George
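The MSR number in that last Xen message is different from the earlier one, and a quick lookup helps tell them apart. The sketch below assumes the K7-style numbering in which 0xc0010000-0xc0010003 are the event selects and 0xc0010004-0xc0010007 are the counters, and it takes the 48-bit width from the "bit width: 48" line in the log; `msr_name` and `events_until_overflow` are hypothetical helpers for illustration only.

```python
COUNTER_BITS = 48  # from "bit width: 48" in the boot log

def msr_name(msr):
    """Map a K7-style PMU MSR number to its conventional name."""
    if 0xC0010000 <= msr <= 0xC0010003:
        return f"MSR_K7_EVNTSEL{msr - 0xC0010000}"
    if 0xC0010004 <= msr <= 0xC0010007:
        return f"MSR_K7_PERFCTR{msr - 0xC0010004}"
    return f"unknown MSR {msr:#x}"

def events_until_overflow(raw):
    """Treat a counter value as 'counts up to 2**48'; return the events left."""
    mask = (1 << COUNTER_BITS) - 1
    return (1 << COUNTER_BITS) - (raw & mask)

print(msr_name(0xC0010000))
print(msr_name(0xC0010004))
print(events_until_overflow(0x0000FFFF9AF0C3EC))
```

Read this way, the second message is a write to a counter register rather than to an event select, which is what one would expect if the NMI watchdog is reprogramming its sampling period.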
George Dunlap
2013-Mar-14 15:44 UTC
Re: Failure to boot default Debian wheezy (pvops) kernel on 4.2-rc2
On Thu, Mar 14, 2013 at 10:07 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> So relating this back to the discussion about vpmu for guests, it
> looks like maybe it's testing the performance counters, detecting that
> they're broken, but for some reason not actually disabling the NMI
> watchdog, and keeps on using them?

I'm guessing that the problem is in arch/x86/kernel/cpu/perf_event.c:check_hw_exists().

It has two failure modes -- "bios_fail" and "msr_fail". It does a check where it tries to write and then read the perfcounter MSRs to see if they're functional; if that fails it will go to msr_fail and return false.

However, *before* it does that check, it does some other checks which, if they fail, will jump right to bios_fail, skipping the write/read check entirely.

Really the "goto bios_fail" is wrong in all sorts of ways -- e.g., in the first loop, if it detects that condition early on, it will entirely miss the other MSR checks.

I might just propose a complete rewrite of that function...

-George
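The control-flow problem described above can be modelled in a few lines. This is a toy model, not the kernel function: both function names are invented, the write/read probe is reduced to a boolean, and it assumes that the bios_fail path still reports the PMU as usable (which would match the NMI watchdog staying enabled in the log).

```python
EN = 1 << 22  # enable bit in an event-select register

def check_as_described(eventsels, write_read_ok):
    """Flow as described above: an early BIOS failure skips the MSR probe."""
    for val in eventsels:
        if val & EN:
            # Jumps straight to bios_fail; the write/read probe never runs.
            return (True, "bios_fail")
    if not write_read_ok:
        return (False, "msr_fail")
    return (True, "ok")

def check_always_probe(eventsels, write_read_ok):
    """Hypothetical alternative: note the BIOS problem, then probe anyway."""
    bios_problem = any(val & EN for val in eventsels)
    if not write_read_ok:
        return (False, "msr_fail")
    return (True, "bios_fail" if bios_problem else "ok")

# Situation from the log: EVNTSEL0 reads 0x530076 (EN set), and the
# hypervisor does not let dom0's MSR writes take effect.
regs = [0x530076, 0, 0, 0]
print(check_as_described(regs, write_read_ok=False))
print(check_always_probe(regs, write_read_ok=False))
```

In the first variant the PMU is reported usable even though writes do not stick; in the second the same inputs fall through to msr_fail, which is the behaviour a rewrite would presumably aim for.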