Luke S Crawford
2010-May-07 01:45 UTC
[Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
So I get this on bootup right after detecting USB. This is using the 2.6.18.8-xen dom0 kernel. I got the same results with the 3.4.3-rc6 Xen hypervisor.

Ideas on what the problem might be? Looking at amd_nonfatal it seems that the MCE code is in an impossible state?

(XEN) Xen BUG at amd_nonfatal.c:165
(XEN) ----[ Xen-3.4.2 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff828c801778f9>] mce_amd_work_fn+0x1d9/0x1f0
(XEN) RFLAGS: 0000000000010246 CONTEXT: hypervisor
(XEN) rax: 0000000000000ffe rbx: ffff828c8024ff28 rcx: 0000000000000000
(XEN) rdx: c0080ffe01000000 rsi: 0000000000000413 rdi: 0000000000000000
(XEN) rbp: 000000025f13f8e0 rsp: ffff828c8024fe60 r8: ffff828c8028f800
(XEN) r9: 0000000000000000 r10: 0000000000000005 r11: 0000000000000000
(XEN) r12: 0000000000000000 r13: ffff828c80177720 r14: ffff83081fd7b190
(XEN) r15: ffff83081fd7b190 cr0: 000000008005003b cr4: 00000000000006f0
(XEN) cr3: 00000004ca4a6000 cr2: 000000000083c770
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff828c8024fe60:
(XEN) 0000000000000000 c0080ffe01000000 ffff828c80221180 ffff828c8011a12c
(XEN) ffff8300dfc2c060 ffff828c80221180 ffff83081fd7b198 ffff828c8011a20d
(XEN) 000000024ab06880 0000000000000000 ffff828c8024ff28 ffff828c80267900
(XEN) ffff828c80266900 0000000000000000 ffff828c80221100 ffff828c801185b8
(XEN) 000000000000e008 ffff828c8024ff28 ffff828c80266900 ffff828c802215b0
(XEN) 000000025e3b7f20 ffff828c80138fcc 0000000000000000 ffff8300dfafc000
(XEN) ffff8300dfc2c000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000246
(XEN) 0000000000000008 00000000ffff8e54 0000000000000054 0000000000000000
(XEN) ffffffff802053aa 0000000000000001 0000000000000000 0000000000000001
(XEN) 0000010000000000 ffffffff802053aa 000000000000e033 0000000000000246
(XEN) ffffffff80511f50 000000000000e02b 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000000 ffff8300dfafc000
(XEN) Xen call trace:
(XEN) [<ffff828c801778f9>] mce_amd_work_fn+0x1d9/0x1f0
(XEN) [<ffff828c8011a12c>] execute_timer+0x2c/0x50
(XEN) [<ffff828c8011a20d>] timer_softirq_action+0xbd/0x2e0
(XEN) [<ffff828c801185b8>] do_softirq+0x58/0x80
(XEN) [<ffff828c80138fcc>] idle_loop+0x4c/0xa0
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Xen BUG at amd_nonfatal.c:165
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2010-May-07 07:14 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
Try 'no-mce' on the xen-4.0 or xen-unstable command line, or 'nomce' on the xen-3.4 command line. Looks like MCE support playing up. You probably didn't want the MCE goop enabled anyway. :-)

 -- Keir

On 07/05/2010 02:45, "Luke S Crawford" <lsc@prgmr.com> wrote:

> so I get this on bootup right after detecting USB. this is using the
> 2.6.18.8-xen dom0 kernel. I got the same results with the 3.4.3-rc6 xen
> hypervisor.
>
> Ideas on what the problem might be? looking at amd_nonfatal it seems that
> the MCE code is in an impossible state?
>
> (XEN) Xen BUG at amd_nonfatal.c:165
> [...]
Luke S Crawford
2010-May-07 21:10 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
Keir Fraser <keir.fraser@eu.citrix.com> writes:

> Try 'no-mce' on xen-4.0 or xen-unstable command line, or 'nomce' on xen-3.4
> command line. Looks like MCE support playing up. You probably didn't want
> the MCE goop enabled anyway. :-)

nomce, no-mce and mce=off all appear to do nothing (I'm putting them right after kernel xen.gz). I get the same error.
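For readers following along, the placement Luke describes ("right after kernel xen.gz") would correspond to a GRUB legacy menu.lst stanza roughly like the one below. This is an illustrative sketch only: the title, disk, file paths and root device are assumptions and do not come from the thread. The point it shows is that the option sits on the hypervisor's kernel line, while dom0 kernel options sit on the first module line.

```
# Hypothetical entry; title, disk, paths and root device are assumed.
# Hypervisor options follow xen.gz ('nomce' on 3.4, 'no-mce' on 4.0/unstable).
# dom0 kernel options follow the dom0 kernel on the first module line.
title Xen 3.4
    root (hd0,0)
    kernel /boot/xen.gz nomce
    module /boot/vmlinuz-2.6.18.8-xen root=/dev/sda1 ro
    module /boot/initrd-2.6.18.8-xen.img
```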
Keir Fraser
2010-May-07 21:45 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
On 07/05/2010 22:10, "Luke S Crawford" <lsc@prgmr.com> wrote:

> Keir Fraser <keir.fraser@eu.citrix.com> writes:
>
>> Try 'no-mce' on xen-4.0 or xen-unstable command line, or 'nomce' on xen-3.4
>> command line. Looks like MCE support playing up. You probably didn't want
>> the MCE goop enabled anyway. :-)
>
> nomce no-mce and mce=off all appear to do nothing (I'm putting them
> right after kernel xen.gz) I get the same error.

Ah, looks like half the MCE stuff is not even hooked up to the mce boot parameter. Well, I expect Christoph Egger can help: he implemented a lot of the MCE mechanism, and especially the AMD parts.

I think the mce boot parameter should, when disabled, cause the MCE feature bits to be removed from Xen's copy of the CPUID feature flags. That would easily disable all MCE logic throughout Xen.

 -- Keir
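Keir's idea above (when mce is disabled, strip the MCE feature bits from the hypervisor's cached CPUID flags so that every MCE code path sees the feature as absent) can be sketched in standalone C as follows. This is neither the patch from this thread nor Xen's actual code: `cached_features`, `mce_disabled`, `parse_cmdline`, `init_features` and `mce_init` are hypothetical names chosen for illustration. Only the bit positions are architectural (CPUID leaf 1, EDX bit 7 is MCE and bit 14 is MCA).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CPUID leaf 1, EDX feature bits (architectural values). */
#define FEATURE_MCE (1u << 7)   /* Machine Check Exception */
#define FEATURE_MCA (1u << 14)  /* Machine Check Architecture */

/* Hypothetical stand-in for the hypervisor's cached copy of CPUID.1:EDX. */
static uint32_t cached_features;

/* Hypothetical flag set while parsing the boot command line. */
static bool mce_disabled;

static bool has_feature(uint32_t bit)
{
    return (cached_features & bit) != 0;
}

/* Accept either spelling mentioned in the thread: "nomce" or "no-mce". */
static void parse_cmdline(const char *cmdline)
{
    char buf[256];

    assert(cmdline != NULL);
    strncpy(buf, cmdline, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
        if (strcmp(tok, "nomce") == 0 || strcmp(tok, "no-mce") == 0)
            mce_disabled = true;
}

/* Cache the hardware flags, masking out MCE/MCA if the user disabled them. */
static void init_features(uint32_t hw_edx)
{
    cached_features = hw_edx;
    if (mce_disabled)
        cached_features &= ~(FEATURE_MCE | FEATURE_MCA);
}

/*
 * Every MCE entry point gates on the cached flags, so a single mask in
 * init_features() switches off all of them, including any polling timer
 * that would otherwise be armed here.
 */
static bool mce_init(void)
{
    if (!has_feature(FEATURE_MCE) || !has_feature(FEATURE_MCA))
        return false;
    /* A real implementation would set up banks and arm its timers here. */
    return true;
}
```

The design choice this illustrates is the one Keir names: gating on a single cached feature mask means individual subsystems (such as a non-fatal polling timer) do not each need to know about the boot parameter.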
Keir Fraser
2010-May-12 08:10 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
On 07/05/2010 22:45, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Ah, looks like half the MCE stuff is not even hooked up to the mce boot
> parameter. Well, I expect Christoph Egger can help: he implemented a lot of
> the MCE mechanism, and especially the AMD parts.
>
> I think the mce boot parameter should, when disabled, cause the MCE feature
> bits to be removed from Xen's copy of CPUID feature flags. That would easily
> disable all MCE logic throughout Xen.

Actually I think the attached patch should work, in conjunction with specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let me know if it works okay for you.

I've applied the patch to xen-unstable as c/s 21360.

 -- Keir
Luke S Crawford
2010-May-20 09:13 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
Keir Fraser <keir.fraser@eu.citrix.com> writes:

> > Ah, looks like half the MCE stuff is not even hooked up to the mce boot
> > parameter. Well, I expect Christoph Egger can help: he implemented a lot of
> > the MCE mechanism, and especially the AMD parts.
> >
> > I think the mce boot parameter should, when disabled, cause the MCE feature
> > bits to be removed from Xen's copy of CPUID feature flags. That would easily
> > disable all MCE logic throughout Xen.
>
> Actually I think the attached patch should work, in conjunction with
> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
> me know if it works okay for you.
>
> I've applied the patch to xen-unstable as c/s 21360.

So this patch has now been applied to 3.4-testing, and it works beautifully. I can repeatably remove nomce from the command line, and I get the error. I re-add nomce to the command line, and everything works great.

Thanks.
Keir Fraser
2010-May-20 12:54 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
On 20/05/2010 10:13, "Luke S Crawford" <lsc@prgmr.com> wrote:

>> Actually I think the attached patch should work, in conjunction with
>> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
>> me know if it works okay for you.
>>
>> I've applied the patch to xen-unstable as c/s 21360.
>
> So this patch was applied to 3.4-testing now, and it works beautifully.
> I can repeatably remove nomce from the command line, and i get the error.
> I re-add nomce to the command line, and everything works great.

That'll do then, until someone who understands the MCE stuff implements a proper fix.

 Thanks,
 Keir
Christoph Egger
2010-May-20 14:01 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
On Thursday 20 May 2010 14:54:14 Keir Fraser wrote:

> On 20/05/2010 10:13, "Luke S Crawford" <lsc@prgmr.com> wrote:
> >> Actually I think the attached patch should work, in conjunction with
> >> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter.
> >> Let me know if it works okay for you.
> >>
> >> I've applied the patch to xen-unstable as c/s 21360.
> >
> > So this patch was applied to 3.4-testing now, and it works beautifully.
> > I can repeatably remove nomce from the command line, and i get the error.
> > I re-add nomce to the command line, and everything works great.
>
> That'll do then, until someone who understands the MCE stuff implements a
> proper fix.

Keir: Thanks for fixing it. I am currently busy with nested virtualization.

--
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632
Luke S Crawford
2010-Jul-15 05:40 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
Keir Fraser <keir.fraser@eu.citrix.com> writes:

> > I think the mce boot parameter should, when disabled, cause the MCE feature
> > bits to be removed from Xen's copy of CPUID feature flags. That would easily
> > disable all MCE logic throughout Xen.
>
> Actually I think the attached patch should work, in conjunction with
> specifying no-mce (4.0/unstable) or nomce (3.4) as a Xen boot parameter. Let
> me know if it works okay for you.
>
> I've applied the patch to xen-unstable as c/s 21360.

This still works swimmingly on Xen 3.4... but I'm starting to flirt with Xen 4.0/pvops, and while it looks like your patch is in there, nomce, no-mce, mce=off and mce=no all appear to do nothing, and my box reboots in the same place it did before:

(XEN) Panic on CPU 0:
(XEN) Xen BUG at amd_nonfatal.c:162
Keir Fraser
2010-Jul-15 07:04 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
On 15/07/2010 06:40, "Luke S Crawford" <lsc@prgmr.com> wrote:

> This still works swimmingly on xen 3.4... but I'm starting to
> flirt with xen 4.0/pvops and while it looks like your patch is in there,
> nomce, no-mce, mce=off and mce=no all appear to do nothing, and my box
> reboots in the same place it did before:
>
> (XEN) Panic on CPU 0:
> (XEN) Xen BUG at amd_nonfatal.c:162

The bug is unavoidable with the Xen 4.0.0 release. If you use the tip of xen-4.0-testing.hg, or one of the 4.0.1 release candidates, then the no-mce boot parameter will do the right thing.

 -- Keir
Luke S Crawford
2010-Jul-15 08:34 UTC
Re: [Xen-devel] (XEN) Xen BUG at amd_nonfatal.c:165 on a new amd g34 board
Keir Fraser <keir.fraser@eu.citrix.com> writes:

> The bug is unavoidable with Xen 4.0.0 release. If you use tip of
> xen-4.0-testing.hg, or one of the 4.0.1 release candidates, then the no-mce
> boot parameter will do the right thing.

Compiling xen-4.0-testing.hg and adding no-mce works great. Thanks.