Hi,

When I launch memtest as an HVM guest, Xen sends tons of VIRQ_MCA events
to Dom0, although NO correctable machine check errors occurred.
When Dom0 tries to fetch the error telemetry, the

    BUG_ON(mc_data.fetch_idx > mc_data.error_idx);

in x86_mcinfo_getfetchptr() in xen/arch/x86/cpu/mcheck/mce.c is hit.
(x86_mcinfo_getfetchptr() only works if a real error actually occurred,
which is not the case here.)

It looks to me as if there's a non-public event channel using the same number
as VIRQ_MCA, which fires when memtest is launched as an HVM guest.

Christoph

-- 
AMD Saxony, Dresden, Germany
Operating System Research Center

Legal Information:
AMD Saxony Limited Liability Company & Co. KG
Sitz (Geschäftsanschrift):
   Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland
Registergericht Dresden: HRA 4896
vertretungsberechtigter Komplementär:
   AMD Saxony LLC (Sitz Wilmington, Delaware, USA)
Geschäftsführer der AMD Saxony LLC:
   Dr. Hans-R. Deppe, Thomas McCoy

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 13/8/08 12:48, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

> When I launch memtest as HVM guest, then Xen sends tons of VIRQ_MCA events
> to the Dom0, although there occurred NO correctable machine check errors.
> When the Dom0 tries to fetch the error telemetry, then the
>
> BUG_ON(mc_data.fetch_idx > mc_data.error_idx); in x86_mcinfo_getfetchptr()
> in xen/arch/x86/cpu/mcheck/mce.c is hit. (x86_mcinfo_getfetchptr() only works
> if actually real error occurred which is not the case.)

Perhaps you should be more wary of hypercall inputs? Failing the hypercall,
perhaps with a warning printk, would be better than BUG_ON(), I think.

> This looks to me, there's a non-public event channel using the same number
> as VIRQ_MCA which fires when launching memtest as HVM guest.

I don't think this is the case. It sounds easy to reproduce this issue, though.
I'll give it a go.

 -- Keir
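The suggestion above (fail the request with a warning rather than hit BUG_ON()) could look roughly like the following. This is only a minimal user-space sketch, not the code in xen/arch/x86/cpu/mcheck/mce.c: the names fetch_idx and error_idx come from the thread, while struct mc_state, struct mc_slot, MC_SLOTS, mc_get_fetch_slot() and the choice of error codes are assumptions made for illustration, and fprintf() stands in for a warning printk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the telemetry bookkeeping. Only the names
 * fetch_idx and error_idx appear in the thread; everything else here
 * is assumed for the sake of the example. */
#define MC_SLOTS 4

struct mc_slot {
    unsigned long status;    /* placeholder payload */
};

struct mc_state {
    unsigned int fetch_idx;  /* next entry to hand out to Dom0 */
    unsigned int error_idx;  /* next entry the error handler will fill */
    struct mc_slot slot[MC_SLOTS];
};

/*
 * Look up the next entry to fetch.
 * Returns 0 and sets *out on success, -ENOENT if no error has been
 * recorded, and -EINVAL (with a warning) if the indices are inconsistent.
 * The inconsistent case is reachable from a fetch request, so it is
 * reported to the caller instead of being treated as a fatal bug.
 */
static int mc_get_fetch_slot(struct mc_state *s, struct mc_slot **out)
{
    /* NULL arguments would be a bug in the caller, not bad input. */
    assert(s != NULL && out != NULL);

    *out = NULL;

    if (s->fetch_idx > s->error_idx) {
        /* Was: BUG_ON(fetch_idx > error_idx). Warn and fail instead. */
        fprintf(stderr,
                "mc: fetch_idx %u is ahead of error_idx %u, failing request\n",
                s->fetch_idx, s->error_idx);
        return -EINVAL;
    }

    if (s->fetch_idx == s->error_idx)
        return -ENOENT;      /* nothing recorded, nothing to fetch */

    *out = &s->slot[s->fetch_idx % MC_SLOTS];
    return 0;
}
```

With this shape, a fetch request that arrives when no error was recorded gets an error code back, and the caller decides how to report it.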
On 13/8/08 13:27, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> This looks to me, there's a non-public event channel using the same number
>> as VIRQ_MCA which fires when launching memtest as HVM guest.
>
> I don't think this is the case. Sounds easy to repro this issue though. I'll
> give it a go.

I can boot a memtest-3.4 ISO in an HVM guest on a PAE hypervisor just fine.

 -- Keir
Christoph Egger
2008-Aug-13 12:40 UTC
Re: [Xen-devel] Re: machine check report on HVM startup
On Wednesday 13 August 2008 14:36:21 Keir Fraser wrote:

> On 13/8/08 13:27, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> >> This looks to me, there's a non-public event channel using the same
> >> number as VIRQ_MCA which fires when launching memtest as HVM guest.
> >
> > I don't think this is the case. Sounds easy to repro this issue though.
> > I'll give it a go.
>
> I can boot a memtest-3.4 ISO in an HVM guest on PAE hypervisor just fine.

Does your Dom0 kernel register the machine check event handler?
If not, then things go fine. If yes, then you should see the flood of
VIRQ_MCA events in Dom0.

Christoph
On 13/8/08 13:40, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

>>>> This looks to me, there's a non-public event channel using the same
>>>> number as VIRQ_MCA which fires when launching memtest as HVM guest.
>>>
>>> I don't think this is the case. Sounds easy to repro this issue though.
>>> I'll give it a go.
>>
>> I can boot a memtest-3.4 ISO in an HVM guest on PAE hypervisor just fine.
>
> Does your Dom0 kernel registrate the machine check event handler ?
> If not, then it things go fine. If yes, then you should see the flood of
> VIRQ_MCA events in the Dom0.

How do I make it do that?

 -- Keir
Christoph Egger
2008-Aug-13 13:17 UTC
Re: [Xen-devel] Re: machine check report on HVM startup
On Wednesday 13 August 2008 14:48:04 Keir Fraser wrote:

> On 13/8/08 13:40, "Christoph Egger" <Christoph.Egger@amd.com> wrote:
> >>>> This looks to me, there's a non-public event channel using the same
> >>>> number as VIRQ_MCA which fires when launching memtest as HVM guest.
> >>>
> >>> I don't think this is the case. Sounds easy to repro this issue though.
> >>> I'll give it a go.
> >>
> >> I can boot a memtest-3.4 ISO in an HVM guest on PAE hypervisor just
> >> fine.
> >
> > Does your Dom0 kernel registrate the machine check event handler ?
> > If not, then it things go fine. If yes, then you should see the flood of
> > VIRQ_MCA events in the Dom0.
>
> How do I make it do that?

Assuming you use Linux as Dom0, apply the attached patch to your local tree.
With it, you should see a flood of "xen_mca: HW reported correctable error(s)"
Dom0 kernel messages.

Note that the patch is not intended to go upstream. There will be something
better in the future.

Christoph
On 13/8/08 13:48, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Does your Dom0 kernel registrate the machine check event handler ?
>> If not, then it things go fine. If yes, then you should see the flood of
>> VIRQ_MCA events in the Dom0.
>
> How do I make it do that?

I modified the netback VIRQ_DEBUG handler to register on VIRQ_MCA instead. I
didn't get any output from it when running a memtest HVM guest.

 -- Keir
On 13/8/08 14:17, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

> Assuming you use Linux as Dom0, apply the attached patch to your local tree.
> With it, you should see a flood of "xen_mca: HW reported correctable error(s)"
> Dom0 kernel messages.
>
> Note, the patch is not intended to go upstream. There will be something better
> in the future.

The patch won't do much since CONFIG_X86_MCE depends on !XEN. Anyhow, I
tried registering some other handler as VIRQ_MCA and it never fired for me.

 -- Keir