Ashwin Pankaj
2010-Feb-15 14:19 UTC
[Xen-devel] Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
Hi,

I am using Xen 3.4.1. Sometimes, when an MCE error occurs, Xen panics due to a page fault with the following stack trace:

http://pastebin.com/f30f67342

After some digging, the probable culprit seems to be smp_cmci_interrupt():

> if (bs.errcnt && mctc != NULL) {
>     if (guest_enabled_event(dom0->vcpu[0],
>         <------------------------------------ here
>                             VIRQ_MCA)) {
>         mctelem_commit(mctc);
>         printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
>         send_guest_global_virq(dom0, VIRQ_MCA);
>     } else {
>         x86_mcinfo_dump(mctelem_dataptr(mctc));
>         mctelem_dismiss(mctc);
>     }

It looks like dom0 is NULL here (the offset of vcpu[0] is 0x468). Is this possible? Other functions, such as mce_softirq(), perform a NULL check on dom0 before accessing its members:

> /* Step2: Send Log to DOM0 through vIRQ */
> if (dom0 && guest_enabled_event(dom0->vcpu[0], VIRQ_MCA)) {
>     printk(KERN_DEBUG "MCE: send MCE# to DOM0 through virq\n");
>     send_guest_global_virq(dom0, VIRQ_MCA);
> }

Also note that this system printed the MCE warning message ("(XEN) MCE: The hardware reports a non fatal, correctable incident occured on CPU 0") twice before panicking. So this code worked properly and entered x86_mcinfo_dump() at least twice before the panic.

- Regards,
Ashwin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
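The reasoning behind the 0x468 remark can be illustrated with a small sketch: if the domain pointer is NULL, reading its vcpu[0] slot touches an address equal to the member's offset within the structure. The struct below is a hypothetical stand-in, not Xen's real struct domain; its padding is chosen only so that vcpu[] lands at the offset quoted in the message, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu;  /* opaque; only pointers to it are needed here */

/* Hypothetical stand-in for struct domain. The padding replaces whatever
 * members precede vcpu[] in the real structure and is sized so that
 * vcpu[] sits at 0x468, the offset quoted in the message. */
struct mock_domain {
    unsigned char other_fields[0x468];
    struct vcpu *vcpu[4];
};

/* Address that would be read for d->vcpu[0], given the numeric value of d.
 * Nothing is dereferenced; this is address arithmetic only. */
uintptr_t vcpu0_slot_address(uintptr_t domain_ptr)
{
    return domain_ptr + offsetof(struct mock_domain, vcpu);
}

/* Usage: a NULL domain pointer yields a fault address equal to the offset. */
void check_fault_address(void)
{
    assert(vcpu0_slot_address(0) == 0x468);
}
```

A faulting address that matches a member offset this closely is a common sign that the base pointer was NULL, which is the inference drawn from the stack trace here.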
Jan Beulich
2010-Feb-16 09:05 UTC
Re: [Xen-devel] Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
>>> Ashwin Pankaj <ashwin.pankaj@lsi.com> 15.02.10 15:19 >>>
> After some digging, probable culprit seems to be smp_cmci_interrupt
>
>> if (bs.errcnt && mctc != NULL) {
>>     if (guest_enabled_event(dom0->vcpu[0],
>>         <------------------------------------ here
>>                             VIRQ_MCA)) {
>>         mctelem_commit(mctc);
>>         printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
>>         send_guest_global_virq(dom0, VIRQ_MCA);
>>     } else {
>>         x86_mcinfo_dump(mctelem_dataptr(mctc));
>>         mctelem_dismiss(mctc);
>>     }
>
>Looks like dom0 is NULL here ( vcpu[0] offset is 0x468). Is this possible?

Yes, your call trace confirms this.

>Other functions like mce_softirq() perform a NULL check on dom0 before
>accessing its members ....

The majority of uses doesn't seem to do that check, yet it is essential if CMCIs occur during boot of Xen. Even more, it should not only be dom0 that is checked against NULL, but also dom0->vcpu (or dom0->max_vcpus) and dom0->vcpu[0].

Jan
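The chain of checks described in this reply could be sketched as a small helper. This is a minimal illustration, assuming a layout in which vcpu is a pointer to an array of max_vcpus entries; the types are simplified stand-ins rather than Xen's definitions, and domain_vcpu0_or_null() is a hypothetical name, not an existing Xen function.

```c
#include <assert.h>
#include <stddef.h>

struct vcpu { int vcpu_id; };

/* Hypothetical, simplified stand-in for struct domain. */
struct mock_domain {
    unsigned int max_vcpus;
    struct vcpu **vcpu;
};

/* Return vcpu 0 of d, or NULL if any link in the chain is not set up yet:
 * the domain itself, its vcpu array, or the first slot of that array. */
struct vcpu *domain_vcpu0_or_null(const struct mock_domain *d)
{
    if (d == NULL || d->vcpu == NULL || d->max_vcpus == 0)
        return NULL;
    return d->vcpu[0];  /* may itself still be NULL; callers must check */
}

/* Usage: each partially initialised state is rejected. */
void check_vcpu0_guards(void)
{
    struct vcpu v0 = { 0 };
    struct vcpu *slots[1] = { &v0 };
    struct vcpu *empty[1] = { NULL };
    struct mock_domain early   = { 0, NULL };   /* no vcpu array yet */
    struct mock_domain partial = { 1, empty };  /* array exists, slot empty */
    struct mock_domain ready   = { 1, slots };

    assert(domain_vcpu0_or_null(NULL) == NULL);
    assert(domain_vcpu0_or_null(&early) == NULL);
    assert(domain_vcpu0_or_null(&partial) == NULL);
    assert(domain_vcpu0_or_null(&ready) == &v0);
}
```

In the CMCI path, a caller would presumably test the returned pointer before passing it to guest_enabled_event(), and fall back to the x86_mcinfo_dump() / mctelem_dismiss() branch when it is NULL.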
Jiang, Yunhong
2010-Feb-16 10:37 UTC
RE: [Xen-devel] Xen PANIC in MCE interrupt context : can global variable dom0 be NULL ?
Ashwin and Jan, thanks for pointing this out.

All our development/test machines are off during the Chinese New Year holiday, so I can't access any system now (I can't even run vim).

Jan, as the error is quite straightforward, could you please cook a patch for it? (I couldn't even smoke-test a patch if I cooked one myself.) If needed, I will verify it after CNY.

Thanks
--jyh

>-----Original Message-----
>From: Jan Beulich [mailto:JBeulich@novell.com]
>Sent: Tuesday, February 16, 2010 5:06 PM
>To: Jiang, Yunhong; Ashwin Pankaj
>Cc: Xen-devel@lists.xensource.com
>Subject: Re: [Xen-devel] Xen PANIC in MCE interrupt context : can global variable
>dom0 be NULL ?
>
>>>> Ashwin Pankaj <ashwin.pankaj@lsi.com> 15.02.10 15:19 >>>
>> After some digging, probable culprit seems to be smp_cmci_interrupt
>>
>>> if (bs.errcnt && mctc != NULL) {
>>>     if (guest_enabled_event(dom0->vcpu[0],
>>>         <------------------------------------ here
>>>                             VIRQ_MCA)) {
>>>         mctelem_commit(mctc);
>>>         printk(KERN_DEBUG "CMCI: send CMCI to DOM0 through virq\n");
>>>         send_guest_global_virq(dom0, VIRQ_MCA);
>>>     } else {
>>>         x86_mcinfo_dump(mctelem_dataptr(mctc));
>>>         mctelem_dismiss(mctc);
>>>     }
>>
>>Looks like dom0 is NULL here ( vcpu[0] offset is 0x468). Is this possible?
>
>Yes, your call trace confirms this.
>
>>Other functions like mce_softirq() perform a NULL check on dom0 before
>>accessing its members ....
>
>The majority of uses doesn't seem to do that check, yet it is essential
>if CMCIs occur during boot of Xen. Even more, it should not only be
>dom0 that is checked against NULL, but also dom0->vcpu (or
>dom0->max_vcpus) and dom0->vcpu[0].
>
>Jan