flyfan05
2010-Dec-12 07:08 UTC
[Xen-devel] CMCI exceptions happened and MCE entry state transition made Xen crashed.
Hi all,

Three days ago, the server reported lots of CMCI exceptions, and Xen 3.4.2 printed hundreds of "CMCI: send CMCI to DOM0 through virq" messages to the console. From the console output, I can also see that Dom0 tries to read the MSR_CAP registers through #GP traps in order to log the MCA errors.

I am not sure why so many CMCIs happened; maybe something is wrong with the hardware. Unfortunately, the server eventually crashed. Xen hit a BUG_ON at
mctelem_append_processing()
-> MCTE_TRANSITION_STATE(tep, COMMITTED, PROCESSING)
-> BUG_ON(MCTE_STATE(tep) != (MCTE_F_STATE_##old));
The console output looks like this:
(XEN) Xen bug on at mctelem.c : Line 437

Why is the state of the entry not correct? Did something change it unexpectedly? If anybody has ever resolved this kind of problem, please help me.

--Van

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
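To make the failing check concrete, here is a minimal, hypothetical C sketch of the kind of guarded state transition the quoted macro performs. The names `tel_entry`, `tel_transition`, `TEL_TRANSITION` and the `TEL_STATE_*` values are invented for illustration and only loosely modeled on the `MCTE_F_STATE_*` names in the report; the real Xen code (its state encoding, lists and locking) may differ. The one point the sketch shows is that the macro asserts the entry's *current* state before moving it. If the entry is not in `COMMITTED` when `mctelem_append_processing()` runs, the check fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified telemetry-entry states (not Xen's real values). */
enum tel_state {
    TEL_STATE_FREE,
    TEL_STATE_UNCOMMITTED,
    TEL_STATE_COMMITTED,
    TEL_STATE_PROCESSING
};

struct tel_entry {
    enum tel_state state;
};

/* Move an entry from `from` to `to` only if it is currently in `from`.
 * The macro in the report calls BUG_ON() on a mismatch, which stops the
 * hypervisor; this sketch prints the mismatch and returns false instead. */
static bool tel_transition(struct tel_entry *tep,
                           enum tel_state from, enum tel_state to)
{
    assert(tep != NULL);
    if (tep->state != from) {
        fprintf(stderr, "bad transition: state %d, expected %d\n",
                (int)tep->state, (int)from);
        return false;
    }
    tep->state = to;
    return true;
}

/* Token-pasting wrapper in the same spirit as
 * MCTE_TRANSITION_STATE(tep, old, new). */
#define TEL_TRANSITION(tep, old, new) \
    tel_transition((tep), TEL_STATE_##old, TEL_STATE_##new)
```

For example, if the same committed entry were handed to the processing path twice, the first `TEL_TRANSITION(&e, COMMITTED, PROCESSING)` would succeed. The second would find the entry already in `PROCESSING`, which is the kind of mismatch the BUG_ON above detects. This is only one possible way such a mismatch could arise; the thread does not establish the actual cause.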
flyfan05
2010-Dec-12 07:11 UTC
[Xen-users] CMCI exceptions happened and MCE entry state transition made Xen crashed.
Hi all,

Three days ago, the server reported lots of CMCI exceptions, and Xen 3.4.2 printed hundreds of "CMCI: send CMCI to DOM0 through virq" messages to the console. From the console output, I can also see that Dom0 tries to read the MSR_CAP registers through #GP traps in order to log the MCA errors.

I am not sure why so many CMCIs happened; maybe something is wrong with the hardware. Unfortunately, the server eventually crashed. Xen hit a BUG_ON at
mctelem_append_processing()
-> MCTE_TRANSITION_STATE(tep, COMMITTED, PROCESSING)
-> BUG_ON(MCTE_STATE(tep) != (MCTE_F_STATE_##old));
The console output looks like this:
(XEN) Xen bug on at mctelem.c : Line 437

Why is the state of the entry not correct? Did something change it unexpectedly? If anybody has ever resolved this kind of problem, please help me.

--Van

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
Keir Fraser
2010-Dec-13 09:53 UTC
Re: [Xen-devel] CMCI exceptions happened and MCE entry state transition made Xen crashed.
Sounds like your system probably has a bad memory DIMM. If the MCE logic is causing problems, you can turn it off with the mce=0 Xen boot parameter. This will cause all these correctable errors to be ignored rather than logged, and uncorrectable errors will cause an immediate hypervisor stop-and-crash rather than having dom0 attempt to fix them up.

K.