Ke, Liping
2009-Mar-20 05:02 UTC
[Xen-devel] [Patch 0/3]RAS(Part II)--Intel MCA enabling in XEN
Hi Keir,

These patches enable MCA in XEN. They are based on AMD's and SUN's MCA work. We discussed them with AMD and SUN, refined them since the last posting, and rebased them onto SUN's latest improvements. Patches for recovery actions will follow. This series is a basic framework for Intel MCA.

Some implementation notes:

1) When an error happens, if it is fatal (pcc = 1) or cannot be recovered (pcc = 0, but no good recovery method exists), we reset the machine immediately to avoid losing logs in DOM0. Most MCA MSRs are sticky, so after reboot the MCA polling mechanism will send a vIRQ to DOM0 for logging.
2) When MCE# happens, all CPUs enter MCA context. The first CPU to read and clear the error MSR bank becomes the owner of this MCE#. The necessary locks and synchronization help determine the owner and select the most severe error.
3) For convenience, we select the most offending CPU to do most of the processing and recovery work.
4) When MCE# happens, we do three jobs:
   a. Send a vIRQ to DOM0 for logging
   b. Send a vMCE# to the impacted guest (currently only injected into an impacted DOM0)
   c. Guest vMCE MSR virtualization
5) Some further improvements and additions for newer CPUs might be done later:
   a) Connection with recovery actions (cpu/memory online/offline)
   b) More software-recovery identification in severity_scan
   c) More refinement and testing for HVM, when needed

For details of the discussion with AMD and SUN, please refer to the mail thread:
http://lists.xensource.com/archives/html/xen-devel/2009-02/msg00509.html

Patch description:
1. intel_mce_base: Basic MCA enabling support for Intel.
2. vmsr_virtualization: Guest MCE# MSR read/write virtualization support in XEN.
3. interface: XEN/DOM0 interface, to let DOM0 know the recovery details in XEN.

For details of the interface discussion, please refer to the mail thread:
http://lists.xensource.com/archives/html/xen-devel/2009-03/msg00322.html

About testing: we did some internal testing and the results were fine.

If there is any problem, just let me know. Thanks a lot for your help!

Regards,
Criping

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
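[Editorial illustration] Implementation notes 1) to 3) in the message above can be sketched roughly as follows. This is a minimal sketch and not code from the patches. The VAL, UC and PCC bit positions are the architectural IA32_MCi_STATUS bits. Everything else is an assumption made for illustration: the severity enum, classify(), report(), owner_cpu(), must_reset(), the packed `worst` word, treating UC = 0 as a corrected error, and letting the first reporter keep ownership on a tie.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bits of IA32_MCi_STATUS. */
#define MCI_STATUS_VAL (1ULL << 63) /* bank holds a valid error */
#define MCI_STATUS_UC  (1ULL << 61) /* error was not corrected */
#define MCI_STATUS_PCC (1ULL << 57) /* processor context corrupt */

/* Hypothetical severity scale, ordered from least to most severe. */
enum severity { SEV_NONE, SEV_CORRECTED, SEV_RECOVERABLE, SEV_FATAL };

/* Note 1: pcc = 1, or pcc = 0 with no recovery method, is treated as fatal. */
static enum severity classify(uint64_t status, bool can_recover)
{
    if (!(status & MCI_STATUS_VAL))
        return SEV_NONE;
    if (status & MCI_STATUS_PCC)
        return SEV_FATAL;
    if (!(status & MCI_STATUS_UC))
        return SEV_CORRECTED;          /* assumption: nothing to recover */
    return can_recover ? SEV_RECOVERABLE : SEV_FATAL;
}

/* Shared word: (severity << 16) | (cpu + 1); 0 means "no error reported". */
static _Atomic uint32_t worst;

/* Notes 2 and 3: every CPU reports; the most severe report wins. */
static void report(unsigned int cpu, enum severity sev)
{
    assert(cpu < 0xFFFF);              /* cpu + 1 must fit in 16 bits */
    uint32_t mine = ((uint32_t)sev << 16) | (cpu + 1);
    uint32_t seen = atomic_load(&worst);

    /* Replace the stored report only if ours is strictly more severe,
     * so on a tie the first CPU to report stays the owner. */
    while ((mine >> 16) > (seen >> 16)) {
        if (atomic_compare_exchange_weak(&worst, &seen, mine))
            break;
        /* A failed CAS reloaded 'seen'; the loop re-checks against it. */
    }
}

/* CPU chosen to do the processing, or -1 if no CPU reported an error. */
static int owner_cpu(void)
{
    uint32_t w = atomic_load(&worst);
    return w ? (int)(w & 0xFFFF) - 1 : -1;
}

/* True when the most severe reported error requires an immediate reset. */
static bool must_reset(void)
{
    return (atomic_load(&worst) >> 16) == SEV_FATAL;
}
```

In this sketch each CPU entering MCA context would call report(cpu, classify(status, can_recover)) for its worst bank. After all CPUs have rendezvoused, owner_cpu() names the CPU that does the processing and must_reset() says whether the machine should be reset.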
Frank van der Linden
2009-Mar-20 23:46 UTC
Re: [Xen-devel] [Patch 0/3]RAS(Part II)--Intel MCA enabling in XEN
Ke, Liping wrote:
> The patches are for MCA enabling in XEN. Those patches based on AMD and SUN's MCA related jobs.
> We have some discussions with AMD/SUN and did refinements from the last sending. Also we rebase it after
> SUN's latest improvements. We will have following patches for recovery actions. This is a basic framework
> for Intel MCA.

I looked the patches over a little more closely, and merged them with my -unstable tree. I found a few minor issues:

* some compile issues with printk format strings in the case of DEBUG and 32bit
* in severity_scan, use mca_rdmsrl and mca_wrmsrl to work correctly for simulated errors using injection
* in severity_scan, if the MSR values were injected for debugging purposes, don't panic but keep going: the injected values will be lost at reboot, and since this is just a simulated #MC anyway, there is no danger of losing state

I'll attach a little patch to fix these issues. I haven't tested this patch yet, although the compile fixes have been "tested".

One final question:

> 2) When MCE# happens, all CPUs enter MCA context. The first CPU who read&clear the error MSR bank will be this
> MCE# owner. Necessary locks/synchronization will help to judge the owner and select most severe error.

Is it always true (at least, for Intel CPUs of family 6 and 15) that when a #MC happens, *all* CPUs will receive a #MC trap? I couldn't find this anywhere in the documentation. If this is true, I'll change the MCE injection code to simulate #MC on all CPUs in the case of an Intel system.

- Frank
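[Editorial illustration] The second and third points in the list above (read MSRs through injection-aware wrappers, and do not panic on injected values) can be sketched as below. This is a minimal sketch under stated assumptions, not the actual mca_rdmsrl/mca_wrmsrl from the Xen tree, whose implementation is not shown in this thread. The names sketch_rdmsrl, sketch_wrmsrl, inject_msr, hw_rdmsrl, should_panic and the fixed-size table are all hypothetical; only the MSR address 0x401 for IA32_MC0_STATUS is architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_MC0_STATUS 0x401u  /* architectural MSR address */
#define MAX_INJECTED 8              /* hypothetical table size */

/* Hypothetical table of MSR values injected for testing. */
struct injected_msr { uint32_t msr; uint64_t val; bool used; };
static struct injected_msr injected[MAX_INJECTED];

/* Stand-in for the real rdmsr instruction, which needs ring 0. */
static uint64_t hw_rdmsrl(uint32_t msr) { (void)msr; return 0; }

static struct injected_msr *find_injected(uint32_t msr)
{
    for (size_t i = 0; i < MAX_INJECTED; i++)
        if (injected[i].used && injected[i].msr == msr)
            return &injected[i];
    return NULL;
}

/* Record a simulated value; returns false if the table is full. */
static bool inject_msr(uint32_t msr, uint64_t val)
{
    struct injected_msr *e = find_injected(msr);
    for (size_t i = 0; !e && i < MAX_INJECTED; i++)
        if (!injected[i].used)
            e = &injected[i];
    if (!e)
        return false;
    e->msr = msr;
    e->val = val;
    e->used = true;
    return true;
}

/* Read wrapper: prefer an injected value, else fall back to hardware. */
static uint64_t sketch_rdmsrl(uint32_t msr, bool *was_injected)
{
    assert(was_injected != NULL);
    struct injected_msr *e = find_injected(msr);
    *was_injected = (e != NULL);
    return e ? e->val : hw_rdmsrl(msr);
}

/* Write wrapper: update the injected copy if there is one. */
static void sketch_wrmsrl(uint32_t msr, uint64_t val)
{
    struct injected_msr *e = find_injected(msr);
    if (e)
        e->val = val;
    /* else: a real implementation would execute wrmsr here */
}

/* Panic only for fatal errors that came from real hardware state;
 * injected values are lost at reboot, so resetting gains nothing. */
static bool should_panic(bool fatal, bool was_injected)
{
    return fatal && !was_injected;
}
```

The point of routing both reads and writes through the wrappers is that clearing a bank during a simulated #MC clears the injected copy instead of touching the real MSR.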
Frank van der Linden
2009-Mar-20 23:48 UTC
Re: [Xen-devel] [Patch 0/3]RAS(Part II)--Intel MCA enabling in XEN
Forgot to attach the patch with the minor fixes. Here it is.

- Frank
Keir Fraser
2009-Mar-21 05:13 UTC
Re: [Xen-devel] [Patch 0/3]RAS(Part II)--Intel MCA enabling in XEN
I already did some fixing up. Please send a patch with your remaining fixes. It probably won't get applied until I get back from holiday, however.

 -- Keir

On 20/03/2009 23:48, "Frank van der Linden" <Frank.Vanderlinden@Sun.COM> wrote:

> Forgot to attach the patch with the minor fixes.. here it is.
>
> - Frank