Hi all,

These patches are for MCA enabling in Xen. They are sent as an RFC first, to collect feedback for refinement before the final patch. We also attach a description text document for your reference.

Some implementation notes:

1) When an error happens, if it is fatal (pcc = 1) or cannot be recovered (pcc = 0 but no good recovery method exists), we reset the machine immediately to avoid losing logs in Dom0. Most MCA MSRs are sticky, so after reboot the MCA polling mechanism will send a vIRQ to Dom0 for logging.
2) When an MCE# happens, all CPUs enter MCA context. The first CPU to read and clear the error MSR bank becomes the owner of this MCE#. Locks and synchronization are used to determine the owner and to select the most severe error.
3) For convenience, the most offending CPU is selected to do most of the processing and recovery work.
4) When an MCE# happens, we do three jobs:
   a. Send a vIRQ to Dom0 for logging.
   b. Send a vMCE# to the impacted guest (currently only injected into an impacted Dom0).
   c. Virtualize the guest's vMCE MSRs.
5) Further improvements/additions that might be made if needed:
   a) The impacted-domain judgement algorithm.
   b) vMCE# injection is currently controlled by centralized data (vmce_data). The injection algorithm is a bit complex; we could change to an algorithm based on per-domain data if you prefer. Notes for understanding (a sketch follows this message):
      1) If several banks impact one domain and those banks belong to the same pCPU, the vMCE is injected only once.
      2) If more than one bank impacts one domain but the error banks belong to different pCPUs, the vMCE is injected once per affected pCPU.
      3) We use centralized data (the impact_domid and impact_cpus arrays in vmce_data) to drive the injection. The two array entries (idx, impact_domid) and (idx, impact_cpus) are combined into one item (idx, impact_domid, impact_cpus). This item records the impacted domain id and the map of pCPUs on which UC errors impacting that domain were found. From this we can decide how to inject the vMCE (domid, impact_times[nr_pCPUs]).
      4) Although the data structure is ready, we currently only inject vMCE# into Dom0.
   c) Connection with recovery actions (CPU/memory online/offline).
   d) More refinement and testing for HVM when needed.

Patch description:
1. basic_mca_support: Enable MCA support in Xen.
2. vmsr_virtualization: Guest MCE# MSR read/write virtualization support in Xen.
3. mce_dom0: Cooperating with Xen, Dom0 adds vIRQ and vMCE# handlers, translates Xen logs for Dom0, and re-uses the Linux kernel MCELOG mechanism and MCE handler. This is mainly a demonstration patch.

About testing:
We did some internal testing and the results look fine.

Any feedback is welcome, and thanks a lot for your help! :-)

Regards,
Criping
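For readers following note 5(b), here is a minimal sketch, in plain C, of the kind of centralized bookkeeping described there. All names (vmce_entry, MAX_IMPACT, and so on) are invented for illustration and are not the identifiers used in the actual patches; the point is only to show one record per impacted domain, pairing the domain id with a bitmap of the pCPUs on which UC errors affecting it were found, and one vMCE# injection per affected pCPU.

#include <stdint.h>

#define MAX_IMPACT 16                    /* max impacted domains tracked     */

struct vmce_entry {
    int      valid;
    uint16_t impact_domid;               /* which domain is affected         */
    uint64_t impact_cpus;                /* bitmap of pCPUs with UC errors   */
};

static struct vmce_entry vmce_data[MAX_IMPACT];

/* Record that a UC error found on pcpu impacts domid (one entry per domain). */
static void vmce_record(uint16_t domid, unsigned int pcpu)
{
    for (int i = 0; i < MAX_IMPACT; i++)
        if (vmce_data[i].valid && vmce_data[i].impact_domid == domid) {
            vmce_data[i].impact_cpus |= 1ULL << pcpu;  /* same domain, new pCPU */
            return;
        }
    for (int i = 0; i < MAX_IMPACT; i++)
        if (!vmce_data[i].valid) {
            vmce_data[i].valid = 1;
            vmce_data[i].impact_domid = domid;
            vmce_data[i].impact_cpus = 1ULL << pcpu;
            return;
        }
}

/* How many vMCE# injections a domain should receive: one per error pCPU,
 * regardless of how many banks on that pCPU reported errors. */
static int vmce_injection_count(uint16_t domid)
{
    for (int i = 0; i < MAX_IMPACT; i++)
        if (vmce_data[i].valid && vmce_data[i].impact_domid == domid)
            return __builtin_popcountll(vmce_data[i].impact_cpus);
    return 0;
}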
Christoph Egger
2009-Feb-16 13:34 UTC
[Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
To me it seems the design has not been understood, and now the code is becoming more and more unmaintainable bloat. I mean, the code is going to do far too much.

- The MCE routines in Xen are only for error data *collection*. Just pass it to Dom0 and that's it. Dom0 will do the error analysis and figure out what to do. It is Dom0 which will do a hypercall to do things like page offlining or CPU offlining or whatever is needed. Your code tries to move everything back from Dom0 into the hypervisor. I remember Keir having rejected my MCE patches because he feared this bloat.

- The Dom0 VIRQ is for correctable errors only. Uncorrectable errors are delivered via the MCE trap. Dom0 and DomU register a handler via the set_trap_table hypercall. A non-registered handler means the guest can't handle it by itself. Dom0 is always notified; the guest is only notified if it has registered a handler. This separation is completely ignored, and the Dom0 VIRQ is misused for everything (hence the bunch of superfluous flags; see the next point).

- MCA flags: what are the differences between correctable and recoverable? What are the differences between the uncorrectable, polled, reset, cmci and mce types?

- You use dynamic memory allocation (which uses spinlocks) in MCE code, and you roll your own MCE handling instead of using the generic API in mce.c. I suppose you don't understand it at all.

- I attach the design document again, since I have the impression no one at Intel read it, hence the misunderstandings.

I think it is best to get Gavin's generic MCE improvements upstream first.

On Monday 16 February 2009 06:35:14 Ke, Liping wrote:
> [...]
Christoph Egger
2009-Feb-16 14:18 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I realize from this and earlier MCE patches from Intel that Intel is trying to change the machine check design from the ground up.

The basic ideas behind the current design:

1. Xen collects error telemetry.
2. Xen delivers correctable errors to Dom0 via VIRQ.
3. Xen delivers uncorrectable errors to Dom0 via a trap handler.
4. Xen delivers uncorrectable errors to a DomU only if Dom0 tells Xen to do so.
5. Xen performs health measures as told by Dom0 via hypercalls, such as CPU or page offlining.
6. Dom0 performs error analysis, figures out what is going on, and calls the hypercalls for the right health measure.

The basic ideas behind Intel's new design (as far as I can see from the patches I have seen so far):

1. Xen collects error telemetry.
2. Xen performs error analysis and figures out what is going on.
3. Xen automatically takes health measures such as CPU and page offlining.
4. Xen delivers error telemetry to Dom0 via VIRQ for error logging only, independent of the error type.
5. MCEs are injected into the guest directly.
6. The MCE trap handler is not used at all.

IMO, any design change should be discussed first and not made silently, since this will confuse everyone, no one will know what the right thing to do in Xen and in Dom0 is, and this in turn will lead to error-prone, unmaintainable code in both Xen and Dom0.

Christoph

On Monday 16 February 2009 14:34:36 Christoph Egger wrote:
> [...]
Keir Fraser
2009-Feb-16 15:03 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On 16/02/2009 14:18, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

> IMO, any design change should be discussed first and not made silently,
> since this will confuse everyone, no one will know what the right thing
> to do in Xen and in Dom0 is, and this in turn will lead to error-prone,
> unmaintainable code in both Xen and Dom0.

I certainly think we should have a shared approach for x86 machine-check handling, rather than completely different architectures for AMD and Intel. Fortunately Sun are an interested and active third party regarding this feature. I'll be interested in their opinion.

 -- Keir
Jiang, Yunhong
2009-Feb-16 15:05 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Aha, Christoph, sorry for the surprise, but I think we have already described our suggestion to you (please refer to http://markmail.org/message/vpcdojylxkrg6uz3). As I didn't get any response from your side, I assumed you were waiting for the patch to get a better idea; that's the reason Criping and I hurried up to cook the patch and send it out as an RFC. The RFC means it is targeted at comments, as we know MCA handling is complex and needs community discussion (I have to say a patch is sometimes clearer than a design doc, although cooking a patch takes more effort).

Your description of our design is quite clear, which also means our RFC has achieved its purpose :-) One exception is item 6: the MCE trap handler on the HV side is still needed for PV domains just as it is now (the bounce buffer, the trap priority, etc.), but for guests, yes, we try to re-use the guest's MCA handler.

As said already, MCE handling is complex, so can we discuss in detail how to handle the MCA and reach some consensus? We have CC'ed all the engineers we think may be interested in it. I merge my comments on your other mail below:

> - The MCE routines in Xen are only for error data *collection*.
>   Just pass it to Dom0 and that's it.
>   Dom0 will do the error analysis and figure out what to do.
>   It is the Dom0 which will do a hypercall to do things like
>   page-offlining or cpu offlining or whatever is needed.
>   Your code tries to move everything back from Dom0 into the
>   hypervisor. I remember Keir having rejected my MCE patches
>   because he feared this bloat.

Sorry, I didn't notice Keir's feedback on your original patch. I will google it, or it would be great if you could share it with me.

> - MCA flags: what are the differences between correctable
>   and recoverable? What are the differences between uncorrectable,
>   polled, reset and cmci and mce types?

Per my understanding, a correctable error (sometimes called a corrected error) means the hardware has recovered the error and software is not impacted (although some proactive action is preferred), while recoverable means the hardware does not recover the error but it is possible that software can recover it (something like a non-fatal error in the PCI-E spec, although not exactly the same, I think).

> - You use dynamic memory allocation (which uses spinlocks) in MCE code
>   and you roll your own mce handling instead of using the generic API
>   in mce.c

I think that is in softirq context and should be OK for spinlocks.

> I suppose, you don't understand it at all.
>
> - I attach the design document again, since I have the impression no one
>   at Intel read it, hence the misunderstandings.

I promise we read it carefully, otherwise my manager would surely challenge me before you did, and it is really well written.

> I think, it is best to get Gavin's generic mce improvements upstream first.

Sure, Gavin's improvements are important. Again, this patch is just an RFC, and some components are still WIP, like per-domain MCA injection, since we want to get input first.

Thanks
Yunhong Jiang

> -----Original Message-----
> From: Christoph Egger [mailto:Christoph.Egger@amd.com]
> Sent: 16 February 2009 22:18
> To: xen-devel@lists.xensource.com
> Cc: Ke, Liping; Frank.Vanderlinden@Sun.COM; Jiang, Yunhong; Keir Fraser; Gavin Maltby
> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
> [...]
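A minimal sketch, in plain C, of the correctable/recoverable/fatal distinction being discussed here, using the architected IA32_MCi_STATUS bits from the Intel SDM (VAL, UC, PCC). The enum and function names are invented for illustration and are not from the patches; the mapping simply mirrors the explanation above: UC clear means hardware already corrected the error, UC set with PCC set means the processor context is corrupt (fatal), and UC set with PCC clear leaves room for software recovery.

#include <stdint.h>

#define MCi_STATUS_VAL  (1ULL << 63)   /* bank holds valid error information  */
#define MCi_STATUS_UC   (1ULL << 61)   /* error was not corrected by hardware */
#define MCi_STATUS_PCC  (1ULL << 57)   /* processor context corrupt           */

enum mca_severity { MCA_NONE, MCA_CORRECTED, MCA_RECOVERABLE, MCA_FATAL };

static enum mca_severity classify_bank(uint64_t status)
{
    if (!(status & MCi_STATUS_VAL))
        return MCA_NONE;               /* nothing logged in this bank         */
    if (!(status & MCi_STATUS_UC))
        return MCA_CORRECTED;          /* hardware recovered; log only        */
    if (status & MCi_STATUS_PCC)
        return MCA_FATAL;              /* pcc = 1: reset is the only option   */
    return MCA_RECOVERABLE;            /* pcc = 0: software may recover       */
}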
Jiang, Yunhong
2009-Feb-16 15:19 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: 16 February 2009 23:03
> To: Christoph Egger; xen-devel@lists.xensource.com
> Cc: Ke, Liping; Frank.Vanderlinden@Sun.COM; Jiang, Yunhong; Gavin Maltby
> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
> I certainly think we should have a shared approach for x86 machine-check
> handling, rather than completely different architectures for AMD and Intel.
> Fortunately Sun are an interested and active third party regarding this
> feature. I'll be interested in their opinion.

Yes, we don't want differences here; we changed only mce-intel.c because this is just for discussion. And I remember SUZUKI Kazuhiro is also interested in this topic (now CC'ed).

Thanks
-- Yunhong Jiang
Frank Van Der Linden
2009-Feb-16 17:58 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Keir Fraser wrote:
> I certainly think we should have a shared approach for x86 machine-check
> handling, rather than completely different architectures for AMD and Intel.
> Fortunately Sun are an interested and active third party regarding this
> feature. I'll be interested in their opinion.
>
> -- Keir

Today is a holiday here in the US, so I have only taken a superficial look at the patches.

However, my initial impression is that I share Christoph's concern. I like the original design, where the hypervisor deals with low-level information collection and passes it on to dom0, which can then make a high-level decision and instruct the hypervisor to take high-level action via a hypercall. The hypervisor does the actual MSR reads and writes; dom0 only acts on the values provided via hypercalls.

We added the physcpuinfo hypercall to stay in this framework: get the physical information needed for analysis, but don't access any registers directly.

It seems that these new patches blur this distinction, especially the virtualized MSR reads/writes. I am not sure what added value they have, except for being able to run an unmodified MCA handler. However, I think that any active MCA decision making should be centralized, and that centralized place would be dom0. Dom0 is already very much aware of the hypervisor, so I don't see the advantage of having an unmodified MCA handler there (our MCA handlers are virtually unmodified; it's just that the part where the telemetry is collected is inside Xen for the dom0 case).

I also agree that different behavior for AMD and Intel chips would not be good.

Perhaps the Intel folks can explain what the advantages of their approach are, and give some scenarios where their approach would be better? My first impression is that staying within the general framework as provided by Christoph's original work is the better option.

- Frank
Frank Van Der Linden
2009-Feb-17 05:50 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I should probably clarify myself, since I may have created one wrong impression: I don't object to the parts of the Intel code where the hypervisor does more of the initial work (as is also done in the page-offline code); it can be critical that this work is done quickly, and the hypervisor is the only place that has both the information and the means to do it.

So, doing some more work there in some cases is probably the best thing to do, even though there is natural resistance to adding more code to the hypervisor.

The main thing whose benefits I don't quite understand is the vMCE code, which is why I asked if there are examples of where that approach would work better.

- Frank
Jiang, Yunhong
2009-Feb-17 06:41 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I think the major differences are: a) how to handle the #MC, i.e. reset the system, decide the impacted components, take recovery actions like page offline, etc.; b) how to handle errors that impact a guest. As for other items like logging/telemetry, I don't think our implementation differs much from the current one.

For how to handle the #MC, we think keeping #MC handling in the hypervisor handler has the following benefits:
a) When a #MC happens, we need to take action to reduce the severity of the error as soon as possible. After all, a #MC is something different from a normal interrupt.
b) Even if Dom0 takes the central action, most of the work will still be to invoke hypercalls into the Xen HV.
c) Currently every #MC first goes through Dom0 before being injected into a DomU, but we don't see much benefit in that path, since the HV knows the guests quite well.

The above is the main reason we keep #MC handling in the Xen HV.

As for how to handle errors that impact a guest, I tried to describe three options in http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00643.html; basically we have three options (see that URL for more information):
1) A PV #MC handler is implemented in the guest. This PV handler gets MCA information from the Xen HV through a hypercall; this is what is currently implemented.
2) Xen provides MCA MSR virtualization so that the guest's native #MC handler can run without changes (a small sketch follows this message).
3) Use a PV #MC handler for the guest as in option 1, but the interface between Xen and the guest consists of abstract events, like "offline the offending page" or "terminate the current execution context".

We selected option 2 in our current implementation, with the following considerations:
1) With this method we can re-use the native MCE handler, which may be more widely tested.
2) We benefit from improvements to the native MCE handler.
3) It supports HVM guests better; in particular, this method can support HVM and PV guests at the same time.
4) We don't need to maintain a PV handler for each guest type.

One disadvantage of this option is that the guest (dom0) loses the physical CPU information. We think it would be much better if we could define a clear abstract interface between Xen and the guest, i.e. option 3, but even then the current implementation can be the last-resort method if the guest has no PV abstract-event handler installed. We apply this method especially to Dom0 because, after we place all #MC handling in the Xen HV, Dom0's MCE handler is the same as a normal guest's and we don't need to differentiate it anymore; you can see the changes to Dom0 for MCA are very small now. BTW, one assumption here is that Dom0's log/telemetry all go through the VIRQ handler, while Dom0's #MC is just for its own recovery.

Of course, keeping the system running is currently far more important than guest #MC, and we could simply kill the impacted guest. We implemented the virtual MSR read/write mainly for Dom0 support (or maybe even Dom0 can be killed for now, since it can't do much recovery yet).

Thanks
Yunhong Jiang
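A minimal sketch, in plain C, of the "option 2" idea described above: the guest's unmodified #MC handler executes rdmsr on the architected MCA MSRs, and the hypervisor's MSR intercept answers from per-guest saved telemetry instead of the real hardware banks. The MSR numbers are the architectural ones; the structure and function names (vmce_state, vmce_rdmsr, the fixed eight banks) are invented for illustration and are not the patch's identifiers.

#include <stdint.h>
#include <stdbool.h>

#define MSR_IA32_MCG_CAP     0x179
#define MSR_IA32_MCG_STATUS  0x17a
#define MSR_IA32_MC0_CTL     0x400   /* bank i: CTL/STATUS/ADDR/MISC at 0x400 + 4*i */
#define VMCE_NR_BANKS        8

struct vmce_bank { uint64_t status, addr, misc; };

struct vmce_state {                    /* one per impacted guest            */
    uint64_t mcg_cap;                  /* advertises VMCE_NR_BANKS to guest */
    uint64_t mcg_status;
    struct vmce_bank bank[VMCE_NR_BANKS];
};

/* Called from the hypervisor's rdmsr intercept for this guest; returns true
 * if the MSR is a virtual MCA MSR and *val has been filled in. */
static bool vmce_rdmsr(const struct vmce_state *v, uint32_t msr, uint64_t *val)
{
    if (msr == MSR_IA32_MCG_CAP)    { *val = v->mcg_cap;    return true; }
    if (msr == MSR_IA32_MCG_STATUS) { *val = v->mcg_status; return true; }

    if (msr >= MSR_IA32_MC0_CTL && msr < MSR_IA32_MC0_CTL + 4 * VMCE_NR_BANKS) {
        unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
        switch ((msr - MSR_IA32_MC0_CTL) % 4) {
        case 0: *val = ~0ULL;                break; /* MCi_CTL: report all enabled */
        case 1: *val = v->bank[bank].status; break;
        case 2: *val = v->bank[bank].addr;   break;
        case 3: *val = v->bank[bank].misc;   break;
        }
        return true;
    }
    return false;                      /* not an MCA MSR: handled elsewhere */
}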
Jiang, Yunhong
2009-Feb-17 06:44 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> -----Original Message-----
> From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM]
> Sent: 17 February 2009 13:50
> To: Keir Fraser
> Cc: Gavin Maltby; Christoph Egger; xen-devel@lists.xensource.com; Jiang, Yunhong; Ke, Liping
> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
> I should probably clarify myself, since I may have created one wrong
> impression: I don't object to the parts of the Intel code where the
> hypervisor does more of the initial work (as is also done in the page
> offline code); it can be critical that this work is done quickly, and
> the hypervisor is the only place that has both the information and the
> means to do it.

Yes, agreed.

> So, doing some more work there in some cases is probably the best thing
> to do, even though there is natural resistance to adding more code to
> the hypervisor.

We all agree on keeping the HV code small, and we will try to reduce the LOC in the next round of patches.

> The main thing whose benefits I don't quite understand is the vMCE code,
> which is why I asked if there are examples of where that approach would
> work better.

Please see the mail I just sent out; you can also refer to http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00643.html.

Thanks
Yunhong Jiang
Jiang, Yunhong
2009-Feb-17 06:53 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>> So, doing some more work there in some cases is probably the best thing
>> to do, even though there is natural resistance to adding more code to
>> the hypervisor.
>
> We all agree on keeping the HV code small, and we will try to reduce the
> LOC in the next round of patches.

BTW, some changes in the Xen HV are needed no matter whether we place #MC handling in Xen or in Dom0, such as the ownership CPU check, selecting the CPU with the most severe error, and the post handler run in softirq context, all of which are also complex.
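A minimal sketch, in plain C with GCC atomic builtins, of the "ownership CPU check" mentioned above: when the broadcast MCE# brings every CPU into the handler, the first CPU to win an atomic compare-and-swap becomes the owner that drives the rest of the handling, and it releases ownership when done. The names and the use of a plain global are illustrative only, not the patch code.

#include <stdbool.h>

#define MCE_NO_OWNER (-1)

static int mce_owner_cpu = MCE_NO_OWNER;

/* Each CPU calls this from its #MC handler; only the first caller wins. */
static bool mce_claim_ownership(int this_cpu)
{
    int expected = MCE_NO_OWNER;
    return __atomic_compare_exchange_n(&mce_owner_cpu, &expected, this_cpu,
                                       false, __ATOMIC_ACQ_REL,
                                       __ATOMIC_ACQUIRE);
}

/* The owner releases ownership once processing and recovery are finished. */
static void mce_release_ownership(void)
{
    __atomic_store_n(&mce_owner_cpu, MCE_NO_OWNER, __ATOMIC_RELEASE);
}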
Christoph Egger
2009-Feb-18 18:05 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Tuesday 17 February 2009 07:41:29 Jiang, Yunhong wrote:
> I think the major differences are: a) how to handle the #MC, i.e. reset
> the system, decide the impacted components, take recovery actions like
> page offline, etc.; b) how to handle errors that impact a guest. As for
> other items like logging/telemetry, I don't think our implementation
> differs much from the current one.

The hardware doesn't know what recovery actions the software can take. If page A is faulty and software maintains a copy in page B, then software can turn an uncorrectable error into a correctable one. If the hardware is aware of that copy (memory mirroring done by the memory controller), then the hardware itself turns the uncorrectable error into a correctable one and reports a correctable error.

Therefore, I don't see why any flags other than correctable and uncorrectable are needed at all.

After some thinking about taking some quick actions, I can agree to it if it meets the condition below. Be aware that error analysis is highly CPU-vendor and even CPU-family/model specific. Doing a complete analysis as Solaris does blows Xen up a *lot*. Therefore, a *cheap* error analysis must be enough to figure out whether recovery actions like page offlining or CPU offlining are *obviously* the only right thing to do.

If this is not the case, then let Dom0 decide what to do.

Christoph
Jiang, Yunhong
2009-Feb-19 09:13 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
xen-devel-bounces@lists.xensource.com <> wrote:
> On Tuesday 17 February 2009 07:41:29 Jiang, Yunhong wrote:
>> I think the major differences are: a) how to handle the #MC, i.e. reset
>> the system, decide the impacted components, take recovery actions like
>> page offline, etc.; b) how to handle errors that impact a guest. As for
>> other items like logging/telemetry, I don't think our implementation
>> differs much from the current one.
>
> The hardware doesn't know what recovery actions the software can take.
> If page A is faulty and software maintains a copy in page B, then
> software can turn an uncorrectable error into a correctable one.
> If the hardware is aware of that copy (memory mirroring done by the
> memory controller), then the hardware itself turns the uncorrectable
> error into a correctable one and reports a correctable error.
>
> Therefore, I don't see why any flags other than correctable and
> uncorrectable are needed at all.

Christoph, thanks for your reply.

I think recoverable means the VMM/OS can take a recovery action like page offline, while unrecoverable means the VMM/OS can't do anything and we have to reboot. The main reason we need these flags is that several steps are required for MCA handling. For example, when multiple MCEs happen on multiple CPUs, first each CPU checks its own severity, and second we need to find the most severely affected CPU and take action. For example, CPU A may report unrecoverable while CPU B reports recoverable; they will compare the information, and the final verdict will be unrecoverable. (A small sketch of this two-step selection follows this message.)

> After some thinking about taking some quick actions, I can agree to it
> if it meets the condition below. Be aware that error analysis is highly
> CPU-vendor and even CPU-family/model specific. Doing a complete analysis
> as Solaris does blows Xen up a *lot*.

I didn't check the Solaris code, so can Gavin or Frank give us more information? At least currently it will not be large AFAIK, and if we do need model-specific support (I don't know of such a requirement now, and I suppose it will not be common if it exists; please correct me if I'm wrong), Dom0 can inform Xen of it.

> Therefore, a *cheap* error analysis must be enough to figure out whether
> recovery actions like page-offlining or cpu offlining are *obviously*
> the only right thing to do.

Currently we only plan to support these two types; do you have plans for other recovery actions? And would that action be done better in Dom0 than in Xen?

> If this is not the case, then let Dom0 decide what to do.

Thanks
-- Yunhong Jiang
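A minimal sketch, in plain C, of the two-step severity selection described above: in step one each CPU records the worst severity found in its own banks; in step two the owner CPU reduces the per-CPU results to a single verdict, so one CPU reporting "fatal" makes the whole event fatal. The enum, array and function names are invented for illustration; the rendezvous/synchronisation needed before step two is omitted.

#define NR_CPUS 64

enum mca_severity { MCA_NONE, MCA_CORRECTED, MCA_RECOVERABLE, MCA_FATAL };

static enum mca_severity cpu_severity[NR_CPUS];

/* Step 1: each CPU runs this inside its #MC handler. */
static void record_local_severity(int cpu, enum mca_severity worst_in_my_banks)
{
    cpu_severity[cpu] = worst_in_my_banks;
}

/* Step 2: the owner CPU runs this once all CPUs have reported. */
static enum mca_severity global_severity(void)
{
    enum mca_severity worst = MCA_NONE;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_severity[cpu] > worst)
            worst = cpu_severity[cpu];
    return worst;
}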
Christoph Egger
2009-Feb-19 16:25 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Thursday 19 February 2009 10:13:18 Jiang, Yunhong wrote:
> I think recoverable means the VMM/OS can take a recovery action like page
> offline, while unrecoverable means the VMM/OS can't do anything and we
> have to reboot.

OK, here is a different interpretation of what is correctable and uncorrectable. Uncorrectable in your interpretation means neither hardware nor software can do anything. Uncorrectable in my interpretation means the hardware can't correct it, but software may have more information and correct it.

> The main reason we need these flags is that several steps are required
> for MCA handling. For example, when multiple MCEs happen on multiple
> CPUs, first each CPU checks its own severity, and second we need to find
> the most severely affected CPU and take action. For example, CPU A may
> report unrecoverable while CPU B reports recoverable; they will compare
> the information, and the final verdict will be unrecoverable.

I brought up an example of a broken memory page for my argument; you bring up a broken CPU for yours. We need to find a common denominator to compare. If a CPU is completely broken and you are on UP, then the game is over. Not even a reboot can help. On an SMP system, offline the CPU and inform Dom0.

> I didn't check the Solaris code, so can Gavin or Frank give us more
> information? At least currently it will not be large AFAIK, and if we do
> need model-specific support, Dom0 can inform Xen of it.
>
> Currently we only plan to support these two types; do you have plans for
> other recovery actions? And would that action be done better in Dom0 than
> in Xen?

Yes!! Solaris maintains a list of broken pages which is even persistent across reboots when the serial number of the DIMM didn't change. For doing page offlining properly, Sun should design a hypercall allowing Dom0 to give Xen this list as early as possible at boot time.

Further, with our Shanghai CPU, we can disable certain parts of its L3 cache. Instead of offlining that broken CPU completely, just disable the broken part of it. The registers for this are in PCI config space. Since Xen delegates PCI access to Dom0, Dom0 can do that.

Christoph
Jiang, Yunhong
2009-Feb-20 02:53 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph Egger <mailto:Christoph.Egger@amd.com> wrote:
> OK, here is a different interpretation of what is correctable and
> uncorrectable. Uncorrectable in your interpretation means neither
> hardware nor software can do anything. Uncorrectable in my
> interpretation means the hardware can't correct it, but software may
> have more information and correct it.

Yes. Maybe "fatal" is a more appropriate name here.

> I brought up an example of a broken memory page for my argument; you
> bring up a broken CPU for yours. We need to find a common denominator
> to compare. If a CPU is completely broken and you are on UP, then the
> game is over. Not even a reboot can help. On an SMP system, offline the
> CPU and inform Dom0.

Sorry, I didn't get the relationship between the flags and the comparison of the two examples :$

> Yes!! Solaris maintains a list of broken pages which is even persistent
> across reboots when the serial number of the DIMM didn't change.
> For doing page offlining properly, Sun should design a hypercall allowing
> Dom0 to give Xen this list as early as possible at boot time.

We have a patch to support page offline (sent as an RFC to the mailing list), and it already exports a hypercall for Dom0 to ask Xen to offline pages (this is for proactive action on CE errors from Dom0). Also, as Frank suggested, we will add a hypercall for Dom0 to query a page's offline status, so this should be OK.

> Further, with our Shanghai CPU, we can disable certain parts of its L3
> cache. Instead of offlining that broken CPU completely, just disable the
> broken part of it. The registers for this are in PCI config space.
> Since Xen delegates PCI access to Dom0, Dom0 can do that.

Sorry, I have no idea about Shanghai, but I am a bit surprised that when an error happens in the cache, we would transfer control to Dom0 and wait for Dom0's MCA handler to take the action of disabling the cache; that is a really long code path. Per my understanding, if there is an issue in the cache, we should clear/disable the cache ASAP to avoid a more severe result, and this is an extreme example of letting Xen handle the MCA. Or maybe I missed something important about this feature?

BTW, I want to clarify that this patch is for #MC handling (i.e. the "uncorrected" errors in your terms). For hardware-correctable errors (i.e. "correctable"), Xen will do nothing but pass them to Dom0 as a vIRQ, as our previous patch (http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00970.html) shows, because a CE will not impact the system. So if the "cache index disable" is meant to disable part of the cache after too many CEs (correctable errors) as a proactive action, I think we are on the same page.

I attached two foils that are part of our Xen Summit presentation. Page 1 is mainly for #MC handling; page 2 is for CE handling (through CMCI or polling). Page 1 is described clearly in the patch. Page 2 is what our previous patch did.

Thanks
-- Yunhong Jiang
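A minimal, self-contained sketch, in plain C, of the corrected-error path just described (the "page 2" flow): the CMCI handler or the periodic poller walks the banks, queues any valid corrected-error telemetry, clears the bank, and notifies Dom0, which only logs it. The bank array, queue_telemetry() and notify_dom0() are stand-ins for the real hardware MSR accesses, the telemetry queue and the Dom0 MCA vIRQ; none of the names come from the patches.

#include <stdint.h>
#include <stdio.h>

#define MCi_STATUS_VAL (1ULL << 63)
#define MCi_STATUS_UC  (1ULL << 61)
#define NR_BANKS 6

static uint64_t bank_status[NR_BANKS];  /* stands in for the MCi_STATUS MSRs */

static void queue_telemetry(unsigned int bank, uint64_t status)
{
    printf("queue CE telemetry: bank %u status %#llx\n",
           bank, (unsigned long long)status);
}

static void notify_dom0(void)
{
    puts("raise the MCA vIRQ to Dom0 (logging only, no recovery)");
}

static void poll_corrected_errors(void)
{
    int found = 0;

    for (unsigned int bank = 0; bank < NR_BANKS; bank++) {
        uint64_t status = bank_status[bank];

        if (!(status & MCi_STATUS_VAL) || (status & MCi_STATUS_UC))
            continue;                   /* empty bank, or a UC error: skip  */

        queue_telemetry(bank, status);  /* save the log for Dom0 to fetch   */
        bank_status[bank] = 0;          /* clear/re-arm the bank            */
        found = 1;
    }

    if (found)
        notify_dom0();
}

int main(void)
{
    bank_status[2] = MCi_STATUS_VAL | 0x90;     /* pretend a corrected error */
    poll_corrected_errors();
    return 0;
}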
Frank van der Linden
2009-Feb-20 21:01 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I had some time to look over the patches in more detail, and the previous discussions that were referenced. From your patches, what you write, and your slides, I gather the following:

* Corrected errors (found through polling and CMCI):
  1) Collect error data (telemetry)
  2) Inform dom0 through the VIRQ

* Uncorrected errors:
  1) See if any immediate action can be taken (CPU offline, page retire)
  2) Collect telemetry
  3) Deliver a vMCE to dom0 (and possibly a domU)

I think it's fine that the hypervisor takes some immediate action in some cases. It is good to do this as quickly as possible, and only the hypervisor has all the information immediately available.

What would be needed for the Solaris framework, however, is to provide information on what action was taken, along with the telemetry. As Christoph noted, the Solaris FMA code checks, at bootup, whether there were components that previously had errors, and if so, it disables them again to prevent further errors. To be able to do this, it needs full information not just on the error data, but also on any action taken by the hypervisor, so that it can repeat this action. It may take some modifications in the FMA code to account for the case where an action has already been taken (to avoid trying to take conflicting action), but I think that shouldn't be a big problem, although I don't know that part of our code very well.

The part that I still have doubts about is the vMCE code. As far as I can tell, it takes the information out of the MCA banks and stores it, per event, in a linked list. Per vMCE, the head of the list is taken and used as an MSR context. The rdmsr instruction is trapped and redirected to that information. It seems that the wrmsr instruction is accepted, but has no effect (except that if the trap handler writes a value and then reads it back immediately, the values will be the same).

The main argument for the vMCE code seems to be that it allows existing MCA handlers to be reused. However, I don't see the advantage in this. Basically, it allows the handler to retrieve the MCA banks through plain rdmsr instructions. Which is fine, but that's as far as it goes. Without any additional information, that feature does not seem useful; wrmsr instructions have no effect. To take further action, the MCA code in dom0 (or a domU) needs to know that it is running under Xen, and it needs detailed physical information on the system. In other words, the only existing code that can be reused is the code that gathers some information. So, the only thing that vMCE is good for is that you can run unmodified error-logging code. But you can't interpret any of the error information further without knowing more. Especially for a domU, which might not know anything, this doesn't seem useful. What would the user of a domU do with that information?

To recap, I think the part where Xen itself takes action is fine, with some modifications. But I don't see any advantages in vMCE delivery, unless I'm missing something of course.

- Frank
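A minimal sketch, in plain C, of the mechanism described in the paragraph above: per-event telemetry sits in a per-domain list, the entry at the head of the list is the "MSR context" that the guest's trapped rdmsr accesses see while it handles the current vMCE, and the entry is retired once the guest is done with it. All names (vmce_event, vmce_queue, vmce_retire) are invented for this illustration; they are not the identifiers in the patches.

#include <stdint.h>
#include <stdlib.h>

struct vmce_event {
    struct vmce_event *next;
    uint64_t mcg_status;
    uint64_t bank_status, bank_addr, bank_misc;
};

struct vmce_queue {
    struct vmce_event *head, *tail;    /* head = context of the current vMCE */
};

/* Append a new telemetry event for this domain. */
static void vmce_enqueue(struct vmce_queue *q, struct vmce_event *ev)
{
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
}

/* rdmsr traps are answered from the head entry, if any. */
static const struct vmce_event *vmce_current(const struct vmce_queue *q)
{
    return q->head;
}

/* When the guest handler finishes with the current event (e.g. clears the
 * virtual MCi_STATUS), retire it; a further vMCE# can then be injected for
 * the next pending entry. */
static void vmce_retire(struct vmce_queue *q)
{
    struct vmce_event *done = q->head;

    if (!done)
        return;
    q->head = done->next;
    if (!q->head)
        q->tail = NULL;
    free(done);
}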
Jiang, Yunhong
2009-Feb-23 09:01 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> I had some time to look over the patches in more detail and > the previous > discussions that were referenced. > > From your patches, what you write, and your slides, I gather > the following: > > * Corrected errors (found through polling and CMCI): > 1) Collected error data (telemetry) > 2) Inform dom0 through the VIRQ. > > * Uncorrected errors: > 1) See if any immediate action can be taken (CPU offline, page > retire) 2) Collect telemetry > 3) Deliver vMCE to dom0 (and possibly domU) One note: we deliver a vMCE to dom0/domU only when it is impacted. The idea behind this is that the MCE is handled entirely by the Xen HV, while a guest''s vMCE handler only works for the guest itself. For example, when a page is broken, Xen will first mark the page offline on the Xen side (i.e. take the recovery action); then it will inject a vMCE into the corresponding guest (dom0 or domU), and the guest will kill the application using the page, free the page, or take further action. And we always pass the vIRQ to dom0 for logging and telemetry; user space tools can take more proactive action based on this if needed. > > > I think it''s fine that the hypervisor takes some immediate action in > some cases. It is good to do this as quickly as possible, and only the > hypervisor has all the information immediately available. > > What would be needed for the Solaris framework, however, is to provide > information on what action was taken, along with the telemetry. As Agree that this modification is needed. Sorry, we didn''t realize the requirement from Dom0 after reboot. Either we can pass the action in the telemetry, or Dom0 can use an action-specific method, like retrieving the offlined pages from Xen before reboot. If we take the former, we may need an interface definition.> Christoph noted, the Solaris FMA code checks, at bootup, if there were > components that previously had errors, and if so, it disables > them again > to prevent further errors. To be able to do this, it needs the full > information not just on the error data, but also on any action > taken by > the hypervisor, so that it can repeat this action. It may take some > modifications in the FMA code to account for the case where an action > has already been taken (to avoid trying to take conflicting > action), but > I think that shouldn''t be a big problem. Although I don''t know > that part > of our code very well. > > The part that I still have doubts about, is the vMCE code. As far as I > can tell, it takes the information out of the MCA banks, and > stores it, > per event, in a linked list. Per vMCE, the head of the list is > taken and > used as an MSR context. The rdmsr instruction is trapped and redirected > to that information. It seems that the wrmsr instruction is accepted, > but has no effect (except that if the trap handler writes a value and > then reads it back again immediately, the values will be the same). > The main argument for the vMCE code seems to be that it allows existing > MCA handlers to be reused. However, I don''t see the advantage in this. > Basically, it allows the handler to retrieve the MCA banks > through plain > rdmsr instructions. Which is fine, but that''s as far as it > goes. Without > any additional information, that feature does not seem useful. wrmsr > instructions has no effect. What do you mean by the effect of the wrmsr instruction? We need to consider injecting #GP on an invalid wrmsr, or removing the event when the guest clears MCi_STATUS, if needed. 
We sent this RFC early to get feedback on the design idea first. Or do you mean more than this for the wrmsr?> > To take further action, the MCA code in dom0 (or a domU) needs to know > that it is running under Xen, and it needs to have detailed physical Our purpose is that the guest has no idea it is running under Xen, as described above. And what information do you think a normal guest''s MCA handler needs to know, and how would it use the detailed physical information? After all, a guest cares only about itself. Also, maybe we can''t provide a PV handler for all guests (like Windows). Dom0 is a special case: its vIRQ handler knows it is running under Xen, but that is for log/telemetry and for proactive action.> information on the system. In other words, the existing code > that can be What do you mean by "existing": our patch or the current Xen implementation?> used is only the code that gathers some information. So, the > only thing > that vMCE is good for, is that you can run unmodified error logging > code. But you can''t interpret any of the error information further > without knowing more. Especially for a domU, which might not know > anything, this doesn''t seem useful. What would the user of a domU do with > that information? > To recap, I think the part where Xen itself takes action is fine, with > some modifications. But I don''t see any advantages in vMCE delivery, > unless I''m missing something of course.. I think the main advantages are: a) We don''t need to maintain a PV MCA handler for the guest, especially for HVM guests. b) We can benefit from the guest''s MCA improvements/enhancements. c) Applying this to dom0, we don''t need a different mechanism for dom0/hvm. Thanks Yunhong Jiang> > - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
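To make the MSR virtualization discussed above concrete, here is a minimal sketch of how a trapped rdmsr on a machine-check bank MSR could be redirected to the stored virtual event. It is only an illustration of the idea under discussion; struct vmce_bank_ctx and vmce_find_pending() are hypothetical names, not code from the posted patches.

/* Sketch only: per-domain virtual bank context for the pending vMCE. */
struct vmce_bank_ctx {
    uint64_t status;    /* virtual MCi_STATUS */
    uint64_t addr;      /* virtual MCi_ADDR */
    uint64_t misc;      /* virtual MCi_MISC */
};

struct vmce_bank_ctx *vmce_find_pending(struct domain *d);  /* hypothetical lookup */

#define MSR_MC0_CTL 0x400   /* IA32_MC0_CTL; each bank uses 4 MSRs */

/* Called from the rdmsr intercept for the MCi_* range. */
static int vmce_rdmsr_sketch(struct domain *d, uint32_t msr, uint64_t *val)
{
    struct vmce_bank_ctx *ctx = vmce_find_pending(d);

    if ( ctx == NULL )
    {
        *val = 0;                       /* no pending event: bank reads clean */
        return 1;
    }

    switch ( (msr - MSR_MC0_CTL) % 4 )  /* CTL, STATUS, ADDR, MISC per bank */
    {
    case 1: *val = ctx->status; break;
    case 2: *val = ctx->addr;   break;
    case 3: *val = ctx->misc;   break;
    default: *val = 0;          break;  /* MCi_CTL: nothing to virtualize here */
    }
    return 1;                           /* handled; no #GP injected */
}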
Frank van der Linden
2009-Feb-24 18:53 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Thanks for your reply. Let me explain my comments a little: Jiang, Yunhong wrote:> > One notice is, we delieve vMCE to dom0/domU only when it is impacted. The idea behind this is, MCE is handled by Xen HV totally, while guest''s vMCE handler will only works for itself. For example, when a page broken, Xen will firstly mark the page offline in Xen side (i.e. take the recover action), then, it will inject a vMCE to guest corresponding (dom0 or domU), the guest will kill the application using the page, free the page, or do more action. > > And we always pass the vIRQ to dom0 for logging and telemetry, user space tools can take more proactive action for this if needed. I understand this part, and have no problems with the mechanism itself. I think it has advantages over the original concept, where dom0 informs domUs. My question is: what useful action can a domU take without fully knowing the physical system? I''ll go more into that below.>> What would be needed for the Solaris framework, however, is to provide >> information on what action was taken, along with the telemetry. As > > Agree that this modification is needed. Sorry we didn''t reliaze the requirement from Dom0 after reboot. > > Either we can pass the action in the telemetry, or Dom0 can take action specific method ,like retrieve the offlined page from Xen before reboot. If we take the former, we may need a interface definition. Passing the action along with the telemetry seems the best way to go to me. Since the telemetry is used to determine which action to take, any information on actions already taken should come at the same time. > > What do you mean of the effect of wrmsr instruction. We need considering inject #GP if invalid wrmsr , or remove the event when guest clear the MCi_STATUS_MCA if needed. We send this RFC early to get feedback firstly for the design idea. > Or you mean more than this for the wrmsr? > >> To take further action, the MCA code in dom0 (or a domU) needs to know >> that it is running under Xen, and it needs to have detailed physical > > Our purpose is guest has no idea it is running under xen as descripted above. And what information do you think a normal guest''s MCA handler needs to know, and use the detailed physical information? After all, a guest cares only itself. Also, maybe we can''t provide PV handler for all guest (like windows). > > Dom0 is a special case, it''s vIRQ handler knows it is running under Xen, but that is for log/telemetry and for proactive action. > >> information on the system. In other words, the existing code >> that can be > > What do you mean of "existing", our patch or current Xen implementation? > >> used is only the code that gathers some information. So, the >> only thing >> that vMCE is good for, is that you can run unmodified error logging >> code. But you can''t interpret any of the error information further >> without knowing more. Especially for a domU, which might not know >> anything, this doesn''t seem useful. What would the user of a domU do with >> that information? >> To recap, I think the part where Xen itself takes action is fine, with >> some modifications. But I don''t see any advantages in vMCE delivery, >> unless I''m missing something of course.. > > I think the main advantage are: > a) We don''t need maintain a PV MCA handler for guest, especially for HVM guest > b) We can get benifit from guest''s MCA improvement/enhancement . 
> c) Applying this to dom0, we don''t need different mechanism to dom0/hvm. Ok, my main issue here is: if you want to enable a guest to run unmodified MCA code (which you state as a goal, and as an advantage of the vMCE approach), then what can the guest actually do? Or the dom0, for that matter? MCA information is highly specific to the hardware. Without additional information on the hardware, it is hard, or even impossible, for the unmodified MCA handler in dom0 or a domU to do anything useful. It will interpret the information to fit the virtualized environment it is in, which doesn''t match the reality of the hardware at all. So what can it do? It can just read the MSRs and log the information, but even that information wouldn''t be useful; it is already available to dom0, where the code and/or person who can make sense of the data will see it. The unmodified MCA handler also can''t take any corrective action; it might think that it is taking action, but in fact, its wrmsr instructions have no effect (and they shouldn''t, guests should definitely not be able to do MSR writes). I only see one possible exception to this: if you translate the ADDR MSR of a bank to a guest address in the vmca info before delivering the vMCE, then the guest could do something useful, because its virtualized MSR reads would then produce a guest address, and it could do something useful with it. But currently, your code doesn''t seem to do this; the virtualized MSR will produce the machine address, which the guest can''t do anything with, unless it knows it''s running under Xen. So that''s my main problem here: there is a contradiction. The vMCE mechanism as you implement it enables guests to run an unmodified MCA handler, but there isn''t actually much that the guest can do with that, without knowing it runs under Xen. I see only one specific use for this: if you translate the ADDR info to a guest address, it could potentially try to do a "local" page retire. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
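If the translation Frank describes were added, it could be done at the point where Xen prepares the virtual event, before the guest ever reads the bank MSRs. A rough sketch, in which machine_to_guest_addr() and vmce_queue_event() are hypothetical placeholders rather than functions from the patch:

/* Sketch: translate the machine address from MCi_ADDR into the impacted
 * guest address space before queueing the virtual event, so that an
 * unmodified handler can act on an address it understands. */
static int vmce_prepare_event_sketch(struct domain *d, uint64_t mc_status,
                                     uint64_t machine_addr, uint64_t mc_misc)
{
    uint64_t guest_addr;

    if ( machine_to_guest_addr(d, machine_addr, &guest_addr) )
        return -EINVAL;     /* page is not mapped into this guest */

    /* The virtualized MCi_ADDR now holds a guest address, so the guest
     * could, for example, retire the page "locally". */
    return vmce_queue_event(d, mc_status, guest_addr, mc_misc);
}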
Frank van der Linden
2009-Feb-24 19:07 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Kleen, Andi wrote:>> MCA information is highly specific to the hardware. > > Actually Intel has architectural machine checks and except for > some optional addon information explicitely marked it''s all architectural > (as in defined to stay the same going forward)True, I probably expressed myself poorly here. I meant to say: it''s a physical hardware error, and in an unmodified virtualized environment the information about the physical hardware isn''t there.> For DomU translation of the address is needed, that''s correct. > For Dom0 logging physical is good because the logging tools > might need that.Right. As far as I understand it, this patch proposes to deliver the actual physical information to dom0 via the existing vIRQ mechanism, while the vMCE mechanism delivers virtualized info to any guest (both dom0 and domU). - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Feb-24 20:47 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Kleen, Andi wrote:>> Kleen, Andi wrote: > > So it''s generally better to inject generic events, not just blindly forward. >Agreed. I can see advantages to the vMCE code, but it has to deliver something to the domU that makes it do something reasonable. That''s why I have some doubts about the patch that was sent, it doesn''t quite seem to achieve that (certainly not without translating the address). - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-25 02:25 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Kleen, Andi wrote: >>> Kleen, Andi wrote: >> >> So it''s generally better to inject generic events, not just blindly >> forward. >> > > Agreed. I can see advantages to the vMCE code, but it has to deliver > something to the domU that makes it do something reasonable. > That''s why > I have some doubts about the patch that was sent, it doesn''t > quite seem > to achieve that (certainly not without translating the address). > > - Frank Yes, we should have included the translation. We didn''t do that when sending out the patch because we thought the PV guest has knowledge of the m2p translation. After more consideration we realized the translation is needed for PV guests too, since the unmodified #MC handler will use guest addresses. Of course we always need the translation for HVM guests, which however is not in that patch''s scope. Sorry for any confusion caused. One thing to note is that the information passed through the vIRQ is physical information, while dom0''s MCA handler will get guest information, so user space tools should be aware of this constraint. So, Frank/Egger, can I assume the following are the current consensus?
1) MCE is handled entirely by the Xen HV, while a guest''s vMCE handler only works for the guest itself.
2) Xen presents a virtual #MC to the guest through MSR access emulation (Xen will do the translation if needed).
3) The guest''s unmodified MCE handler will handle the injected vMCE.
4) Dom0 will get all logs/telemetry through a hypercall.
5) The action taken by Xen will be passed to dom0 through the telemetry mechanism.
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
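Taken together, points 1)-5) suggest roughly the following shape for the Xen-side #MC path. This is only a sketch of the agreed division of work; every helper named here is hypothetical.

/* Sketch of the proposed split of responsibilities (all names hypothetical). */
static void mce_softirq_sketch(void)
{
    struct mc_telem *telem = mce_read_and_clear_banks();   /* Xen collects (1) */

    mce_take_recovery_action(telem);        /* e.g. page or cpu offline */
    mce_record_action(telem);               /* action travels with the log (5) */

    if ( telem->impacted_domain != NULL )
        vmce_inject(telem->impacted_domain, telem);  /* virtual #MC, (2) and (3) */

    mce_notify_dom0(telem);                 /* vIRQ; dom0 fetches the log via hypercall (4) */
}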
Jiang, Yunhong
2009-Feb-25 02:26 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> Right. As far as I understand it, this patch proposes to deliver the > actual physical information to dom0 via the existing vIRQ mechanism, > while the vMCE mechanism delivers virtualized info to any guest (both dom0 > and domU). Yes, exactly.> > - Frank > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-25 02:31 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> That''s needed anyways for example to support migration between different > types of CPUs. The DomU really cannot take a specific CPU type > for granted or rather has to assume some fallback CPU. Also > for virtualization > it''s a common case that guests run very old OS, so it''s better to give > them the oldest possible events too. > > So it''s generally better to inject generic events, not just > blindly forward. Andi, what''s the meaning of "generic event"? Do you mean option 3, i.e. some abstract event like a page-offline or kill-current-execution event? Or do you mean translating the physical MSR values to guest-aware MSR values? Thanks Yunhong Jiang> > Only for Dom0 which does logging the physical hardware needs > to be described > correctly. > > -Andi_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Feb-25 10:37 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Tuesday 24 February 2009 20:07:16 Frank van der Linden wrote:> Kleen, Andi wrote: > >> MCA information is highly specific to the hardware. > > > > Actually Intel has architectural machine checks and except for > > some optional addon information explicitely marked it''s all architectural > > (as in defined to stay the same going forward) > > True, I probably expressed myself poorly here. I meant to say: it''s a > physical hardware error, and in an unmodified virtualized environment > the information about the physical hardware isn''t there. > > > For DomU translation of the address is needed, that''s correct. > > For Dom0 logging physical is good because the logging tools > > might need that. > > Right. As far as I understand it, this patch proposes to deliver the > actual physical information to dom0 via the existing vIRQ mechanism, > while the vMCE mechanism delivers virtualized info to any guest (both > dom0 and domU). The translation is still problematic: what if an error occurred which impacts multiple contiguous physical pages? Translated into guest-physical address space, they may be non-contiguous. That''s why the original design does not support HVM guests unless they are aware of running in Xen via a PV machine check driver. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Feb-25 10:57 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Tuesday 24 February 2009 21:33:47 Kleen, Andi wrote:> >Kleen, Andi wrote: > >>> MCA information is highly specific to the hardware. > >> > >> Actually Intel has architectural machine checks and except for > >> some optional addon information explicitely marked it''s all > > > >architectural > > > >> (as in defined to stay the same going forward) > > > >True, I probably expressed myself poorly here. I meant to say: it''s a > >physical hardware error, and in an unmodified virtualized environment > >the information about the physical hardware isn''t there. > > In a DomU it''s not important that the physical hardware is correctly > described, the only thing that matters is that the event triggers > the DomU code to do the expected action. I agree with that. The DomU sees a hw environment which may only partially match the physical hardware. The physical machine check error must be translated in a way that fits into the guest''s hw environment. This is not just limited to the memory layout. An example to clarify the point (which actually won''t apply directly to Xen, but you should get the idea): The guest hw environment is an (emulated) sparc CPU, memory and PCI devices. The host''s hw environment is an x86 PC. Now a machine check error occurs. If you want to forward it into the guest, you must translate it into the form the guest OS would expect from a native sparc machine. -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Feb-25 12:19 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:> Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: > > Kleen, Andi wrote: > >>> Kleen, Andi wrote: > >> > >> So it''s generally better to inject generic events, not just blindly > >> forward. > > > > Agreed. I can see advantages to the vMCE code, but it has to deliver > > something to the domU that makes it do something reasonable. > > That''s why > > I have some doubts about the patch that was sent, it doesn''t > > quite seem > > to achieve that (certainly not without translating the address). > > > > - Frank > > Yes, we should have include the translation. We didn''t do that when sending > out the patch because we thought the PV guest has idea of m2p translation. > Later we realized the translation is needed for PV guest after more > consideration, since the unmodified #MC handler will use guest address. Of > course we always need the translation for HVM guest, which however is not > in that patch''s scope . Sorry for any confusion caused. > > One thing need notice is, the information passed through vIRQ is physical > information while dom0s'' MCA handler will get guest information, so user > space tools should be aware of such constraints. > > So, Frank/Egger, can I assume followed are consensus currently? > > 1) MCE is handled by Xen HV totally, while guest''s vMCE handler will only > works for itself. > 2) Xen present a virtual #MC to guest through MSR access > emulation.(Xen will do the translation if needed). > 3) Guest''s unmodified > MCE handler will handle the vMCE injected. > 4) Dom0 will get all log/telemetry through hypercall. > 5) The action taken by xen will be passed to dom0 through the telemetry > mechanism. Mostly. Regarding 2), I would first like to discuss how to handle errors impacting multiple contiguous physical pages which are non-contiguous in guest physical space. And I also want to discuss how to do recovery actions requiring PCI access. One example for this is Shanghai''s "L3 Cache Index Disable"-Feature. Xen delegates PCI config space to Dom0 and via PCI passthrough partly to DomU. That means, if registers in PCI config space are independently accessible by Xen, Dom0 and/or DomU, they can interfere with each other. Therefore, we need to a) clearly define who handles what and b) define some rules based on a) c) discuss how to handle Dom0/DomU going wild and breaking the rules defined in b) Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Feb-25 17:32 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph Egger wrote:> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: > >> So, Frank/Egger, can I assume followed are consensus currently? >> >> 1) MCE is handled by Xen HV totally, while guest''s vMCE handler will only >> works for itself. >> 2) Xen present a virtual #MC to guest through MSR access >> emulation.(Xen will do the translation if needed). >> 3) Guest''s unmodified >> MCE handler will handle the vMCE injected. >> 4) Dom0 will get all log/telemetry through hypercall. >> 5) The action taken by xen will be passed to dom0 through the telemetry >> mechanism. > > Mostly. Regarding 2) I want like to discuss first how to handle errors > impacting multiple contiguous physical pages which are non-contigous > in guest physical space. > > And I also want to discuss about how to do recovery actions requiring > PCI access. One example for this is > Shanghai''s "L3 Cache Index Disable"-Feature. > Xen delegates PCI config space to Dom0 and > via PCI passthrough partly to DomU. > That means, if registers in PCI config space are independently > accessable by Xen, Dom0 and/or DomU, they can interfere with each other. > Therefore, we need to > a) clearly define who handles what and > b) define some rules based on a) > c) discuss how to handle Dom0/DomU going wild > and break the rules defined in b)I also agree on the approach in principle, but would like to see these points addressed. For non-contiguous pages, I suppose Xen could deliver multiple #vMCEs to the guest, split into contiguous parts. The vmce code seems to be set up to be able to do this. As for the Shanghai feature: Christoph, are there any documents available on that feature? What kind of errors are delivered (corrected/correctable)? - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
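A sketch of the splitting Frank suggests: walk the affected machine frames, look up each guest frame, and inject one vMCE per guest-contiguous run. mfn_to_gpfn() stands for the m2p lookup and vmce_inject_range() for the injection; both names are placeholders rather than existing functions.

/* Sketch: one vMCE per run of guest-contiguous frames (names are placeholders). */
static void vmce_inject_split_sketch(struct domain *d, uint64_t first_mfn,
                                     unsigned int nr_mfns)
{
    unsigned int i, run_start = 0;

    for ( i = 1; i <= nr_mfns; i++ )
    {
        /* Close the current run at the end, or when the guest frames stop
         * being contiguous. */
        if ( i == nr_mfns ||
             mfn_to_gpfn(d, first_mfn + i) !=
             mfn_to_gpfn(d, first_mfn + run_start) + (i - run_start) )
        {
            vmce_inject_range(d, mfn_to_gpfn(d, first_mfn + run_start),
                              i - run_start);
            run_start = i;
        }
    }
}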
Gavin Maltby
2009-Feb-25 22:30 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph Egger wrote:> Mostly. Regarding 2) I want like to discuss first how to handle errors > impacting multiple contiguous physical pages which are non-contigous > in guest physical space. I can''t think of any such error types. ECC checkwords don''t span page boundaries, so you only ever get an error at a time affecting one small part of one page. That physically adjacent pages have both had errors would come out in the wash, but they''d be processed and recognised individually. Gavin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-26 02:16 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christopher/Egger, thanks very much for the reply; see comments below.>-----Original Message----- >From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] >Sent: 26 February 2009 1:33 >To: Christoph Egger >Cc: Jiang, Yunhong; Kleen, Andi; >xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby >Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN > >Christoph Egger wrote: >> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: >> >>> So, Frank/Egger, can I assume followed are consensus currently? >>> >>> 1) MCE is handled by Xen HV totally, while guest's vMCE >handler will only >>> works for itself. >>> 2) Xen present a virtual #MC to guest through MSR access >>> emulation.(Xen will do the translation if needed). >>> 3) Guest's unmodified >>> MCE handler will handle the vMCE injected. >>> 4) Dom0 will get all log/telemetry through hypercall. >>> 5) The action taken by xen will be passed to dom0 through >the telemetry >>> mechanism. >> >> Mostly. Regarding 2) I want like to discuss first how to >handle errors >> impacting multiple contiguous physical pages which are non-contigous >> in guest physical space.>> >> And I also want to discuss about how to do recovery actions requiring >> PCI access. One example for this is >> Shanghai's "L3 Cache Index Disable"-Feature. >> Xen delegates PCI config space to Dom0 and >> via PCI passthrough partly to DomU. >> That means, if registers in PCI config space are independently >> accessable by Xen, Dom0 and/or DomU, they can interfere with >each other. >> Therefore, we need to >> a) clearly define who handles what and >> b) define some rules based on a) >> c) discuss how to handle Dom0/DomU going wild >> and break the rules defined in b) > >I also agree on the approach in principle, but would like to see these >points addressed. For non-contiguous pages, I suppose Xen >could deliver >multiple #vMCEs to the guest, split into contiguous parts. The >vmce code >seems to be set up to be able to do this. For the contiguous pages, I agree with Gavin that such contiguous page errors should be triggered as multiple #MCs, and so this is ok. For the PCI config space issue, Christoph, can you please share more information on it (or provide some document as Frank suggested), like: is it for CE (correctable error) or UC (uncorrectable error), is it in the PCI range or the PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how is the device's BDF calculated, etc. Following is some of my understanding. Firstly, if it is CE, Xen will do nothing and dom0 will take the recovery action. If it is UC, Xen will take action when all CPUs are in softIRQ context, and dom0 will not take action, so it should be ok. Secondly, in the Xen environment, per my understanding, the CPU is owned by the Xen HV, so I'm not sure whether Xen should be aware when dom0 disables the L3 cache (if it is CE). That is, should dom0 disable the cache directly, or should it use a hypercall to ask Xen to do that? Keir can give us more suggestions. For item C, currently Xen/dom0 can both access configuration space, while domU will do that through PCI_frontend/backend. Because the PCI backend only covers devices assigned to domU, we don't need to worry about domU, and dom0 should be trusted. However, one thing left is that if this register is beyond offset 0x100 (i.e. in the PCI-E extended range), we need to add mmconfig support in Xen, although it can be added simply. Thanks -- Yunhong Jiang> >As for the Shanghai feature: Christoph, are there any documents >available on that feature? What kind of errors are delivered >(corrected/correctable)? 
> >- Frank >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
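For reference on the 0xCF8/CFC point above: the legacy mechanism only encodes an 8-bit register offset, which is why anything past 0xff (the PCI Express extended configuration space) needs MMCONFIG. A minimal sketch of the legacy access, assuming Xen-style outl()/inl() port I/O:

/* Legacy type-1 configuration read; reg is limited to 0x00-0xff. */
static uint32_t pci_conf_read32_legacy(unsigned int bus, unsigned int dev,
                                       unsigned int func, unsigned int reg)
{
    uint32_t addr = 0x80000000u | (bus << 16) | (dev << 11) |
                    (func << 8) | (reg & 0xfc);

    outl(addr, 0xcf8);
    return inl(0xcfc);
}
/* Registers at 0x100-0xfff are only reachable through the memory-mapped
 * MMCONFIG window, hence the note about adding mmconfig support to Xen. */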
Jiang, Yunhong
2009-Mar-02 05:51 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank/Christopher, can you please give more comments for it, or you are OK with this? For the action reporting mechanism, we will send out a proposal for review soon. Thanks Yunhong Jiang Jiang, Yunhong <> wrote:> Christopher/Frank, thanks for reply very much, see comments below. > >> -----Original Message----- >> From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] Sent: >> 2009年2月26日 1:33 To: Christoph Egger >> Cc: Jiang, Yunhong; Kleen, Andi; >> xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN >> >> Christoph Egger wrote: >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: >>> >>>> So, Frank/Egger, can I assume followed are consensus currently? >>>> >>>> 1) MCE is handled by Xen HV totally, while guest's vMCE handler will >>>> only works for itself. 2) Xen present a virtual #MC to guest through MSR >>>> access emulation.(Xen will do the translation if needed). >>>> 3) Guest's unmodified >>>> MCE handler will handle the vMCE injected. >>>> 4) Dom0 will get all log/telemetry through hypercall. >>>> 5) The action taken by xen will be passed to dom0 through the telemetry >>>> mechanism. >>> >>> Mostly. Regarding 2) I want like to discuss first how to handle errors >>> impacting multiple contiguous physical pages which are non-contigous >>> in guest physical space. > > >>> >>> And I also want to discuss about how to do recovery actions requiring >>> PCI access. One example for this is >>> Shanghai's "L3 Cache Index Disable"-Feature. >>> Xen delegates PCI config space to Dom0 and >>> via PCI passthrough partly to DomU. >>> That means, if registers in PCI config space are independently >>> accessable by Xen, Dom0 and/or DomU, they can interfere with each other. >>> Therefore, we need to a) clearly define who handles what and >>> b) define some rules based on a) >>> c) discuss how to handle Dom0/DomU going wild >>> and break the rules defined in b) >> >> I also agree on the approach in principle, but would like to see these >> points addressed. For non-contiguous pages, I suppose Xen >> could deliver >> multiple #vMCEs to the guest, split into contiguous parts. The >> vmce code >> seems to be set up to be able to do this. > > For the contigous pages, I agree with Gavin that such > contiguous page error should be triggered as multiple #MC and so is ok. > > For PCI config space issue, Christoph, can you please share > more information on it (or provide some document as Frank > suggested), like is it for CE (Correctable error or > UC(UnCorrectable error), is it in PCI range or PCI-E range > (i.e. through 0xCF8/CFC or through MMCONFIG), how the device's > BDF caculated etc. Followed is some of my understanding. > > Firstly, if it is CE, Xen will do nothing and dom0 will take > recovery action. If it is UC, Xen will take action when all > CPU is in SoftIRQ context, and dom0 will not take action, so > it should be ok. > > Secondly, in Xen environment, per my understanding, CPU is > owned by Xen HV, so I'm not sure when dom0 disable L3 cache > (if it is CE), should Xen be aware or not. That is, should > dom0 disable the cache directly, or it should user hypercall > to ask Xen do that. Keir can give us more suggestion. > > For item C, currently Xen/dom0 can both access configuration > space, while domU will do that through PCI_frontend/backend. > Because PCI backend only cover device assigned to domU, so we > don't need worry about domU and dom0 should be trusted. 
> However, one thing left is, if this range is beyond 0x100 > (i.e. in pci-e range), we need add mmconfig support in Xen, > although it can be added simply. > > Thanks > -- Yunhong Jiang > >> >> As for the Shanghai feature: Christoph, are there any documents >> available on that feature? What kind of errors are delivered >> (corrected/correctable)? >> >> - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Mar-02 14:51 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote:> Frank/Christopher, can you please give more comments for it, or you are OKSorry, for the delay. I''m also busy with other tasks.> with this? For the action reporting mechanism, we will send out a proposal > for review soon.I would like to see interface definition first, which covers all aspects we discussed.> > Thanks > Yunhong Jiang > > Jiang, Yunhong <> wrote: > > Christopher/Frank, thanks for reply very much, see comments below. > > > >> -----Original Message----- > >> From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] > >> Sent: 2009年2月26日 1:33 To: Christoph Egger > >> Cc: Jiang, Yunhong; Kleen, Andi; > >> xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby > >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN > >> > >> Christoph Egger wrote: > >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: > >>>> So, Frank/Egger, can I assume followed are consensus currently? > >>>> > >>>> 1) MCE is handled by Xen HV totally, while guest''s vMCE handler will > >>>> only works for itself. 2) Xen present a virtual #MC to guest through > >>>> MSR access emulation.(Xen will do the translation if needed). > >>>> 3) Guest''s unmodified > >>>> MCE handler will handle the vMCE injected. > >>>> 4) Dom0 will get all log/telemetry through hypercall. > >>>> 5) The action taken by xen will be passed to dom0 through the > >>>> telemetry mechanism. > >>> > >>> Mostly. Regarding 2) I want like to discuss first how to handle errors > >>> impacting multiple contiguous physical pages which are non-contigous > >>> in guest physical space. > >>> > >>> > >>> > >>> And I also want to discuss about how to do recovery actions requiring > >>> PCI access. One example for this is > >>> Shanghai''s "L3 Cache Index Disable"-Feature. > >>> Xen delegates PCI config space to Dom0 and > >>> via PCI passthrough partly to DomU. > >>> That means, if registers in PCI config space are independently > >>> accessable by Xen, Dom0 and/or DomU, they can interfere with each > >>> other. Therefore, we need to a) clearly define who handles what and > >>> b) define some rules based on a) > >>> c) discuss how to handle Dom0/DomU going wild > >>> and break the rules defined in b) > >> > >> I also agree on the approach in principle, but would like to see these > >> points addressed. For non-contiguous pages, I suppose Xen > >> could deliver > >> multiple #vMCEs to the guest, split into contiguous parts. The > >> vmce code > >> seems to be set up to be able to do this. > > > > For the contigous pages, I agree with Gavin that such > > contiguous page error should be triggered as multiple #MC and so is ok. > > > > For PCI config space issue, Christoph, can you please share > > more information on it (or provide some document as Frank > > suggested), like is it for CE (Correctable error or > > UC(UnCorrectable error), is it in PCI range or PCI-E range > > (i.e. through 0xCF8/CFC or through MMCONFIG), how the device''s > > BDF caculated etc. Followed is some of my understanding. > > > > Firstly, if it is CE, Xen will do nothing and dom0 will take > > recovery action. If it is UC, Xen will take action when all > > CPU is in SoftIRQ context, and dom0 will not take action, so > > it should be ok. > > > > Secondly, in Xen environment, per my understanding, CPU is > > owned by Xen HV, so I''m not sure when dom0 disable L3 cache > > (if it is CE), should Xen be aware or not. 
That is, should > > dom0 disable the cache directly, or it should user hypercall > > to ask Xen do that. Keir can give us more suggestion. > > > > For item C, currently Xen/dom0 can both access configuration > > space, while domU will do that through PCI_frontend/backend. > > Because PCI backend only cover device assigned to domU, so we > > don''t need worry about domU and dom0 should be trusted. > > However, one thing left is, if this range is beyond 0x100 > > (i.e. in pci-e range), we need add mmconfig support in Xen, > > although it can be added simply. > > > > Thanks > > -- Yunhong Jiang > > > >> As for the Shanghai feature: Christoph, are there any documents > >> available on that feature?Yes, our BKDG.> >> What kind of errors are delivered (corrected/correctable)?The error type can be both depending on whether correction via ECC was successful or not. -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Mar-02 14:58 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Thursday 26 February 2009 03:16:29 Jiang, Yunhong wrote:> Christopher/Egger, thanks for reply very much, see comments below. > > >-----Original Message----- > >From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] > >Sent: 2009年2月26日 1:33 > >To: Christoph Egger > >Cc: Jiang, Yunhong; Kleen, Andi; > >xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby > >Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN > > > >Christoph Egger wrote: > >> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: > >>> So, Frank/Egger, can I assume followed are consensus currently? > >>> > >>> 1) MCE is handled by Xen HV totally, while guest''s vMCE > > > >handler will only > > > >>> works for itself. > >>> 2) Xen present a virtual #MC to guest through MSR access > >>> emulation.(Xen will do the translation if needed). > >>> 3) Guest''s unmodified > >>> MCE handler will handle the vMCE injected. > >>> 4) Dom0 will get all log/telemetry through hypercall. > >>> 5) The action taken by xen will be passed to dom0 through > > > >the telemetry > > > >>> mechanism. > >> > >> Mostly. Regarding 2) I want like to discuss first how to > > > >handle errors > > > >> impacting multiple contiguous physical pages which are non-contigous > >> in guest physical space. > >> > >> > >> > >> And I also want to discuss about how to do recovery actions requiring > >> PCI access. One example for this is > >> Shanghai''s "L3 Cache Index Disable"-Feature. > >> Xen delegates PCI config space to Dom0 and > >> via PCI passthrough partly to DomU. > >> That means, if registers in PCI config space are independently > >> accessable by Xen, Dom0 and/or DomU, they can interfere with > > > >each other. > > > >> Therefore, we need to > >> a) clearly define who handles what and > >> b) define some rules based on a) > >> c) discuss how to handle Dom0/DomU going wild > >> and break the rules defined in b) > > > >I also agree on the approach in principle, but would like to see these > >points addressed. For non-contiguous pages, I suppose Xen > >could deliver > >multiple #vMCEs to the guest, split into contiguous parts. The > >vmce code > >seems to be set up to be able to do this.For virtual MCEs that is ok. But note, for unmodified guests, the MC handler is written with the assumption that the CPU powers off when an #MCE happens before the handler cleared the MCIP bit in the MCG_STATUS MSR.> > For the contigous pages, I agree with Gavin that such contiguous page error > should be triggered as multiple #MC and so is ok. > > For PCI config space issue, Christoph, can you please share more > information on it (or provide some document as Frank suggested), like is it > for CE (Correctable error or UC(UnCorrectable error), is it in PCI range or > PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how the device''s > BDF caculated etc. Followed is some of my understanding.I would like to see a generic solution that works with any feature requiring access to the pci space rather a per-feature solution.> Firstly, if it is CE, Xen will do nothing and dom0 will take recovery > action. If it is UC, Xen will take action when all CPU is in SoftIRQ > context, and dom0 will not take action, so it should be ok. > > Secondly, in Xen environment, per my understanding, CPU is owned by Xen HV, > so I''m not sure when dom0 disable L3 cache (if it is CE), should Xen be > aware or not. That is, should dom0 disable the cache directly, or it should > user hypercall to ask Xen do that. 
Keir can give us more suggestion. > > For item C, currently Xen/dom0 can both access configuration space, while > domU will do that through PCI_frontend/backend. Because PCI backend only > cover device assigned to domU, so we don''t need worry about domU and dom0 > should be trusted. However, one thing left is, if this range is beyond > 0x100 (i.e. in pci-e range), we need add mmconfig support in Xen, although > it can be added simply. > > Thanks > -- Yunhong Jiang > > >As for the Shanghai feature: Christoph, are there any documents > >available on that feature? What kind of errors are delivered > >(corrected/correctable)? > > > >- Frank-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-02 16:09 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
xen-devel-bounces@lists.xensource.com <> wrote:> On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote: >> Frank/Christopher, can you please give more comments for it, or you are OK > > Sorry, for the delay. I''m also busy with other tasks. > >> with this? For the action reporting mechanism, we will send out a proposal >> for review soon. > > I would like to see interface definition first, which covers > all aspects > we discussed. >>>>> As for the Shanghai feature: Christoph, are there any documents >>>> available on that feature? > > Yes, our BKDG. I checked the BKDG for both Family 10 and 11 (http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41256.pdf and http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.pdf) and didn''t find the related info. Can you share more info like the URL and the section number?> >>>> What kind of errors are delivered (corrected/correctable)? > > The error type can be both depending on whether correction > via ECC was successful or not. So you mean that if ECC correction fails in the L3 cache, Xen must do "L3 Cache Index Disable" immediately so that the failing part of the cache is not used anymore? Thanks Yunhong Jiang> > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-02 16:15 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> > For virtual MCEs that is ok. But note, for unmodified guests, > the MC handler > is written with the assumption that the CPU powers off when an #MCE > happens before the handler cleared the MCIP bit in the MCG_STATUS MSR. That should depend on the implementation; for example, we can inject the vMCEs one by one, i.e. only inject the next one after the first has already been handled.> >> >> For the contigous pages, I agree with Gavin that such contiguous page error >> should be triggered as multiple #MC and so is ok. >> >> For PCI config space issue, Christoph, can you please share more >> information on it (or provide some document as Frank suggested), like is it >> for CE (Correctable error or UC(UnCorrectable error), is it in PCI range or >> PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how the device''s >> BDF caculated etc. Followed is some of my understanding. > > I would like to see a generic solution that works with any feature > requiring access to the pci space rather a per-feature solution. I think the solution is: Xen cares for the MCE while dom0 cares for CE errors. Or another solution is that all PCI access for CPU RAS is done by Xen, since Xen owns the CPU. Some information like how the PCI config space is arranged will be helpful, I think. Thanks Yunhong Jiang> > >> Firstly, if it is CE, Xen will do nothing and dom0 will take recovery >> action. If it is UC, Xen will take action when all CPU is in SoftIRQ >> context, and dom0 will not take action, so it should be ok. >> >> Secondly, in Xen environment, per my understanding, CPU is owned by Xen HV, >> so I''m not sure when dom0 disable L3 cache (if it is CE), should Xen be >> aware or not. That is, should dom0 disable the cache directly, or it should >> user hypercall to ask Xen do that. Keir can give us more suggestion. >> >> For item C, currently Xen/dom0 can both access configuration space, while >> domU will do that through PCI_frontend/backend. Because PCI backend only >> cover device assigned to domU, so we don''t need worry about domU and dom0 >> should be trusted. However, one thing left is, if this range is beyond >> 0x100 (i.e. in pci-e range), we need add mmconfig support in Xen, although >> it can be added simply. >> >> Thanks >> -- Yunhong Jiang >> >>> As for the Shanghai feature: Christoph, are there any documents >>> available on that feature? What kind of errors are delivered >>> (corrected/correctable)? >>> >>> - Frank > > > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
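A sketch of the "inject one by one" idea: track a virtual MCIP bit per domain and only deliver the next queued event after the guest has cleared its virtual MCG_STATUS, which keeps the unmodified handler's assumption intact. The structure and helper names here are hypothetical.

/* Sketch only; names are hypothetical. */
struct vmce_state {
    bool mcip_set;                    /* virtual MCG_STATUS.MCIP */
    unsigned int nr_pending;          /* queued virtual bank events */
};

static void vmce_try_deliver(struct domain *d, struct vmce_state *v)
{
    if ( v->mcip_set || v->nr_pending == 0 )
        return;                       /* guest is still handling the last one */
    v->mcip_set = true;
    v->nr_pending--;
    inject_vmce(d);                   /* deliver the next queued event */
}

/* Called from the wrmsr intercept when the guest clears MCG_STATUS. */
static void vmce_on_mcg_status_clear(struct domain *d, struct vmce_state *v)
{
    v->mcip_set = false;
    vmce_try_deliver(d, v);
}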
Frank van der Linden
2009-Mar-02 17:47 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Jiang, Yunhong wrote:> Frank/Christopher, can you please give more comments for it, or you are OK with this? > For the action reporting mechanism, we will send out a proposal for review soon. I''m ok with this. We need a little more information on the AMD mechanism, but it seems to me that we can fit this in. Sometime this week, I''ll also send out the last of our changes that haven''t been sent upstream to xen-unstable yet. Maybe we can combine some things into one patch, like the telemetry handling changes that Gavin did. The other changes are error injection (for debugging) and panic crash dump support for our FMA tools, but those are probably only interesting to us. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-05 04:45 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Jiang, Yunhong wrote: >> Frank/Christopher, can you please give more comments for it, or you are OK >> with this? For the action reporting mechanism, we will send out a proposal >> for review soon. > > I''m ok with this. We need a little more information on the AMD > mechanism, but it seems to me that we can fit this in. > > Sometime this week, I''ll also send out the last of our changes that > haven''t been sent upstream to xen-unstable yet. Maybe we can combine > some things in to one patch, like the telemetry handling changes that > Gavin did. The other changes are error injection (for debugging) and > panic crash dump support for our FMA tools, but those are probably only > interesting to us. > > - Frank Glad to know about the conclusion. See my reply to Christoph on the AMD mechanism; I am still waiting for a response. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-05 08:31 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph/Frank, Following is the interface definition; please have a look.

Thanks
Yunhong Jiang

1) Interface between Xen/dom0 for passing xen''s recovery action information to dom0. Usage model: After offlining a broken page, Xen might pass its page-offline recovery action result information to dom0. Dom0 will save the information in non-volatile memory for further proactive actions, such as offlining easily-broken pages early at the next reboot.

struct page_offline_action
{
    /* Params for passing the offlined page number to DOM0 */
    uint64_t mfn;
    uint64_t status; /* Similar to page offline hypercall */
};

struct cpu_offline_action
{
    /* Params for passing the identity of the offlined CPU to DOM0 */
    uint32_t mc_socketid;
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
};

struct cache_shrink_action
{
    /* TBD, Christoph, please fill it */
};

/* Recovery action flags, giving recovery result information to the guest */
/* Recovered successfully after taking certain recovery actions below */
#define REC_ACT_RECOVERED (0x1 << 0)
/* For solaris''s usage that dom0 will take ownership when crash */
#define REC_ACT_RESET (0x1 << 2)
/* No action is performed by XEN */
#define REC_ACT_INFO (0x1 << 3)

/* Recovery action type definition, valid only when flags & REC_ACT_RECOVERED */
#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE 2
#define MC_ACT_CACHE_SHIRNK 3

struct recovery_action
{
    uint8_t flags;
    uint8_t action_type;
    union
    {
        struct page_offline_action page_retire;
        struct cpu_offline_action cpu_offline;
        struct cache_shrink_action cache_shrink;
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

struct mcinfo_bank {
    struct mcinfo_common common;

    uint16_t mc_bank;   /* bank nr */
    uint16_t mc_domid;  /* Usecase 5: domain referenced by mc_addr on dom0
                         * and if mc_addr is valid. Never valid on DomU. */
    uint64_t mc_status; /* bank status */
    uint64_t mc_addr;   /* bank address, only valid
                         * if addr bit is set in mc_status */
    uint64_t mc_misc;
    uint64_t mc_ctrl2;
    uint64_t mc_tsc;
    /* Recovery action is performed per bank */
    struct recovery_action action;
};

2) The two interfaces below are for MCA processing internal use.
   a. pre_handler will be called early in MCA ISR context, mainly for early need_reset detection to avoid losing logs (flag MCA_RESET). Also, pre_handler might be able to find the impacted domain if possible.
   b. mca_error_handler is actually an (error_action_index, recovery_handler pointer) pair. The defined recovery_handler function performs the actual recovery operations in softIRQ context after the per-bank MCA error matches the corresponding mca_code index. If pre_handler can''t judge the impacted domain, recovery_handler must figure it out.

/* Error has been recovered successfully */
#define MCA_RECOVERED 0
/* Error impacts one guest as stated in the owner field */
#define MCA_OWNER 1
/* Error can''t be recovered and the system needs a reboot */
#define MCA_RESET 2
/* Error should be handled in softIRQ context */
#define MCA_MORE_ACTION 3

struct mca_handle_result
{
    uint32_t flags;
    /* Valid only when flags & MCA_OWNER */
    domid_t owner;
    /* Valid only when flags & MCA_RECOVERED */
    struct recovery_action *action;
};

struct mca_error_handler
{
    /*
     * Assume we will need only architecture defined code. If the index can''t be setup by
     * mca_code, we will add a function to do the (index, recovery_handler) mapping check.
     * This mca_code represents the recovery handler pointer index for identifying this
     * particular error''s corresponding recovery action
     */
    uint16_t mca_code;

    /* Handler to be called in softIRQ handler context */
    int (*recovery_handler)(struct mcinfo_bank *bank,
                            struct mcinfo_global *global,
                            struct mcinfo_extended *extension,
                            struct mca_handle_result *result);
};

struct mca_error_handler intel_mca_handler[] =
{
    ....
};

struct mca_error_handler amd_mca_handler[] =
{
    ....
};

/* Handlers to be called in the MCA ISR, in MCA context */
int intel_mca_pre_handler(struct cpu_user_regs *regs,
                          struct mca_handle_result *result);

int amd_mca_pre_handler(struct cpu_user_regs *regs,
                        struct mca_handle_result *result);

Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Jiang, Yunhong wrote: >> Frank/Christopher, can you please give more comments for it, or you are OK >> with this? For the action reporting mechanism, we will send out a proposal >> for review soon. > > I''m ok with this. We need a little more information on the AMD > mechanism, but it seems to me that we can fit this in. > > Sometime this week, I''ll also send out the last of our changes that > haven''t been sent upstream to xen-unstable yet. Maybe we can combine > some things in to one patch, like the telemetry handling changes that > Gavin did. The other changes are error injection (for debugging) and > panic crash dump support for our FMA tools, but those are probably only > interesting to us. > > - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
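As an illustration of how dom0-side tooling might consume the per-bank action data proposed above (a sketch against the structures in this mail; log_info() is just a placeholder for whatever logging the tools use):

static void dom0_log_recovery(const struct mcinfo_bank *b)
{
    const struct recovery_action *a = &b->action;

    if ( !(a->flags & REC_ACT_RECOVERED) )
    {
        /* REC_ACT_INFO / REC_ACT_RESET: Xen recovered nothing itself. */
        log_info("bank %u: no recovery action, flags 0x%x", b->mc_bank, a->flags);
        return;
    }

    switch ( a->action_type )
    {
    case MC_ACT_PAGE_OFFLINE:
        log_info("page %llx offlined, status %llx",
                 (unsigned long long)a->action_info.page_retire.mfn,
                 (unsigned long long)a->action_info.page_retire.status);
        break;
    case MC_ACT_CPU_OFFLINE:
        log_info("cpu socket %u core %u thread %u offlined",
                 a->action_info.cpu_offline.mc_socketid,
                 a->action_info.cpu_offline.mc_coreid,
                 a->action_info.cpu_offline.mc_core_threadid);
        break;
    default:
        log_info("unhandled action type %u", a->action_type);
        break;
    }
}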
Christoph Egger
2009-Mar-05 14:53 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK The L3 cache index disable feature works like this: You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) and write it into the index field. This MSR does not belong to the standard mc bank data and is therefore provided by mcinfo_extended. The index field are the bits 11:0 of the PCI function 3 register "L3 Cache Index Disable". Why is the recover action bound to the bank ? I would like to see a struct mcinfo_recover rather extending struct mcinfo_bank. That gives us flexibility. Christoph On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:> Christoph/Frank, Followed is the interface definition, please have a look. > > Thanks > Yunhong Jiang > > 1) Interface between Xen/dom0 for passing xen''s recovery action information > to dom0. Usage model: After offlining broken page, Xen might pass its > page-offline recovery action result information to dom0. Dom0 will save the > information in non-volatile memory for further proactive actions, such as > offlining the easy-broken page early when doing next reboot. > > > struct page_offline_action > { > /* Params for passing the offlined page number to DOM0 */ > uint64_t mfn; > uint64_t status; /* Similar to page offline hypercall */ > }; > > struct cpu_offline_action > { > /* Params for passing the identity of the offlined CPU to DOM0 */ > uint32_t mc_socketid; > uint16_t mc_coreid; > uint16_t mc_core_threadid; > }; > > struct cache_shrink_action > { > /* TBD, Christoph, please fill it */ > }; > > /* Recover action flags, giving recovery result information to guest */ > /* Recovery successfully after taking certain recovery actions below */ > #define REC_ACT_RECOVERED (0x1 << 0) > /* For solaris''s usage that dom0 will take ownership when crash */ > #define REC_ACT_RESET (0x1 << 2) > /* No action is performed by XEN */ > #define REC_ACT_INFO (0x1 << 3) > > /* Recover action type definition, valid only when flags & > REC_ACT_RECOVERED */ > #define MC_ACT_PAGE_OFFLINE 1 > #define MC_ACT_CPU_OFFLINE 2 > #define MC_ACT_CACHE_SHIRNK 3 > > struct recovery_action > { > uint8_t flags; > uint8_t action_type; > union > { > struct page_offline_action page_retire; > struct cpu_offline_action cpu_offline; > struct cache_shrink_action cache_shrink; > uint8_t pad[MAX_ACTION_SIZE]; > } action_info; > } > > struct mcinfo_bank { > struct mcinfo_common common; > > uint16_t mc_bank; /* bank nr */ > uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0 > * and if mc_addr is valid. Never valid on DomU. */ > uint64_t mc_status; /* bank status */ > uint64_t mc_addr; /* bank address, only valid > * if addr bit is set in mc_status */ > uint64_t mc_misc; > uint64_t mc_ctrl2; > uint64_t mc_tsc; > /* Recovery action is performed per bank */ > struct recovery_action action; > }; > > 2) Below two interfaces are for MCA processing internal use. > a. pre_handler will be called earlier in MCA ISR context, mainly for > early need_reset detection for avoiding log missing (flag MCA_RESET). > Also, pre_handler might be able to find the impacted domain if possible. > b. mca_error_handler is actually a (error_action_index, > recovery_handler pointer) pair. The defined recovery_handler function > performs the actual recovery operations in softIrq context after the > per_bank MCA error matching the corresponding mca_code index. If > pre_handler can''t judge the impacted domain, recovery_handler must figure > it out. 
> > /* Error has been recovered successfully */ > #define MCA_RECOVERD 0 > /* Error impact one guest as stated in owner field */ > #define MCA_OWNER 1 > /* Error can''t be recovered and need reboot system */ > #define MCA_RESET 2 > /* Error should be handled in softIRQ context */ > #define MCA_MORE_ACTION 3 > > struct mca_handle_result > { > uint32_t flags; > /* Valid only when flags & MCA_OWNER */ > domid_d owner; > /* valid only when flags & MCA_RECOVERD */ > struct recovery_action *action; > }; > > struct mca_error_handler > { > /* > * Assume we will need only architecture defined code. If the index > can''t be setup by * mca_code, we will add a function to do the (index, > recovery_handler) mapping check. * This mca_code represents the recovery > handler pointer index for identifying this * particular error''s > corresponding recover action > */ > uint16_t mca_code; > > /* Handler to be called in softIRQ handler context */ > int recovery_handler(struct mcinfo_bank *bank, > struct mcinfo_global *global, > struct mcinfo_extended *extention, > struct mca_handle_result *result); > > }; > > struct mca_error_handler intel_mca_handler[] > { > .... > }; > > struct mca_error_handler amd_mca_handler[] > { > .... > }; > > > /* HandlVer to be called in MCA ISR in MCA context */ > int intel_mca_pre_handler(struct cpu_user_regs *regs, > struct mca_handle_result *result); > > int amd_mca_pre_handler(struct cpu_user_regs *regs, > struct mca_handle_result *result); > > Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: > > Jiang, Yunhong wrote: > >> Frank/Christopher, can you please give more comments for it, or you are > >> OK with this? For the action reporting mechanism, we will send out a > >> proposal for review soon. > > > > I''m ok with this. We need a little more information on the AMD > > mechanism, but it seems to me that we can fit this in. > > > > Sometime this week, I''ll also send out the last of our changes that > > haven''t been sent upstream to xen-unstable yet. Maybe we can combine > > some things in to one patch, like the telemetry handling changes that > > Gavin did. The other changes are error injection (for debugging) and > > panic crash dump support for our FMA tools, but those are probably only > > interesting to us. > > > > - Frank-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
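Following the description above literally, the recovery step would look roughly like this. The offset of the "L3 Cache Index Disable" register is deliberately left as a parameter (it is exactly what the next mail asks about), and pci_conf_write32() stands for whichever config-space write path Xen and Dom0 agree on:

/* Sketch of the L3 cache index disable sequence described above.
 * l3_reg is the (still unspecified) offset of the "L3 Cache Index
 * Disable" register in northbridge PCI function 3. */
static void l3_index_disable_sketch(unsigned int bus, unsigned int dev,
                                    unsigned int l3_reg)
{
    uint64_t misc1;
    uint32_t index;

    rdmsrl(0xC0000408, misc1);          /* MC4_MISC1 */
    index = (misc1 >> 6) & 0xfff;       /* bits 17:6 of MC4_MISC1 */

    /* Write the index into bits 11:0 of the function 3 register. */
    pci_conf_write32(bus, dev, 3, l3_reg, index);
}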
Jiang, Yunhong
2009-Mar-05 15:19 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph Egger <mailto:Christoph.Egger@amd.com> wrote:> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINKAhh, yes, I will fix it.> > The L3 cache index disable feature works like this: > > You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) > and write it into the index field. This MSR does not belong to > the standard > mc bank data and is therefore provided by mcinfo_extended. > The index field are the bits 11:0 of the PCI function 3 register "L3 Cache > Index Disable".So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K byte? For the PCI access, I''d prefer to have xen to control all these, i.e. even if dom0 want to disable the L3 cache, it is done through a hypercall. The reason is, Xen control the CPU, so keep it in Xen will make things simpler. Of course, it is ok for me too, if you want to keep Xen for #MC handler and Dom0 for CE handler.> > Why is the recover action bound to the bank ? > I would like to see a struct mcinfo_recover rather extending > struct mcinfo_bank. That gives us flexibility.I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has advantage of keep connection of the error source and the action, but it do make the mcinfo_bank more complex. Or we can keep the cpu/bank information in the mcinfo_recover also, so that we keep the flexibility and don''t lose the connection. Thanks Yunhong Jiang> > Christoph > > > On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >> Christoph/Frank, Followed is the interface definition, please have a look. >> >> Thanks >> Yunhong Jiang >> >> 1) Interface between Xen/dom0 for passing xen''s recovery action information >> to dom0. Usage model: After offlining broken page, Xen might pass its >> page-offline recovery action result information to dom0. Dom0 will save the >> information in non-volatile memory for further proactive actions, such as >> offlining the easy-broken page early when doing next reboot. >> >> >> struct page_offline_action >> { >> /* Params for passing the offlined page number to DOM0 */ uint64_t >> mfn; uint64_t status; /* Similar to page offline hypercall */ }; >> >> struct cpu_offline_action >> { >> /* Params for passing the identity of the offlined CPU to DOM0 */ >> uint32_t mc_socketid; uint16_t mc_coreid; >> uint16_t mc_core_threadid; >> }; >> >> struct cache_shrink_action >> { >> /* TBD, Christoph, please fill it */ >> }; >> >> /* Recover action flags, giving recovery result information to guest */ >> /* Recovery successfully after taking certain recovery actions below */ >> #define REC_ACT_RECOVERED (0x1 << 0) >> /* For solaris''s usage that dom0 will take ownership when crash */ >> #define REC_ACT_RESET (0x1 << 2) >> /* No action is performed by XEN */ >> #define REC_ACT_INFO (0x1 << 3) >> >> /* Recover action type definition, valid only when flags & >> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >> #define MC_ACT_CPU_OFFLINE 2 >> #define MC_ACT_CACHE_SHIRNK 3 >> >> struct recovery_action >> { >> uint8_t flags; >> uint8_t action_type; >> union >> { >> struct page_offline_action page_retire; >> struct cpu_offline_action cpu_offline; >> struct cache_shrink_action cache_shrink; >> uint8_t pad[MAX_ACTION_SIZE]; >> } action_info; >> } >> >> struct mcinfo_bank { >> struct mcinfo_common common; >> >> uint16_t mc_bank; /* bank nr */ >> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0 >> * and if mc_addr is valid. Never valid on DomU. 
*/ >> uint64_t mc_status; /* bank status */ >> uint64_t mc_addr; /* bank address, only valid >> * if addr bit is set in mc_status */ uint64_t >> mc_misc; uint64_t mc_ctrl2; >> uint64_t mc_tsc; >> /* Recovery action is performed per bank */ >> struct recovery_action action; >> }; >> >> 2) Below two interfaces are for MCA processing internal use. >> a. pre_handler will be called earlier in MCA ISR context, mainly for >> early need_reset detection for avoiding log missing (flag MCA_RESET). >> Also, pre_handler might be able to find the impacted domain if possible. >> b. mca_error_handler is actually a (error_action_index, >> recovery_handler pointer) pair. The defined recovery_handler function >> performs the actual recovery operations in softIrq context after the >> per_bank MCA error matching the corresponding mca_code index. If >> pre_handler can''t judge the impacted domain, recovery_handler must figure >> it out. >> >> /* Error has been recovered successfully */ >> #define MCA_RECOVERD 0 >> /* Error impact one guest as stated in owner field */ #define MCA_OWNER >> 1 /* Error can''t be recovered and need reboot system */ #define MCA_RESET >> 2 /* Error should be handled in softIRQ context */ >> #define MCA_MORE_ACTION 3 >> >> struct mca_handle_result >> { >> uint32_t flags; >> /* Valid only when flags & MCA_OWNER */ >> domid_d owner; >> /* valid only when flags & MCA_RECOVERD */ >> struct recovery_action *action; >> }; >> >> struct mca_error_handler >> { >> /* >> * Assume we will need only architecture defined code. If the index >> can''t be setup by * mca_code, we will add a function to do the (index, >> recovery_handler) mapping check. * This mca_code represents the recovery >> handler pointer index for identifying this * particular error''s >> corresponding recover action */ >> uint16_t mca_code; >> >> /* Handler to be called in softIRQ handler context */ >> int recovery_handler(struct mcinfo_bank *bank, >> struct mcinfo_global *global, >> struct mcinfo_extended *extention, >> struct mca_handle_result *result); >> >> }; >> >> struct mca_error_handler intel_mca_handler[] >> { >> .... >> }; >> >> struct mca_error_handler amd_mca_handler[] >> { >> .... >> }; >> >> >> /* HandlVer to be called in MCA ISR in MCA context */ >> int intel_mca_pre_handler(struct cpu_user_regs *regs, >> struct mca_handle_result *result); >> >> int amd_mca_pre_handler(struct cpu_user_regs *regs, >> struct mca_handle_result *result); >> >> Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: >>> Jiang, Yunhong wrote: >>>> Frank/Christopher, can you please give more comments for it, or you are >>>> OK with this? For the action reporting mechanism, we will send out a >>>> proposal for review soon. >>> >>> I''m ok with this. We need a little more information on the AMD >>> mechanism, but it seems to me that we can fit this in. >>> >>> Sometime this week, I''ll also send out the last of our changes that >>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine >>> some things in to one patch, like the telemetry handling changes that >>> Gavin did. The other changes are error injection (for debugging) and >>> panic crash dump support for our FMA tools, but those are probably only >>> interesting to us. >>> >>> - Frank > > > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. 
McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
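A rough sketch of the alternative mentioned above: a standalone mcinfo_recover entry that still records which cpu/bank triggered the action, so the flexibility is gained without losing the connection to the error source. The struct mcinfo_common and struct recovery_action types are taken from the quoted proposal; all other field names are illustrative only, not a settled interface.

/*
 * Illustrative only: a separate recovery entry instead of growing
 * struct mcinfo_bank, while keeping the link to the reporting bank/cpu.
 */
struct mcinfo_recover
{
    struct mcinfo_common common;     /* type/size header, like other entries */
    uint16_t mc_bank;                /* bank that triggered this action */
    uint32_t mc_socketid;            /* physical package of that bank */
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
    struct recovery_action action;   /* flags, action_type, action_info */
};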
Christoph Egger
2009-Mar-05 17:28 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote:> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: > > MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK > > Ahh, yes, I will fix it. > > > The L3 cache index disable feature works like this: > > > > You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) > > and write it into the index field. This MSR does not belong to > > the standard > > mc bank data and is therefore provided by mcinfo_extended. > > The index field are the bits 11:0 of the PCI function 3 register "L3 > > Cache Index Disable". > > So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K > byte?Sorry, which offset do you mean ?> > For the PCI access, I''d prefer to have xen to control all these, i.e. even > if dom0 want to disable the L3 cache, it is done through a hypercall. The > reason is, Xen control the CPU, so keep it in Xen will make things simpler. > > Of course, it is ok for me too, if you want to keep Xen for #MC handler and > Dom0 for CE handler.We still need to define the rules to prevent interferes and clarify how to deal with Dom0/DomU going wild and breaking the rules.> > Why is the recover action bound to the bank ? > > I would like to see a struct mcinfo_recover rather extending > > struct mcinfo_bank. That gives us flexibility. > > I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has > advantage of keep connection of the error source and the action, but it do > make the mcinfo_bank more complex. Or we can keep the cpu/bank information > in the mcinfo_recover also, so that we keep the flexibility and don''t lose > the connection.From your suggestions I prefer the last one, but is still limited due to the assumption that each struct mcinfo_bank and each struct mcinfo_extended stands for exactly one error. This assumption doesn''t cover follow-up errors which may be needed to determine the real root cause. Some of them may even be ignored depending on what is going on. Christoph> > Thanks > Yunhong Jiang > > > Christoph > > > > On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: > >> Christoph/Frank, Followed is the interface definition, please have a > >> look. > >> > >> Thanks > >> Yunhong Jiang > >> > >> 1) Interface between Xen/dom0 for passing xen''s recovery action > >> information to dom0. Usage model: After offlining broken page, Xen might > >> pass its page-offline recovery action result information to dom0. Dom0 > >> will save the information in non-volatile memory for further proactive > >> actions, such as offlining the easy-broken page early when doing next > >> reboot. 
> >> > >> > >> struct page_offline_action > >> { > >> /* Params for passing the offlined page number to DOM0 */ > >> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall */ > >> }; > >> > >> struct cpu_offline_action > >> { > >> /* Params for passing the identity of the offlined CPU to DOM0 */ > >> uint32_t mc_socketid; uint16_t mc_coreid; > >> uint16_t mc_core_threadid; > >> }; > >> > >> struct cache_shrink_action > >> { > >> /* TBD, Christoph, please fill it */ > >> }; > >> > >> /* Recover action flags, giving recovery result information to guest */ > >> /* Recovery successfully after taking certain recovery actions below */ > >> #define REC_ACT_RECOVERED (0x1 << 0) > >> /* For solaris''s usage that dom0 will take ownership when crash */ > >> #define REC_ACT_RESET (0x1 << 2) > >> /* No action is performed by XEN */ > >> #define REC_ACT_INFO (0x1 << 3) > >> > >> /* Recover action type definition, valid only when flags & > >> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 > >> #define MC_ACT_CPU_OFFLINE 2 > >> #define MC_ACT_CACHE_SHIRNK 3 > >> > >> struct recovery_action > >> { > >> uint8_t flags; > >> uint8_t action_type; > >> union > >> { > >> struct page_offline_action page_retire; > >> struct cpu_offline_action cpu_offline; > >> struct cache_shrink_action cache_shrink; > >> uint8_t pad[MAX_ACTION_SIZE]; > >> } action_info; > >> } > >> > >> struct mcinfo_bank { > >> struct mcinfo_common common; > >> > >> uint16_t mc_bank; /* bank nr */ > >> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on > >> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t > >> mc_status; /* bank status */ > >> uint64_t mc_addr; /* bank address, only valid > >> * if addr bit is set in mc_status */ > >> uint64_t mc_misc; uint64_t mc_ctrl2; > >> uint64_t mc_tsc; > >> /* Recovery action is performed per bank */ > >> struct recovery_action action; > >> }; > >> > >> 2) Below two interfaces are for MCA processing internal use. > >> a. pre_handler will be called earlier in MCA ISR context, mainly for > >> early need_reset detection for avoiding log missing (flag MCA_RESET). > >> Also, pre_handler might be able to find the impacted domain if possible. > >> b. mca_error_handler is actually a (error_action_index, > >> recovery_handler pointer) pair. The defined recovery_handler function > >> performs the actual recovery operations in softIrq context after the > >> per_bank MCA error matching the corresponding mca_code index. If > >> pre_handler can''t judge the impacted domain, recovery_handler must > >> figure it out. > >> > >> /* Error has been recovered successfully */ > >> #define MCA_RECOVERD 0 > >> /* Error impact one guest as stated in owner field */ #define MCA_OWNER > >> 1 /* Error can''t be recovered and need reboot system */ #define > >> MCA_RESET 2 /* Error should be handled in softIRQ context */ > >> #define MCA_MORE_ACTION 3 > >> > >> struct mca_handle_result > >> { > >> uint32_t flags; > >> /* Valid only when flags & MCA_OWNER */ > >> domid_d owner; > >> /* valid only when flags & MCA_RECOVERD */ > >> struct recovery_action *action; > >> }; > >> > >> struct mca_error_handler > >> { > >> /* > >> * Assume we will need only architecture defined code. If the index > >> can''t be setup by * mca_code, we will add a function to do the (index, > >> recovery_handler) mapping check. 
* This mca_code represents the recovery > >> handler pointer index for identifying this * particular error''s > >> corresponding recover action */ > >> uint16_t mca_code; > >> > >> /* Handler to be called in softIRQ handler context */ > >> int recovery_handler(struct mcinfo_bank *bank, > >> struct mcinfo_global *global, > >> struct mcinfo_extended *extention, > >> struct mca_handle_result *result); > >> > >> }; > >> > >> struct mca_error_handler intel_mca_handler[] > >> { > >> .... > >> }; > >> > >> struct mca_error_handler amd_mca_handler[] > >> { > >> .... > >> }; > >> > >> > >> /* HandlVer to be called in MCA ISR in MCA context */ > >> int intel_mca_pre_handler(struct cpu_user_regs *regs, > >> struct mca_handle_result *result); > >> > >> int amd_mca_pre_handler(struct cpu_user_regs *regs, > >> struct mca_handle_result *result); > >> > >> Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: > >>> Jiang, Yunhong wrote: > >>>> Frank/Christopher, can you please give more comments for it, or you > >>>> are OK with this? For the action reporting mechanism, we will send out > >>>> a proposal for review soon. > >>> > >>> I''m ok with this. We need a little more information on the AMD > >>> mechanism, but it seems to me that we can fit this in. > >>> > >>> Sometime this week, I''ll also send out the last of our changes that > >>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine > >>> some things in to one patch, like the telemetry handling changes that > >>> Gavin did. The other changes are error injection (for debugging) and > >>> panic crash dump support for our FMA tools, but those are probably only > >>> interesting to us. > >>> > >>> - Frank > > > > -- > > ---to satisfy European Law for business letters: > > Advanced Micro Devices GmbH > > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > > Registergericht Muenchen, HRB Nr. 43632-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
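The quoted proposal pairs an mca_code with a recovery handler. A rough sketch of how that table might be consulted in softIRQ context follows; it assumes recovery_handler is a function-pointer member and that matching is a plain equality test against the architectural error code in MCi_STATUS bits 15:0. The proposal itself notes that a dedicated match function may be needed instead.

/*
 * Sketch only: walk the per-vendor handler table and invoke the matching
 * recovery handler for one reporting bank.
 */
static int run_recovery_handler(struct mca_error_handler *tbl, unsigned int n,
                                struct mcinfo_bank *bank,
                                struct mcinfo_global *global,
                                struct mcinfo_extended *ext,
                                struct mca_handle_result *result)
{
    unsigned int i;
    uint16_t code = (uint16_t)(bank->mc_status & 0xFFFF); /* MCA error code */

    for ( i = 0; i < n; i++ )
        if ( tbl[i].mca_code == code )
            return tbl[i].recovery_handler(bank, global, ext, result);

    return -1;   /* no recovery handler registered for this error code */
}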
Jiang, Yunhong
2009-Mar-06 02:11 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph Egger <mailto:Christoph.Egger@amd.com> wrote:> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote: >> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: >>> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK >> >> Ahh, yes, I will fix it. >> >>> The L3 cache index disable feature works like this: >>> >>> You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) >>> and write it into the index field. This MSR does not belong to >>> the standard >>> mc bank data and is therefore provided by mcinfo_extended. >>> The index field are the bits 11:0 of the PCI function 3 register "L3 >>> Cache Index Disable". >> >> So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K >> byte? > > Sorry, which offset do you mean ?I mean the offset of this register in the PCI function''s configuration space. You know for a PCI device, it has 256 byte configuration register while PCI-E device has 4K configuration register. Currently xen can access the 256 byte config register already, however, to support 4K range, it requires more stuff, like mmconfig sparse etc. That''s the reason I ask the offset of this register.> >> >> For the PCI access, I''d prefer to have xen to control all these, i.e. even >> if dom0 want to disable the L3 cache, it is done through a hypercall. The >> reason is, Xen control the CPU, so keep it in Xen will make things simpler. >> >> Of course, it is ok for me too, if you want to keep Xen for #MC handler and >> Dom0 for CE handler. > > We still need to define the rules to prevent interferes and > clarify how to > deal with Dom0/DomU going wild and breaking the rules.As discussed previously, we don''t need concern about DomU, all configuration space access from domU will be intercepted by dom0. For Dom0, since currently all PCI access to 0xcf8/cfc will be intercepted by Xen, so Xen can do checking. We can achieve same checking for mmconfig if remove that range from dom0. But I have to say I''m not sure if we do need concern too much what will happen when dom0 going wild ( after all, a crash in dom0 will lost everything), especially interfere on such access will not cause security issue (please correct me if I''m wrong ).> >>> Why is the recover action bound to the bank ? >>> I would like to see a struct mcinfo_recover rather extending >>> struct mcinfo_bank. That gives us flexibility. >> >> I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has >> advantage of keep connection of the error source and the action, but it do >> make the mcinfo_bank more complex. Or we can keep the cpu/bank information >> in the mcinfo_recover also, so that we keep the flexibility and don''t lose >> the connection. > > From your suggestions I prefer the last one, but is still limited due > to the assumption that each struct mcinfo_bank and each struct > mcinfo_extended stands for exactly one error. > > This assumption doesn''t cover follow-up errors which may be needed to > determine the real root cause. Some of them may even be ignored > depending on what is going on.I think the assumption here is a recover action will be triggered only by one bank. For example, we offline page because one MC bank tell us that page is broken. The "follow-up errors" is something interesting to me, do you have any example? It''s ok for us to not include the back information if there are such requirement. 
Thanks Yunhong Jiang> > Christoph > >> >> Thanks >> Yunhong Jiang >> >>> Christoph >>> >>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >>>> Christoph/Frank, Followed is the interface definition, please have a >>>> look. >>>> >>>> Thanks >>>> Yunhong Jiang >>>> >>>> 1) Interface between Xen/dom0 for passing xen''s recovery action >>>> information to dom0. Usage model: After offlining broken page, Xen might >>>> pass its page-offline recovery action result information to dom0. Dom0 >>>> will save the information in non-volatile memory for further proactive >>>> actions, such as offlining the easy-broken page early when doing next >>>> reboot. >>>> >>>> >>>> struct page_offline_action >>>> { >>>> /* Params for passing the offlined page number to DOM0 */ >>>> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall */ }; >>>> >>>> struct cpu_offline_action >>>> { >>>> /* Params for passing the identity of the offlined CPU to DOM0 */ >>>> uint32_t mc_socketid; uint16_t mc_coreid; >>>> uint16_t mc_core_threadid; >>>> }; >>>> >>>> struct cache_shrink_action >>>> { >>>> /* TBD, Christoph, please fill it */ >>>> }; >>>> >>>> /* Recover action flags, giving recovery result information to guest */ >>>> /* Recovery successfully after taking certain recovery actions below */ >>>> #define REC_ACT_RECOVERED (0x1 << 0) >>>> /* For solaris''s usage that dom0 will take ownership when crash */ >>>> #define REC_ACT_RESET (0x1 << 2) >>>> /* No action is performed by XEN */ >>>> #define REC_ACT_INFO (0x1 << 3) >>>> >>>> /* Recover action type definition, valid only when flags & >>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >>>> #define MC_ACT_CPU_OFFLINE 2 >>>> #define MC_ACT_CACHE_SHIRNK 3 >>>> >>>> struct recovery_action >>>> { >>>> uint8_t flags; >>>> uint8_t action_type; >>>> union >>>> { >>>> struct page_offline_action page_retire; >>>> struct cpu_offline_action cpu_offline; >>>> struct cache_shrink_action cache_shrink; >>>> uint8_t pad[MAX_ACTION_SIZE]; >>>> } action_info; >>>> } >>>> >>>> struct mcinfo_bank { >>>> struct mcinfo_common common; >>>> >>>> uint16_t mc_bank; /* bank nr */ >>>> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on >>>> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t >>>> mc_status; /* bank status */ uint64_t mc_addr; /* bank address, >>>> only valid * if addr bit is set in mc_status */ >>>> uint64_t mc_misc; uint64_t mc_ctrl2; >>>> uint64_t mc_tsc; >>>> /* Recovery action is performed per bank */ >>>> struct recovery_action action; >>>> }; >>>> >>>> 2) Below two interfaces are for MCA processing internal use. >>>> a. pre_handler will be called earlier in MCA ISR context, mainly for >>>> early need_reset detection for avoiding log missing (flag MCA_RESET). >>>> Also, pre_handler might be able to find the impacted domain if possible. >>>> b. mca_error_handler is actually a (error_action_index, >>>> recovery_handler pointer) pair. The defined recovery_handler function >>>> performs the actual recovery operations in softIrq context after the >>>> per_bank MCA error matching the corresponding mca_code index. If >>>> pre_handler can''t judge the impacted domain, recovery_handler must >>>> figure it out. 
>>>> >>>> /* Error has been recovered successfully */ >>>> #define MCA_RECOVERD 0 >>>> /* Error impact one guest as stated in owner field */ #define MCA_OWNER >>>> 1 /* Error can''t be recovered and need reboot system */ #define >>>> MCA_RESET 2 /* Error should be handled in softIRQ context */ #define >>>> MCA_MORE_ACTION 3 >>>> >>>> struct mca_handle_result >>>> { >>>> uint32_t flags; >>>> /* Valid only when flags & MCA_OWNER */ >>>> domid_d owner; >>>> /* valid only when flags & MCA_RECOVERD */ >>>> struct recovery_action *action; >>>> }; >>>> >>>> struct mca_error_handler >>>> { >>>> /* >>>> * Assume we will need only architecture defined code. If the index >>>> can''t be setup by * mca_code, we will add a function to do the (index, >>>> recovery_handler) mapping check. * This mca_code represents the recovery >>>> handler pointer index for identifying this * particular error''s >>>> corresponding recover action */ >>>> uint16_t mca_code; >>>> >>>> /* Handler to be called in softIRQ handler context */ >>>> int recovery_handler(struct mcinfo_bank *bank, >>>> struct mcinfo_global *global, >>>> struct mcinfo_extended *extention, >>>> struct mca_handle_result *result); >>>> >>>> }; >>>> >>>> struct mca_error_handler intel_mca_handler[] >>>> { >>>> .... >>>> }; >>>> >>>> struct mca_error_handler amd_mca_handler[] >>>> { >>>> .... >>>> }; >>>> >>>> >>>> /* HandlVer to be called in MCA ISR in MCA context */ >>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, >>>> struct mca_handle_result *result); >>>> >>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, >>>> struct mca_handle_result *result); >>>> >>>> Frank.Vanderlinden@Sun.COM > <mailto:Frank.Vanderlinden@Sun.COM> wrote: >>>>> Jiang, Yunhong wrote: >>>>>> Frank/Christopher, can you please give more comments for it, or you >>>>>> are OK with this? For the action reporting mechanism, we will send out >>>>>> a proposal for review soon. >>>>> >>>>> I''m ok with this. We need a little more information on the AMD >>>>> mechanism, but it seems to me that we can fit this in. >>>>> >>>>> Sometime this week, I''ll also send out the last of our changes that >>>>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine >>>>> some things in to one patch, like the telemetry handling changes that >>>>> Gavin did. The other changes are error injection (for debugging) and >>>>> panic crash dump support for our FMA tools, but those are probably only >>>>> interesting to us. >>>>> >>>>> - Frank >>> >>> -- >>> ---to satisfy European Law for business letters: >>> Advanced Micro Devices GmbH >>> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >>> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >>> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >>> Registergericht Muenchen, HRB Nr. 43632 > > > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
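For reference, the legacy type-1 configuration access that Xen intercepts at 0xcf8/0xcfc looks roughly like the sketch below. It only reaches the first 256 bytes of a function's config space, which is why the 4K PCI-E extended space needs mmconfig support; outl()/inl() are the usual port I/O helpers and error handling is omitted.

/*
 * Legacy type-1 PCI config read via ports 0xCF8/0xCFC.  Only the first
 * 256 bytes of a function's config space are reachable this way.
 */
static uint32_t cf8_conf_read32(unsigned int bus, unsigned int dev,
                                unsigned int func, unsigned int reg)
{
    uint32_t addr = 0x80000000u | (bus << 16) | (dev << 11) |
                    (func << 8) | (reg & 0xFC);

    outl(addr, 0xCF8);   /* select bus/device/function/register */
    return inl(0xCFC);   /* read the selected 32-bit register */
}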
Jiang, Yunhong
2009-Mar-10 01:19 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph/Frank, do you have any comments? Thanks Yunhong Jiang Jiang, Yunhong <> wrote:> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: >> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote: >>> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: >>>> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK >>> >>> Ahh, yes, I will fix it. >>> >>>> The L3 cache index disable feature works like this: >>>> >>>> You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) >>>> and write it into the index field. This MSR does not belong to >>>> the standard >>>> mc bank data and is therefore provided by mcinfo_extended. >>>> The index field are the bits 11:0 of the PCI function 3 register "L3 >>>> Cache Index Disable". >>> >>> So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K >>> byte? >> >> Sorry, which offset do you mean ? > > I mean the offset of this register in the PCI function''s > configuration space. You know for a PCI device, it has 256 > byte configuration register while PCI-E device has 4K > configuration register. > Currently xen can access the 256 byte config register already, > however, to support 4K range, it requires more stuff, like > mmconfig sparse etc. That''s the reason I ask the offset of > this register. > >> >>> >>> For the PCI access, I''d prefer to have xen to control all these, i.e. even >>> if dom0 want to disable the L3 cache, it is done through a hypercall. The >>> reason is, Xen control the CPU, so keep it in Xen will make things >>> simpler. >>> >>> Of course, it is ok for me too, if you want to keep Xen for #MC handler >>> and Dom0 for CE handler. >> >> We still need to define the rules to prevent interferes and >> clarify how to >> deal with Dom0/DomU going wild and breaking the rules. > > As discussed previously, we don''t need concern about DomU, > all configuration space access from domU will be intercepted by dom0. > > For Dom0, since currently all PCI access to 0xcf8/cfc will be > intercepted by Xen, so Xen can do checking. We can achieve > same checking for mmconfig if remove that range from dom0. But > I have to say I''m not sure if we do need concern too much what > will happen when dom0 going wild ( after all, a crash in dom0 > will lost everything), especially interfere on such access > will not cause security issue (please correct me if I''m wrong ). > >> >>>> Why is the recover action bound to the bank ? >>>> I would like to see a struct mcinfo_recover rather extending >>>> struct mcinfo_bank. That gives us flexibility. >>> >>> I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has >>> advantage of keep connection of the error source and the action, but it do >>> make the mcinfo_bank more complex. Or we can keep the cpu/bank information >>> in the mcinfo_recover also, so that we keep the flexibility and don''t lose >>> the connection. >> >> From your suggestions I prefer the last one, but is still limited due >> to the assumption that each struct mcinfo_bank and each struct >> mcinfo_extended stands for exactly one error. >> >> This assumption doesn''t cover follow-up errors which may be needed to >> determine the real root cause. Some of them may even be ignored >> depending on what is going on. > > I think the assumption here is a recover action will be > triggered only by one bank. For example, we offline page > because one MC bank tell us that page is broken. > > The "follow-up errors" is something interesting to me, do you > have any example? 
It''s ok for us to not include the back > information if there are such requirement. > > Thanks > Yunhong Jiang > >> >> Christoph >> >>> >>> Thanks >>> Yunhong Jiang >>> >>>> Christoph >>>> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >>>>> Christoph/Frank, Followed is the interface definition, please have a >>>>> look. >>>>> >>>>> Thanks >>>>> Yunhong Jiang >>>>> >>>>> 1) Interface between Xen/dom0 for passing xen''s recovery action >>>>> information to dom0. Usage model: After offlining broken page, Xen might >>>>> pass its page-offline recovery action result information to dom0. Dom0 >>>>> will save the information in non-volatile memory for further proactive >>>>> actions, such as offlining the easy-broken page early when doing next >>>>> reboot. >>>>> >>>>> >>>>> struct page_offline_action >>>>> { >>>>> /* Params for passing the offlined page number to DOM0 */ >>>>> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall */ >>>>> }; >>>>> >>>>> struct cpu_offline_action >>>>> { >>>>> /* Params for passing the identity of the offlined CPU to DOM0 */ >>>>> uint32_t mc_socketid; uint16_t mc_coreid; >>>>> uint16_t mc_core_threadid; >>>>> }; >>>>> >>>>> struct cache_shrink_action >>>>> { >>>>> /* TBD, Christoph, please fill it */ >>>>> }; >>>>> >>>>> /* Recover action flags, giving recovery result information to guest */ >>>>> /* Recovery successfully after taking certain recovery actions below */ >>>>> #define REC_ACT_RECOVERED (0x1 << 0) >>>>> /* For solaris''s usage that dom0 will take ownership when crash */ >>>>> #define REC_ACT_RESET (0x1 << 2) >>>>> /* No action is performed by XEN */ >>>>> #define REC_ACT_INFO (0x1 << 3) >>>>> >>>>> /* Recover action type definition, valid only when flags & >>>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >>>>> #define MC_ACT_CPU_OFFLINE 2 >>>>> #define MC_ACT_CACHE_SHIRNK 3 >>>>> >>>>> struct recovery_action >>>>> { >>>>> uint8_t flags; >>>>> uint8_t action_type; >>>>> union >>>>> { >>>>> struct page_offline_action page_retire; >>>>> struct cpu_offline_action cpu_offline; >>>>> struct cache_shrink_action cache_shrink; >>>>> uint8_t pad[MAX_ACTION_SIZE]; >>>>> } action_info; >>>>> } >>>>> >>>>> struct mcinfo_bank { >>>>> struct mcinfo_common common; >>>>> >>>>> uint16_t mc_bank; /* bank nr */ >>>>> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on >>>>> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t >>>>> mc_status; /* bank status */ uint64_t mc_addr; /* bank address, >>>>> only valid * if addr bit is set in mc_status */ >>>>> uint64_t mc_misc; uint64_t mc_ctrl2; >>>>> uint64_t mc_tsc; >>>>> /* Recovery action is performed per bank */ >>>>> struct recovery_action action; >>>>> }; >>>>> >>>>> 2) Below two interfaces are for MCA processing internal use. >>>>> a. pre_handler will be called earlier in MCA ISR context, mainly for >>>>> early need_reset detection for avoiding log missing (flag MCA_RESET). >>>>> Also, pre_handler might be able to find the impacted domain if possible. >>>>> b. mca_error_handler is actually a (error_action_index, >>>>> recovery_handler pointer) pair. The defined recovery_handler function >>>>> performs the actual recovery operations in softIrq context after the >>>>> per_bank MCA error matching the corresponding mca_code index. If >>>>> pre_handler can''t judge the impacted domain, recovery_handler must >>>>> figure it out. 
>>>>> >>>>> /* Error has been recovered successfully */ >>>>> #define MCA_RECOVERD 0 >>>>> /* Error impact one guest as stated in owner field */ #define MCA_OWNER >>>>> 1 /* Error can''t be recovered and need reboot system */ #define >>>>> MCA_RESET 2 /* Error should be handled in softIRQ context */ #define >>>>> MCA_MORE_ACTION 3 >>>>> >>>>> struct mca_handle_result >>>>> { >>>>> uint32_t flags; >>>>> /* Valid only when flags & MCA_OWNER */ >>>>> domid_d owner; >>>>> /* valid only when flags & MCA_RECOVERD */ >>>>> struct recovery_action *action; >>>>> }; >>>>> >>>>> struct mca_error_handler >>>>> { >>>>> /* >>>>> * Assume we will need only architecture defined code. If the index >>>>> can''t be setup by * mca_code, we will add a function to do the (index, >>>>> recovery_handler) mapping check. * This mca_code represents the recovery >>>>> handler pointer index for identifying this * particular error''s >>>>> corresponding recover action */ >>>>> uint16_t mca_code; >>>>> >>>>> /* Handler to be called in softIRQ handler context */ >>>>> int recovery_handler(struct mcinfo_bank *bank, >>>>> struct mcinfo_global *global, >>>>> struct mcinfo_extended *extention, >>>>> struct mca_handle_result *result); >>>>> >>>>> }; >>>>> >>>>> struct mca_error_handler intel_mca_handler[] >>>>> { >>>>> .... >>>>> }; >>>>> >>>>> struct mca_error_handler amd_mca_handler[] >>>>> { >>>>> .... >>>>> }; >>>>> >>>>> >>>>> /* HandlVer to be called in MCA ISR in MCA context */ >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, >>>>> struct mca_handle_result *result); >>>>> >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, >>>>> struct mca_handle_result *result); >>>>> >>>>> Frank.Vanderlinden@Sun.COM >> <mailto:Frank.Vanderlinden@Sun.COM> wrote: >>>>>> Jiang, Yunhong wrote: >>>>>>> Frank/Christopher, can you please give more comments for it, or you >>>>>>> are OK with this? For the action reporting mechanism, we will send out >>>>>>> a proposal for review soon. >>>>>> >>>>>> I''m ok with this. We need a little more information on the AMD >>>>>> mechanism, but it seems to me that we can fit this in. >>>>>> >>>>>> Sometime this week, I''ll also send out the last of our changes that >>>>>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine >>>>>> some things in to one patch, like the telemetry handling changes that >>>>>> Gavin did. The other changes are error injection (for debugging) and >>>>>> panic crash dump support for our FMA tools, but those are probably >>>>>> only interesting to us. >>>>>> >>>>>> - Frank >>>> >>>> -- >>>> ---to satisfy European Law for business letters: >>>> Advanced Micro Devices GmbH >>>> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >>>> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >>>> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >>>> Registergericht Muenchen, HRB Nr. 43632 >> >> >> >> -- >> ---to satisfy European Law for business letters: >> Advanced Micro Devices GmbH >> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >> Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
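To make the usage model in the quoted proposal concrete, a Dom0-side consumer of the recovery result might look roughly like this. persist_bad_page() is a hypothetical helper standing in for whatever non-volatile store Dom0 uses for pages to retire proactively on the next boot.

/*
 * Sketch of a Dom0 consumer of struct recovery_action as defined above.
 */
static void handle_recovery_result(const struct recovery_action *act)
{
    if ( !(act->flags & REC_ACT_RECOVERED) )
        return;   /* nothing was recovered; only log the telemetry */

    switch ( act->action_type )
    {
    case MC_ACT_PAGE_OFFLINE:
        /* remember the broken page for the next boot */
        persist_bad_page(act->action_info.page_retire.mfn,
                         act->action_info.page_retire.status);
        break;
    case MC_ACT_CPU_OFFLINE:
        /* record socket/core/thread of the offlined cpu */
        break;
    default:
        break;
    }
}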
Christoph Egger
2009-Mar-10 19:08 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
On Tuesday 10 March 2009 02:19:04 Jiang, Yunhong wrote:> Christoph/Frank, do you have any comments? > > Thanks > Yunhong Jiang > > Jiang, Yunhong <> wrote: > > Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: > >> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote: > >>> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: > >>>> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK > >>> > >>> Ahh, yes, I will fix it. > >>> > >>>> The L3 cache index disable feature works like this: > >>>> > >>>> You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) > >>>> and write it into the index field. This MSR does not belong to > >>>> the standard > >>>> mc bank data and is therefore provided by mcinfo_extended. > >>>> The index field are the bits 11:0 of the PCI function 3 register "L3 > >>>> Cache Index Disable". > >>> > >>> So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or > >>> 4K byte? > >> > >> Sorry, which offset do you mean ? > > > > I mean the offset of this register in the PCI function''s > > configuration space. You know for a PCI device, it has 256 > > byte configuration register while PCI-E device has 4K > > configuration register. > > Currently xen can access the 256 byte config register already, > > however, to support 4K range, it requires more stuff, like > > mmconfig sparse etc. That''s the reason I ask the offset of > > this register.Ah, I see. The registers of our memory controller are in the PCI config space. It''s no PCI-E device.> >>> For the PCI access, I''d prefer to have xen to control all these, i.e. > >>> even if dom0 want to disable the L3 cache, it is done through a > >>> hypercall. The reason is, Xen control the CPU, so keep it in Xen will > >>> make things simpler. > >>> > >>> Of course, it is ok for me too, if you want to keep Xen for #MC handler > >>> and Dom0 for CE handler. > >> > >> We still need to define the rules to prevent interferes and > >> clarify how to > >> deal with Dom0/DomU going wild and breaking the rules. > > > > As discussed previously, we don''t need concern about DomU, > > all configuration space access from domU will be intercepted by dom0. > > > > For Dom0, since currently all PCI access to 0xcf8/cfc will be > > intercepted by Xen, so Xen can do checking. We can achieve > > same checking for mmconfig if remove that range from dom0. But > > I have to say I''m not sure if we do need concern too much what > > will happen when dom0 going wild ( after all, a crash in dom0 > > will lost everything), especially interfere on such access > > will not cause security issue (please correct me if I''m wrong ).This sounds like an assumption that an IOMMU is always available.> >>>> Why is the recover action bound to the bank ? > >>>> I would like to see a struct mcinfo_recover rather extending > >>>> struct mcinfo_bank. That gives us flexibility. > >>> > >>> I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back > >>> has advantage of keep connection of the error source and the action, > >>> but it do make the mcinfo_bank more complex. Or we can keep the > >>> cpu/bank information in the mcinfo_recover also, so that we keep the > >>> flexibility and don''t lose the connection. > >> > >> From your suggestions I prefer the last one, but is still limited due > >> to the assumption that each struct mcinfo_bank and each struct > >> mcinfo_extended stands for exactly one error. 
> >> > >> This assumption doesn''t cover follow-up errors which may be needed to > >> determine the real root cause. Some of them may even be ignored > >> depending on what is going on. > > > > I think the assumption here is a recover action will be > > triggered only by one bank. For example, we offline page > > because one MC bank tell us that page is broken.Only if the bank is the one from the memory controller. What if the bank is the Data or Instruction Cache ?> > The "follow-up errors" is something interesting to me, do you > > have any example? It''s ok for us to not include the back > > information if there are such requirement.An error in the Bus Unit can trigger a watchdog timeout and cause a Load-Store error as a "follow-up error". This in turn may trigger another "follow-up error" in the memory controller or in the Data or Instruction Cache depending on what the CPU tries to do. I think, we should mark the ''struct mcinfo_global'' as a kind of header for each error. All following information describe the error (including the follow-up errors) and all recover actions. This gives us the flexibility to get as many information as possible and allows to do as many recover actions as necessary instead of just one. Christoph> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: > >>>>> Christoph/Frank, Followed is the interface definition, please have a > >>>>> look. > >>>>> > >>>>> Thanks > >>>>> Yunhong Jiang > >>>>> > >>>>> 1) Interface between Xen/dom0 for passing xen''s recovery action > >>>>> information to dom0. Usage model: After offlining broken page, Xen > >>>>> might pass its page-offline recovery action result information to > >>>>> dom0. Dom0 will save the information in non-volatile memory for > >>>>> further proactive actions, such as offlining the easy-broken page > >>>>> early when doing next reboot. 
> >>>>> > >>>>> > >>>>> struct page_offline_action > >>>>> { > >>>>> /* Params for passing the offlined page number to DOM0 */ > >>>>> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall > >>>>> */ }; > >>>>> > >>>>> struct cpu_offline_action > >>>>> { > >>>>> /* Params for passing the identity of the offlined CPU to DOM0 */ > >>>>> uint32_t mc_socketid; uint16_t mc_coreid; > >>>>> uint16_t mc_core_threadid; > >>>>> }; > >>>>> > >>>>> struct cache_shrink_action > >>>>> { > >>>>> /* TBD, Christoph, please fill it */ > >>>>> }; > >>>>> > >>>>> /* Recover action flags, giving recovery result information to guest > >>>>> */ /* Recovery successfully after taking certain recovery actions > >>>>> below */ #define REC_ACT_RECOVERED (0x1 << 0) > >>>>> /* For solaris''s usage that dom0 will take ownership when crash */ > >>>>> #define REC_ACT_RESET (0x1 << 2) > >>>>> /* No action is performed by XEN */ > >>>>> #define REC_ACT_INFO (0x1 << 3) > >>>>> > >>>>> /* Recover action type definition, valid only when flags & > >>>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 > >>>>> #define MC_ACT_CPU_OFFLINE 2 > >>>>> #define MC_ACT_CACHE_SHIRNK 3 > >>>>> > >>>>> struct recovery_action > >>>>> { > >>>>> uint8_t flags; > >>>>> uint8_t action_type; > >>>>> union > >>>>> { > >>>>> struct page_offline_action page_retire; > >>>>> struct cpu_offline_action cpu_offline; > >>>>> struct cache_shrink_action cache_shrink; > >>>>> uint8_t pad[MAX_ACTION_SIZE]; > >>>>> } action_info; > >>>>> } > >>>>> > >>>>> struct mcinfo_bank { > >>>>> struct mcinfo_common common; > >>>>> > >>>>> uint16_t mc_bank; /* bank nr */ > >>>>> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on > >>>>> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t > >>>>> mc_status; /* bank status */ uint64_t mc_addr; /* bank address, > >>>>> only valid * if addr bit is set in mc_status > >>>>> */ uint64_t mc_misc; uint64_t mc_ctrl2; > >>>>> uint64_t mc_tsc; > >>>>> /* Recovery action is performed per bank */ > >>>>> struct recovery_action action; > >>>>> }; > >>>>> > >>>>> 2) Below two interfaces are for MCA processing internal use. > >>>>> a. pre_handler will be called earlier in MCA ISR context, mainly > >>>>> for early need_reset detection for avoiding log missing (flag > >>>>> MCA_RESET). Also, pre_handler might be able to find the impacted > >>>>> domain if possible. b. mca_error_handler is actually a > >>>>> (error_action_index, > >>>>> recovery_handler pointer) pair. The defined recovery_handler function > >>>>> performs the actual recovery operations in softIrq context after the > >>>>> per_bank MCA error matching the corresponding mca_code index. If > >>>>> pre_handler can''t judge the impacted domain, recovery_handler must > >>>>> figure it out. > >>>>> > >>>>> /* Error has been recovered successfully */ > >>>>> #define MCA_RECOVERD 0 > >>>>> /* Error impact one guest as stated in owner field */ #define > >>>>> MCA_OWNER 1 /* Error can''t be recovered and need reboot system */ > >>>>> #define MCA_RESET 2 /* Error should be handled in softIRQ context */ > >>>>> #define MCA_MORE_ACTION 3 > >>>>> > >>>>> struct mca_handle_result > >>>>> { > >>>>> uint32_t flags; > >>>>> /* Valid only when flags & MCA_OWNER */ > >>>>> domid_d owner; > >>>>> /* valid only when flags & MCA_RECOVERD */ > >>>>> struct recovery_action *action; > >>>>> }; > >>>>> > >>>>> struct mca_error_handler > >>>>> { > >>>>> /* > >>>>> * Assume we will need only architecture defined code. 
If the > >>>>> index can''t be setup by * mca_code, we will add a function to do the > >>>>> (index, recovery_handler) mapping check. * This mca_code represents > >>>>> the recovery handler pointer index for identifying this * particular > >>>>> error''s corresponding recover action */ > >>>>> uint16_t mca_code; > >>>>> > >>>>> /* Handler to be called in softIRQ handler context */ > >>>>> int recovery_handler(struct mcinfo_bank *bank, > >>>>> struct mcinfo_global *global, > >>>>> struct mcinfo_extended *extention, > >>>>> struct mca_handle_result *result); > >>>>> > >>>>> }; > >>>>> > >>>>> struct mca_error_handler intel_mca_handler[] > >>>>> { > >>>>> .... > >>>>> }; > >>>>> > >>>>> struct mca_error_handler amd_mca_handler[] > >>>>> { > >>>>> .... > >>>>> }; > >>>>> > >>>>> > >>>>> /* HandlVer to be called in MCA ISR in MCA context */ > >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, > >>>>> struct mca_handle_result *result); > >>>>> > >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, > >>>>> struct mca_handle_result *result); > >>>>> > >>>>> Frank.Vanderlinden@Sun.COM > >> > >> <mailto:Frank.Vanderlinden@Sun.COM> wrote: > >>>>>> Jiang, Yunhong wrote: > >>>>>>> Frank/Christopher, can you please give more comments for it, or you > >>>>>>> are OK with this? For the action reporting mechanism, we will send > >>>>>>> out a proposal for review soon. > >>>>>> > >>>>>> I''m ok with this. We need a little more information on the AMD > >>>>>> mechanism, but it seems to me that we can fit this in. > >>>>>> > >>>>>> Sometime this week, I''ll also send out the last of our changes that > >>>>>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine > >>>>>> some things in to one patch, like the telemetry handling changes > >>>>>> that Gavin did. The other changes are error injection (for > >>>>>> debugging) and panic crash dump support for our FMA tools, but those > >>>>>> are probably only interesting to us. > >>>>>> > >>>>>> - Frank > >>>> > >>>> ---- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
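The layout proposed here, one mcinfo_global acting as the header of a record followed by bank, extended and recovery entries, could be walked on the consumer side roughly as below. The x86_mcinfo_first()/x86_mcinfo_next() walkers, mi_nentries and the MC_TYPE_GLOBAL/BANK/EXTENDED constants are assumed to follow the existing xen-mca.h conventions; MC_TYPE_RECOVERY is hypothetical, standing for the new entry type under discussion.

/*
 * Sketch of walking one telemetry record: an mcinfo_global header followed
 * by bank entries (including follow-up errors), optional extended data and
 * zero or more recovery entries.
 */
static void walk_mcinfo(struct mc_info *mi)
{
    struct mcinfo_common *mic = x86_mcinfo_first(mi);
    unsigned int i;

    for ( i = 0; i < mi->mi_nentries; i++, mic = x86_mcinfo_next(mic) )
    {
        switch ( mic->type )
        {
        case MC_TYPE_GLOBAL:     /* record header: one per error event */
            break;
        case MC_TYPE_BANK:       /* per-bank data, incl. follow-up errors */
            break;
        case MC_TYPE_EXTENDED:   /* vendor-specific extra registers */
            break;
        case MC_TYPE_RECOVERY:   /* recovery action(s), possibly several */
            break;
        }
    }
}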
Jiang, Yunhong
2009-Mar-12 15:52 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph, sorry for later response. Please see inline reply.>Ah, I see. The registers of our memory controller are in the >PCI config space. It''s no PCI-E device.That''s great.> >> >>> For the PCI access, I''d prefer to have xen to control >all these, i.e. >> >>> even if dom0 want to disable the L3 cache, it is done through a >> >>> hypercall. The reason is, Xen control the CPU, so keep >it in Xen will >> >>> make things simpler. >> >>> >> >>> Of course, it is ok for me too, if you want to keep Xen >for #MC handler >> >>> and Dom0 for CE handler. >> >> >> >> We still need to define the rules to prevent interferes and >> >> clarify how to >> >> deal with Dom0/DomU going wild and breaking the rules. >> > >> > As discussed previously, we don''t need concern about DomU, >> > all configuration space access from domU will be >intercepted by dom0. >> > >> > For Dom0, since currently all PCI access to 0xcf8/cfc will be >> > intercepted by Xen, so Xen can do checking. We can achieve >> > same checking for mmconfig if remove that range from dom0. But >> > I have to say I''m not sure if we do need concern too much what >> > will happen when dom0 going wild ( after all, a crash in dom0 >> > will lost everything), especially interfere on such access >> > will not cause security issue (please correct me if I''m wrong ). > >This sounds like an assumption that an IOMMU is always available.Xen''s PCI access does not requires IOMMU, it is in arch/x86/pci.c .>> > I think the assumption here is a recover action will be >> > triggered only by one bank. For example, we offline page >> > because one MC bank tell us that page is broken. > >Only if the bank is the one from the memory controller. >What if the bank is the Data or Instruction Cache ? > >> > The "follow-up errors" is something interesting to me, do you >> > have any example? It''s ok for us to not include the back >> > information if there are such requirement. > >An error in the Bus Unit can trigger a watchdog timeout >and cause a Load-Store error as a "follow-up error". This in turn >may trigger another "follow-up error" in the memory controller >or in the Data or Instruction Cache depending on what the CPU >tries to do.Hmm, so will these follow-up error in the same bank or different bank? If in different bank, how can MCE handler knows they are related, or even should MCE handler knows about the relationship (I didn''t find such code in current implementation). Or you mean we need give the relationship because Dom0 need such information?> >I think, we should mark the ''struct mcinfo_global'' as a kind >of header for >each error. All following information describe the error >(including the >follow-up errors) and all recover actions. This gives us the >flexibility >to get as many information as possible and allows to do >as many recover actions as necessary instead of just one.I think your original proposal can also meet such purpose, i.e. include the mc_recover_info and we still need pass all mc_bacnk infor to dom0 for telemetry. If you prefer this one, can you please define the interface? Gavin/Frank, do you have any idea for this changes? Thanks -- Yunhong Jiang> >Christoph > > >> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >> >>>>> Christoph/Frank, Followed is the interface definition, >please have a >> >>>>> look. >> >>>>> >> >>>>> Thanks >> >>>>> Yunhong Jiang >> >>>>> >> >>>>> 1) Interface between Xen/dom0 for passing xen''s recovery action >> >>>>> information to dom0. 
Usage model: After offlining >broken page, Xen >> >>>>> might pass its page-offline recovery action result >information to >> >>>>> dom0. Dom0 will save the information in non-volatile memory for >> >>>>> further proactive actions, such as offlining the >easy-broken page >> >>>>> early when doing next reboot. >> >>>>> >> >>>>> >> >>>>> struct page_offline_action >> >>>>> { >> >>>>> /* Params for passing the offlined page number to DOM0 */ >> >>>>> uint64_t mfn; uint64_t status; /* Similar to page >offline hypercall >> >>>>> */ }; >> >>>>> >> >>>>> struct cpu_offline_action >> >>>>> { >> >>>>> /* Params for passing the identity of the offlined >CPU to DOM0 */ >> >>>>> uint32_t mc_socketid; uint16_t mc_coreid; >> >>>>> uint16_t mc_core_threadid; >> >>>>> }; >> >>>>> >> >>>>> struct cache_shrink_action >> >>>>> { >> >>>>> /* TBD, Christoph, please fill it */ >> >>>>> }; >> >>>>> >> >>>>> /* Recover action flags, giving recovery result >information to guest >> >>>>> */ /* Recovery successfully after taking certain >recovery actions >> >>>>> below */ #define REC_ACT_RECOVERED (0x1 << 0) >> >>>>> /* For solaris''s usage that dom0 will take ownership >when crash */ >> >>>>> #define REC_ACT_RESET (0x1 << 2) >> >>>>> /* No action is performed by XEN */ >> >>>>> #define REC_ACT_INFO (0x1 << 3) >> >>>>> >> >>>>> /* Recover action type definition, valid only when flags & >> >>>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >> >>>>> #define MC_ACT_CPU_OFFLINE 2 >> >>>>> #define MC_ACT_CACHE_SHIRNK 3 >> >>>>> >> >>>>> struct recovery_action >> >>>>> { >> >>>>> uint8_t flags; >> >>>>> uint8_t action_type; >> >>>>> union >> >>>>> { >> >>>>> struct page_offline_action page_retire; >> >>>>> struct cpu_offline_action cpu_offline; >> >>>>> struct cache_shrink_action cache_shrink; >> >>>>> uint8_t pad[MAX_ACTION_SIZE]; >> >>>>> } action_info; >> >>>>> } >> >>>>> >> >>>>> struct mcinfo_bank { >> >>>>> struct mcinfo_common common; >> >>>>> >> >>>>> uint16_t mc_bank; /* bank nr */ >> >>>>> uint16_t mc_domid; /* Usecase 5: domain referenced >by mc_addr on >> >>>>> dom0 * and if mc_addr is valid. Never valid on DomU. >*/ uint64_t >> >>>>> mc_status; /* bank status */ uint64_t mc_addr; >/* bank address, >> >>>>> only valid * if addr bit is >set in mc_status >> >>>>> */ uint64_t mc_misc; uint64_t mc_ctrl2; >> >>>>> uint64_t mc_tsc; >> >>>>> /* Recovery action is performed per bank */ >> >>>>> struct recovery_action action; >> >>>>> }; >> >>>>> >> >>>>> 2) Below two interfaces are for MCA processing internal use. >> >>>>> a. pre_handler will be called earlier in MCA ISR >context, mainly >> >>>>> for early need_reset detection for avoiding log missing (flag >> >>>>> MCA_RESET). Also, pre_handler might be able to find >the impacted >> >>>>> domain if possible. b. mca_error_handler is actually a >> >>>>> (error_action_index, >> >>>>> recovery_handler pointer) pair. The defined >recovery_handler function >> >>>>> performs the actual recovery operations in softIrq >context after the >> >>>>> per_bank MCA error matching the corresponding mca_code >index. If >> >>>>> pre_handler can''t judge the impacted domain, >recovery_handler must >> >>>>> figure it out. 
>> >>>>> >> >>>>> /* Error has been recovered successfully */ >> >>>>> #define MCA_RECOVERD 0 >> >>>>> /* Error impact one guest as stated in owner field */ #define >> >>>>> MCA_OWNER 1 /* Error can''t be recovered and need >reboot system */ >> >>>>> #define MCA_RESET 2 /* Error should be handled in >softIRQ context */ >> >>>>> #define MCA_MORE_ACTION 3 >> >>>>> >> >>>>> struct mca_handle_result >> >>>>> { >> >>>>> uint32_t flags; >> >>>>> /* Valid only when flags & MCA_OWNER */ >> >>>>> domid_d owner; >> >>>>> /* valid only when flags & MCA_RECOVERD */ >> >>>>> struct recovery_action *action; >> >>>>> }; >> >>>>> >> >>>>> struct mca_error_handler >> >>>>> { >> >>>>> /* >> >>>>> * Assume we will need only architecture defined >code. If the >> >>>>> index can''t be setup by * mca_code, we will add a >function to do the >> >>>>> (index, recovery_handler) mapping check. * This >mca_code represents >> >>>>> the recovery handler pointer index for identifying >this * particular >> >>>>> error''s corresponding recover action */ >> >>>>> uint16_t mca_code; >> >>>>> >> >>>>> /* Handler to be called in softIRQ handler context */ >> >>>>> int recovery_handler(struct mcinfo_bank *bank, >> >>>>> struct mcinfo_global *global, >> >>>>> struct mcinfo_extended *extention, >> >>>>> struct mca_handle_result *result); >> >>>>> >> >>>>> }; >> >>>>> >> >>>>> struct mca_error_handler intel_mca_handler[] >> >>>>> { >> >>>>> .... >> >>>>> }; >> >>>>> >> >>>>> struct mca_error_handler amd_mca_handler[] >> >>>>> { >> >>>>> .... >> >>>>> }; >> >>>>> >> >>>>> >> >>>>> /* HandlVer to be called in MCA ISR in MCA context */ >> >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, >> >>>>> struct >mca_handle_result *result); >> >>>>> >> >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, >> >>>>> struct mca_handle_result *result); >> >>>>> >> >>>>> Frank.Vanderlinden@Sun.COM >> >> >> >> <mailto:Frank.Vanderlinden@Sun.COM> wrote: >> >>>>>> Jiang, Yunhong wrote: >> >>>>>>> Frank/Christopher, can you please give more comments >for it, or you >> >>>>>>> are OK with this? For the action reporting >mechanism, we will send >> >>>>>>> out a proposal for review soon. >> >>>>>> >> >>>>>> I''m ok with this. We need a little more information on the AMD >> >>>>>> mechanism, but it seems to me that we can fit this in. >> >>>>>> >> >>>>>> Sometime this week, I''ll also send out the last of >our changes that >> >>>>>> haven''t been sent upstream to xen-unstable yet. Maybe >we can combine >> >>>>>> some things in to one patch, like the telemetry >handling changes >> >>>>>> that Gavin did. The other changes are error injection (for >> >>>>>> debugging) and panic crash dump support for our FMA >tools, but those >> >>>>>> are probably only interesting to us. >> >>>>>> >> >>>>>> - Frank >> >>>> >> >>>> -- > > >-- >---to satisfy European Law for business letters: >Advanced Micro Devices GmbH >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >Registergericht Muenchen, HRB Nr. 43632 > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Mar-16 16:27 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Jiang, Yunhong wrote: > Christoph Egger wrote: >> I think, we should mark the 'struct mcinfo_global' as a kind >> of header for >> each error. All following information describe the error >> (including the >> follow-up errors) and all recover actions. This gives us the >> flexibility >> to get as many information as possible and allows to do >> as many recover actions as necessary instead of just one. > > I think your original proposal can also meet such purpose, i.e. include the mc_recover_info and we still need pass all mc_bacnk infor to dom0 for telemetry. If you prefer this one, can you please define the interface? Gavin/Frank, do you have any idea for this changes? Sorry about the slow reply. Our changes to the MCE code (to combine the AMD and Intel code as much as possible, and use a transactional approach to the telemetry) already pretty much use mc_global as a header. With our code, dom0 retrieves one mcinfo structure, with one global structure (which always comes first, but that's not required). In other words, using mc_global as a kind of header to the mcinfo data is fine, since we're already doing that. And, since we're talking about transactions with one mcinfo structure at a time (with one mc_global structure), the recover_info structures can be separate from the bank structures. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel