Hi all,

These patches are for MCA enabling in Xen. They are sent as an RFC first, to collect feedback for refinement before the final patch. We also attach a description text document for your reference.

Some implementation notes:

1) When an error happens, if it is fatal (pcc = 1) or cannot be recovered (pcc = 0 but no good recovery method exists), we reset the machine immediately to avoid losing logs in Dom0. Most MCA MSRs are sticky, so after reboot the MCA polling mechanism will send a vIRQ to Dom0 for logging.
2) When an MCE# happens, all CPUs enter MCA context. The first CPU to read and clear the error MSR bank becomes the owner of this MCE#. Locks and synchronization are used to determine the owner and to select the most severe error.
3) For convenience, the most offending CPU is selected to do most of the processing and recovery work.
4) When an MCE# happens, we do three jobs:
   a. Send a vIRQ to Dom0 for logging.
   b. Send a vMCE# to the impacted guest (currently only injected into an impacted Dom0).
   c. Virtualize the guest's vMCE MSRs.
5) Further improvements/additions that might be made if needed:
   a) The impacted-domain judgement algorithm.
   b) vMCE# injection is currently controlled by centralized data (vmce_data). The injection algorithm is a bit complex; we could change to an algorithm based on per-domain data if you prefer. Notes for understanding (a sketch follows this message):
      1) If several banks impact one domain and those banks belong to the same pCPU, the vMCE is injected only once.
      2) If more than one bank impacts one domain but the error banks belong to different pCPUs, the vMCE is injected once per affected pCPU.
      3) We use centralized data (the impact_domid and impact_cpus arrays in vmce_data) to drive the injection. The two array entries (idx, impact_domid) and (idx, impact_cpus) are combined into one item (idx, impact_domid, impact_cpus). This item records the impacted domain id and the map of pCPUs on which UC errors impacting that domain were found. From this we can decide how to inject the vMCE (domid, impact_times[nr_pCPUs]).
      4) Although the data structure is ready, we currently only inject vMCE# into Dom0.
   c) Connection with recovery actions (CPU/memory online/offline).
   d) More refinement and testing for HVM when needed.

Patch description:
1. basic_mca_support: Enable MCA support in Xen.
2. vmsr_virtualization: Guest MCE# MSR read/write virtualization support in Xen.
3. mce_dom0: Cooperating with Xen, Dom0 adds vIRQ and vMCE# handlers, translates Xen logs for Dom0, and re-uses the Linux kernel MCELOG mechanism and MCE handler. This is mainly a demonstration patch.

About testing:
We did some internal testing and the results look fine.

Any feedback is welcome, and thanks a lot for your help! :-)

Regards,
Criping
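For readers following note 5(b), here is a minimal sketch, in plain C, of the kind of centralized bookkeeping described there. All names (vmce_entry, MAX_IMPACT, and so on) are invented for illustration and are not the identifiers used in the actual patches; the point is only to show one record per impacted domain, pairing the domain id with a bitmap of the pCPUs on which UC errors affecting it were found, and one vMCE# injection per affected pCPU.

#include <stdint.h>

#define MAX_IMPACT 16                    /* max impacted domains tracked     */

struct vmce_entry {
    int      valid;
    uint16_t impact_domid;               /* which domain is affected         */
    uint64_t impact_cpus;                /* bitmap of pCPUs with UC errors   */
};

static struct vmce_entry vmce_data[MAX_IMPACT];

/* Record that a UC error found on pcpu impacts domid (one entry per domain). */
static void vmce_record(uint16_t domid, unsigned int pcpu)
{
    for (int i = 0; i < MAX_IMPACT; i++)
        if (vmce_data[i].valid && vmce_data[i].impact_domid == domid) {
            vmce_data[i].impact_cpus |= 1ULL << pcpu;  /* same domain, new pCPU */
            return;
        }
    for (int i = 0; i < MAX_IMPACT; i++)
        if (!vmce_data[i].valid) {
            vmce_data[i].valid = 1;
            vmce_data[i].impact_domid = domid;
            vmce_data[i].impact_cpus = 1ULL << pcpu;
            return;
        }
}

/* How many vMCE# injections a domain should receive: one per error pCPU,
 * regardless of how many banks on that pCPU reported errors. */
static int vmce_injection_count(uint16_t domid)
{
    for (int i = 0; i < MAX_IMPACT; i++)
        if (vmce_data[i].valid && vmce_data[i].impact_domid == domid)
            return __builtin_popcountll(vmce_data[i].impact_cpus);
    return 0;
}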
Christoph Egger
2009-Feb-16 13:34 UTC
[Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
To me it seems the design has not been understood, and now the code is becoming more and more unmaintainable bloat. I mean, the code is going to do far too much.

- The MCE routines in Xen are only for error data *collection*. Just pass it to Dom0 and that's it. Dom0 will do the error analysis and figure out what to do. It is Dom0 which will do a hypercall to do things like page offlining or CPU offlining or whatever is needed. Your code tries to move everything back from Dom0 into the hypervisor. I remember Keir having rejected my MCE patches because he feared this bloat.

- The Dom0 VIRQ is for correctable errors only. Uncorrectable errors are delivered via the MCE trap. Dom0 and DomU register a handler via the set_trap_table hypercall. A non-registered handler means the guest can't handle it by itself. Dom0 is always notified; the guest is only notified if it has registered a handler. This separation is completely ignored, and the Dom0 VIRQ is misused for everything (hence the bunch of superfluous flags; see the next point).

- MCA flags: what are the differences between correctable and recoverable? What are the differences between the uncorrectable, polled, reset, cmci and mce types?

- You use dynamic memory allocation (which uses spinlocks) in MCE code, and you roll your own MCE handling instead of using the generic API in mce.c. I suppose you don't understand it at all.

- I attach the design document again, since I have the impression no one at Intel read it, hence the misunderstandings.

I think it is best to get Gavin's generic MCE improvements upstream first.

On Monday 16 February 2009 06:35:14 Ke, Liping wrote:
> [...]
Christoph Egger
2009-Feb-16 14:18 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I realize from this and earlier MCE patches from Intel that Intel is trying to change the machine check design from the ground up.

The basic ideas behind the current design:

1. Xen collects error telemetry.
2. Xen delivers correctable errors to Dom0 via VIRQ.
3. Xen delivers uncorrectable errors to Dom0 via a trap handler.
4. Xen delivers uncorrectable errors to a DomU only if Dom0 tells Xen to do so.
5. Xen performs health measures as told by Dom0 via hypercalls, such as CPU or page offlining.
6. Dom0 performs error analysis, figures out what is going on, and calls the hypercalls for the right health measure.

The basic ideas behind Intel's new design (as far as I can see from the patches I have seen so far):

1. Xen collects error telemetry.
2. Xen performs error analysis and figures out what is going on.
3. Xen automatically takes health measures such as CPU and page offlining.
4. Xen delivers error telemetry to Dom0 via VIRQ for error logging only, independent of the error type.
5. MCEs are injected into the guest directly.
6. The MCE trap handler is not used at all.

IMO, any design change should be discussed first and not made silently, since this will confuse everyone, no one will know what the right thing to do in Xen and in Dom0 is, and this in turn will lead to error-prone, unmaintainable code in both Xen and Dom0.

Christoph

On Monday 16 February 2009 14:34:36 Christoph Egger wrote:
> [...]
Keir Fraser
2009-Feb-16 15:03 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On 16/02/2009 14:18, "Christoph Egger" <Christoph.Egger@amd.com> wrote:

> IMO, any design change should be discussed first and not made silently,
> since this will confuse everyone, no one will know what the right thing
> to do in Xen and in Dom0 is, and this in turn will lead to error-prone,
> unmaintainable code in both Xen and Dom0.

I certainly think we should have a shared approach for x86 machine-check handling, rather than completely different architectures for AMD and Intel. Fortunately Sun are an interested and active third party regarding this feature. I'll be interested in their opinion.

 -- Keir
Jiang, Yunhong
2009-Feb-16 15:05 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Aha, Christoph, sorry for the surprise, but I think we have already described our suggestion to you (please refer to http://markmail.org/message/vpcdojylxkrg6uz3). As I didn't get any response from your side, I assumed you were waiting for the patch to get a better idea; that's the reason Criping and I hurried up to cook the patch and send it out as an RFC. The RFC means it is targeted at comments, as we know MCA handling is complex and needs community discussion (I have to say a patch is sometimes clearer than a design doc, although cooking a patch takes more effort).

Your description of our design is quite clear, which also means our RFC has achieved its purpose :-) One exception is item 6: the MCE trap handler on the HV side is still needed for PV domains just as it is now (the bounce buffer, the trap priority, etc.), but for guests, yes, we try to re-use the guest's MCA handler.

As said already, MCE handling is complex, so can we discuss in detail how to handle the MCA and reach some consensus? We have CC'ed all the engineers we think may be interested in it. I merge my comments on your other mail below:

> - The MCE routines in Xen are only for error data *collection*.
>   Just pass it to Dom0 and that's it.
>   Dom0 will do the error analysis and figure out what to do.
>   It is the Dom0 which will do a hypercall to do things like
>   page-offlining or cpu offlining or whatever is needed.
>   Your code tries to move everything back from Dom0 into the
>   hypervisor. I remember Keir having rejected my MCE patches
>   because he feared this bloat.

Sorry, I didn't notice Keir's feedback on your original patch. I will google it, or it would be great if you could share it with me.

> - MCA flags: what are the differences between correctable
>   and recoverable? What are the differences between uncorrectable,
>   polled, reset and cmci and mce types?

Per my understanding, a correctable error (sometimes called a corrected error) means the hardware has recovered the error and software is not impacted (although some proactive action is preferred), while recoverable means the hardware does not recover the error but it is possible that software can recover it (something like a non-fatal error in the PCI-E spec, although not exactly the same, I think).

> - You use dynamic memory allocation (which uses spinlocks) in MCE code
>   and you roll your own mce handling instead of using the generic API
>   in mce.c

I think that is in softirq context and should be OK for spinlocks.

> I suppose, you don't understand it at all.
>
> - I attach the design document again, since I have the impression no one
>   at Intel read it, hence the misunderstandings.

I promise we read it carefully, otherwise my manager would surely challenge me before you did, and it is really well written.

> I think, it is best to get Gavin's generic mce improvements upstream first.

Sure, Gavin's improvements are important. Again, this patch is just an RFC, and some components are still WIP, like per-domain MCA injection, since we want to get input first.

Thanks
Yunhong Jiang

> -----Original Message-----
> From: Christoph Egger [mailto:Christoph.Egger@amd.com]
> Sent: 16 February 2009 22:18
> To: xen-devel@lists.xensource.com
> Cc: Ke, Liping; Frank.Vanderlinden@Sun.COM; Jiang, Yunhong; Keir Fraser; Gavin Maltby
> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
> [...]
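A minimal sketch, in plain C, of the correctable/recoverable/fatal distinction being discussed here, using the architected IA32_MCi_STATUS bits from the Intel SDM (VAL, UC, PCC). The enum and function names are invented for illustration and are not from the patches; the mapping simply mirrors the explanation above: UC clear means hardware already corrected the error, UC set with PCC set means the processor context is corrupt (fatal), and UC set with PCC clear leaves room for software recovery.

#include <stdint.h>

#define MCi_STATUS_VAL  (1ULL << 63)   /* bank holds valid error information  */
#define MCi_STATUS_UC   (1ULL << 61)   /* error was not corrected by hardware */
#define MCi_STATUS_PCC  (1ULL << 57)   /* processor context corrupt           */

enum mca_severity { MCA_NONE, MCA_CORRECTED, MCA_RECOVERABLE, MCA_FATAL };

static enum mca_severity classify_bank(uint64_t status)
{
    if (!(status & MCi_STATUS_VAL))
        return MCA_NONE;               /* nothing logged in this bank         */
    if (!(status & MCi_STATUS_UC))
        return MCA_CORRECTED;          /* hardware recovered; log only        */
    if (status & MCi_STATUS_PCC)
        return MCA_FATAL;              /* pcc = 1: reset is the only option   */
    return MCA_RECOVERABLE;            /* pcc = 0: software may recover       */
}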
Jiang, Yunhong
2009-Feb-16 15:19 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: 16 February 2009 23:03
> To: Christoph Egger; xen-devel@lists.xensource.com
> Cc: Ke, Liping; Frank.Vanderlinden@Sun.COM; Jiang, Yunhong; Gavin Maltby
> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
> I certainly think we should have a shared approach for x86 machine-check
> handling, rather than completely different architectures for AMD and Intel.
> Fortunately Sun are an interested and active third party regarding this
> feature. I'll be interested in their opinion.

Yes, we don't want differences here; we changed only mce-intel.c because this is just for discussion. And I remember SUZUKI Kazuhiro is also interested in this topic (now CC'ed).

Thanks
-- Yunhong Jiang
Frank Van Der Linden
2009-Feb-16 17:58 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Keir Fraser wrote:
> I certainly think we should have a shared approach for x86 machine-check
> handling, rather than completely different architectures for AMD and Intel.
> Fortunately Sun are an interested and active third party regarding this
> feature. I'll be interested in their opinion.
>
> -- Keir

Today is a holiday here in the US, so I have only taken a superficial look at the patches.

However, my initial impression is that I share Christoph's concern. I like the original design, where the hypervisor deals with low-level information collection and passes it on to dom0, which can then make a high-level decision and instruct the hypervisor to take high-level action via a hypercall. The hypervisor does the actual MSR reads and writes; dom0 only acts on the values provided via hypercalls.

We added the physcpuinfo hypercall to stay in this framework: get the physical information needed for analysis, but don't access any registers directly.

It seems that these new patches blur this distinction, especially the virtualized MSR reads/writes. I am not sure what added value they have, except for being able to run an unmodified MCA handler. However, I think that any active MCA decision making should be centralized, and that centralized place would be dom0. Dom0 is already very much aware of the hypervisor, so I don't see the advantage of having an unmodified MCA handler there (our MCA handlers are virtually unmodified; it's just that the part where the telemetry is collected is inside Xen for the dom0 case).

I also agree that different behavior for AMD and Intel chips would not be good.

Perhaps the Intel folks can explain what the advantages of their approach are, and give some scenarios where their approach would be better? My first impression is that staying within the general framework as provided by Christoph's original work is the better option.

- Frank
Frank Van Der Linden
2009-Feb-17 05:50 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I should probably clarify myself, since I may have created one wrong impression: I don't object to the parts of the Intel code where the hypervisor does more of the initial work (as is also done in the page-offline code); it can be critical that this work is done quickly, and the hypervisor is the only place that has both the information and the means to do it.

So, doing some more work there in some cases is probably the best thing to do, even though there is natural resistance to adding more code to the hypervisor.

The main thing whose benefits I don't quite understand is the vMCE code, which is why I asked if there are examples of where that approach would work better.

- Frank
Jiang, Yunhong
2009-Feb-17 06:41 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I think the major differences are: a) how to handle the #MC, i.e. reset the system, decide the impacted components, take recovery actions like page offline, etc.; b) how to handle errors that impact a guest. As for other items like logging/telemetry, I don't think our implementation differs much from the current one.

For how to handle the #MC, we think keeping #MC handling in the hypervisor handler has the following benefits:
a) When a #MC happens, we need to take action to reduce the severity of the error as soon as possible. After all, a #MC is something different from a normal interrupt.
b) Even if Dom0 takes the central action, most of the work will still be to invoke hypercalls into the Xen HV.
c) Currently every #MC first goes through Dom0 before being injected into a DomU, but we don't see much benefit in that path, since the HV knows the guests quite well.

The above is the main reason we keep #MC handling in the Xen HV.

As for how to handle errors that impact a guest, I tried to describe three options in http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00643.html; basically we have three options (see that URL for more information):
1) A PV #MC handler is implemented in the guest. This PV handler gets MCA information from the Xen HV through a hypercall; this is what is currently implemented.
2) Xen provides MCA MSR virtualization so that the guest's native #MC handler can run without changes (a small sketch follows this message).
3) Use a PV #MC handler for the guest as in option 1, but the interface between Xen and the guest consists of abstract events, like "offline the offending page" or "terminate the current execution context".

We selected option 2 in our current implementation, with the following considerations:
1) With this method we can re-use the native MCE handler, which may be more widely tested.
2) We benefit from improvements to the native MCE handler.
3) It supports HVM guests better; in particular, this method can support HVM and PV guests at the same time.
4) We don't need to maintain a PV handler for each guest type.

One disadvantage of this option is that the guest (dom0) loses the physical CPU information. We think it would be much better if we could define a clear abstract interface between Xen and the guest, i.e. option 3, but even then the current implementation can be the last-resort method if the guest has no PV abstract-event handler installed. We apply this method especially to Dom0 because, after we place all #MC handling in the Xen HV, Dom0's MCE handler is the same as a normal guest's and we don't need to differentiate it anymore; you can see the changes to Dom0 for MCA are very small now. BTW, one assumption here is that Dom0's log/telemetry all go through the VIRQ handler, while Dom0's #MC is just for its own recovery.

Of course, keeping the system running is currently far more important than guest #MC, and we could simply kill the impacted guest. We implemented the virtual MSR read/write mainly for Dom0 support (or maybe even Dom0 can be killed for now, since it can't do much recovery yet).

Thanks
Yunhong Jiang
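A minimal sketch, in plain C, of the "option 2" idea described above: the guest's unmodified #MC handler executes rdmsr on the architected MCA MSRs, and the hypervisor's MSR intercept answers from per-guest saved telemetry instead of the real hardware banks. The MSR numbers are the architectural ones; the structure and function names (vmce_state, vmce_rdmsr, the fixed eight banks) are invented for illustration and are not the patch's identifiers.

#include <stdint.h>
#include <stdbool.h>

#define MSR_IA32_MCG_CAP     0x179
#define MSR_IA32_MCG_STATUS  0x17a
#define MSR_IA32_MC0_CTL     0x400   /* bank i: CTL/STATUS/ADDR/MISC at 0x400 + 4*i */
#define VMCE_NR_BANKS        8

struct vmce_bank { uint64_t status, addr, misc; };

struct vmce_state {                    /* one per impacted guest            */
    uint64_t mcg_cap;                  /* advertises VMCE_NR_BANKS to guest */
    uint64_t mcg_status;
    struct vmce_bank bank[VMCE_NR_BANKS];
};

/* Called from the hypervisor's rdmsr intercept for this guest; returns true
 * if the MSR is a virtual MCA MSR and *val has been filled in. */
static bool vmce_rdmsr(const struct vmce_state *v, uint32_t msr, uint64_t *val)
{
    if (msr == MSR_IA32_MCG_CAP)    { *val = v->mcg_cap;    return true; }
    if (msr == MSR_IA32_MCG_STATUS) { *val = v->mcg_status; return true; }

    if (msr >= MSR_IA32_MC0_CTL && msr < MSR_IA32_MC0_CTL + 4 * VMCE_NR_BANKS) {
        unsigned int bank = (msr - MSR_IA32_MC0_CTL) / 4;
        switch ((msr - MSR_IA32_MC0_CTL) % 4) {
        case 0: *val = ~0ULL;                break; /* MCi_CTL: report all enabled */
        case 1: *val = v->bank[bank].status; break;
        case 2: *val = v->bank[bank].addr;   break;
        case 3: *val = v->bank[bank].misc;   break;
        }
        return true;
    }
    return false;                      /* not an MCA MSR: handled elsewhere */
}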
Jiang, Yunhong
2009-Feb-17 06:44 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> -----Original Message-----
> From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM]
> Sent: 17 February 2009 13:50
> To: Keir Fraser
> Cc: Gavin Maltby; Christoph Egger; xen-devel@lists.xensource.com; Jiang, Yunhong; Ke, Liping
> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>
> I should probably clarify myself, since I may have created one wrong
> impression: I don't object to the parts of the Intel code where the
> hypervisor does more of the initial work (as is also done in the page
> offline code); it can be critical that this work is done quickly, and
> the hypervisor is the only place that has both the information and the
> means to do it.

Yes, agreed.

> So, doing some more work there in some cases is probably the best thing
> to do, even though there is natural resistance to adding more code to
> the hypervisor.

We all agree on keeping the HV code small, and we will try to reduce the LOC in the next round of patches.

> The main thing whose benefits I don't quite understand is the vMCE code,
> which is why I asked if there are examples of where that approach would
> work better.

Please see the mail I just sent out; you can also refer to http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00643.html.

Thanks
Yunhong Jiang
Jiang, Yunhong
2009-Feb-17 06:53 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
>> So, doing some more work there in some cases is probably the best thing
>> to do, even though there is natural resistance to adding more code to
>> the hypervisor.
>
> We all agree on keeping the HV code small, and we will try to reduce the
> LOC in the next round of patches.

BTW, some changes in the Xen HV are needed no matter whether we place #MC handling in Xen or in Dom0, such as the ownership CPU check, selecting the CPU with the most severe error, and the post handler run in softirq context, all of which are also complex.
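A minimal sketch, in plain C with GCC atomic builtins, of the "ownership CPU check" mentioned above: when the broadcast MCE# brings every CPU into the handler, the first CPU to win an atomic compare-and-swap becomes the owner that drives the rest of the handling, and it releases ownership when done. The names and the use of a plain global are illustrative only, not the patch code.

#include <stdbool.h>

#define MCE_NO_OWNER (-1)

static int mce_owner_cpu = MCE_NO_OWNER;

/* Each CPU calls this from its #MC handler; only the first caller wins. */
static bool mce_claim_ownership(int this_cpu)
{
    int expected = MCE_NO_OWNER;
    return __atomic_compare_exchange_n(&mce_owner_cpu, &expected, this_cpu,
                                       false, __ATOMIC_ACQ_REL,
                                       __ATOMIC_ACQUIRE);
}

/* The owner releases ownership once processing and recovery are finished. */
static void mce_release_ownership(void)
{
    __atomic_store_n(&mce_owner_cpu, MCE_NO_OWNER, __ATOMIC_RELEASE);
}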
Christoph Egger
2009-Feb-18 18:05 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Tuesday 17 February 2009 07:41:29 Jiang, Yunhong wrote:
> I think the major differences are: a) how to handle the #MC, i.e. reset
> the system, decide the impacted components, take recovery actions like
> page offline, etc.; b) how to handle errors that impact a guest. As for
> other items like logging/telemetry, I don't think our implementation
> differs much from the current one.

The hardware doesn't know what recovery actions the software can take. If page A is faulty and software maintains a copy in page B, then software can turn an uncorrectable error into a correctable one. If the hardware is aware of that copy (memory mirroring done by the memory controller), then the hardware itself turns the uncorrectable error into a correctable one and reports a correctable error.

Therefore, I don't see why any flags other than correctable and uncorrectable are needed at all.

After some thinking about taking some quick actions, I can agree to it if it meets the condition below. Be aware that error analysis is highly CPU-vendor and even CPU-family/model specific. Doing a complete analysis as Solaris does blows Xen up a *lot*. Therefore, a *cheap* error analysis must be enough to figure out whether recovery actions like page offlining or CPU offlining are *obviously* the only right thing to do.

If this is not the case, then let Dom0 decide what to do.

Christoph
Jiang, Yunhong
2009-Feb-19 09:13 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
xen-devel-bounces@lists.xensource.com <> wrote:
> On Tuesday 17 February 2009 07:41:29 Jiang, Yunhong wrote:
>> I think the major differences are: a) how to handle the #MC, i.e. reset
>> the system, decide the impacted components, take recovery actions like
>> page offline, etc.; b) how to handle errors that impact a guest. As for
>> other items like logging/telemetry, I don't think our implementation
>> differs much from the current one.
>
> The hardware doesn't know what recovery actions the software can take.
> If page A is faulty and software maintains a copy in page B, then
> software can turn an uncorrectable error into a correctable one.
> If the hardware is aware of that copy (memory mirroring done by the
> memory controller), then the hardware itself turns the uncorrectable
> error into a correctable one and reports a correctable error.
>
> Therefore, I don't see why any flags other than correctable and
> uncorrectable are needed at all.

Christoph, thanks for your reply.

I think recoverable means the VMM/OS can take a recovery action like page offline, while unrecoverable means the VMM/OS can't do anything and we have to reboot. The main reason we need these flags is that several steps are required for MCA handling. For example, when multiple MCEs happen on multiple CPUs, first each CPU checks its own severity, and second we need to find the most severely affected CPU and take action. For example, CPU A may report unrecoverable while CPU B reports recoverable; they will compare the information, and the final verdict will be unrecoverable. (A small sketch of this two-step selection follows this message.)

> After some thinking about taking some quick actions, I can agree to it
> if it meets the condition below. Be aware that error analysis is highly
> CPU-vendor and even CPU-family/model specific. Doing a complete analysis
> as Solaris does blows Xen up a *lot*.

I didn't check the Solaris code, so can Gavin or Frank give us more information? At least currently it will not be large AFAIK, and if we do need model-specific support (I don't know of such a requirement now, and I suppose it will not be common if it exists; please correct me if I'm wrong), Dom0 can inform Xen of it.

> Therefore, a *cheap* error analysis must be enough to figure out whether
> recovery actions like page-offlining or cpu offlining are *obviously*
> the only right thing to do.

Currently we only plan to support these two types; do you have plans for other recovery actions? And would that action be done better in Dom0 than in Xen?

> If this is not the case, then let Dom0 decide what to do.

Thanks
-- Yunhong Jiang
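A minimal sketch, in plain C, of the two-step severity selection described above: in step one each CPU records the worst severity found in its own banks; in step two the owner CPU reduces the per-CPU results to a single verdict, so one CPU reporting "fatal" makes the whole event fatal. The enum, array and function names are invented for illustration; the rendezvous/synchronisation needed before step two is omitted.

#define NR_CPUS 64

enum mca_severity { MCA_NONE, MCA_CORRECTED, MCA_RECOVERABLE, MCA_FATAL };

static enum mca_severity cpu_severity[NR_CPUS];

/* Step 1: each CPU runs this inside its #MC handler. */
static void record_local_severity(int cpu, enum mca_severity worst_in_my_banks)
{
    cpu_severity[cpu] = worst_in_my_banks;
}

/* Step 2: the owner CPU runs this once all CPUs have reported. */
static enum mca_severity global_severity(void)
{
    enum mca_severity worst = MCA_NONE;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_severity[cpu] > worst)
            worst = cpu_severity[cpu];
    return worst;
}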
Christoph Egger
2009-Feb-19 16:25 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Thursday 19 February 2009 10:13:18 Jiang, Yunhong wrote:
> I think recoverable means the VMM/OS can take a recovery action like page
> offline, while unrecoverable means the VMM/OS can't do anything and we
> have to reboot.

OK, here is a different interpretation of what is correctable and uncorrectable. Uncorrectable in your interpretation means neither hardware nor software can do anything. Uncorrectable in my interpretation means the hardware can't correct it, but software may have more information and correct it.

> The main reason we need these flags is that several steps are required
> for MCA handling. For example, when multiple MCEs happen on multiple
> CPUs, first each CPU checks its own severity, and second we need to find
> the most severely affected CPU and take action. For example, CPU A may
> report unrecoverable while CPU B reports recoverable; they will compare
> the information, and the final verdict will be unrecoverable.

I brought up an example of a broken memory page for my argument; you bring up a broken CPU for yours. We need to find a common denominator to compare. If a CPU is completely broken and you are on UP, then the game is over. Not even a reboot can help. On an SMP system, offline the CPU and inform Dom0.

> I didn't check the Solaris code, so can Gavin or Frank give us more
> information? At least currently it will not be large AFAIK, and if we do
> need model-specific support, Dom0 can inform Xen of it.
>
> Currently we only plan to support these two types; do you have plans for
> other recovery actions? And would that action be done better in Dom0 than
> in Xen?

Yes!! Solaris maintains a list of broken pages which is even persistent across reboots when the serial number of the DIMM didn't change. For doing page offlining properly, Sun should design a hypercall allowing Dom0 to give Xen this list as early as possible at boot time.

Further, with our Shanghai CPU, we can disable certain parts of its L3 cache. Instead of offlining that broken CPU completely, just disable the broken part of it. The registers for this are in PCI config space. Since Xen delegates PCI access to Dom0, Dom0 can do that.

Christoph
Jiang, Yunhong
2009-Feb-20 02:53 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph Egger <mailto:Christoph.Egger@amd.com> wrote:
> OK, here is a different interpretation of what is correctable and
> uncorrectable. Uncorrectable in your interpretation means neither
> hardware nor software can do anything. Uncorrectable in my
> interpretation means the hardware can't correct it, but software may
> have more information and correct it.

Yes. Maybe "fatal" is a more appropriate name here.

> I brought up an example of a broken memory page for my argument; you
> bring up a broken CPU for yours. We need to find a common denominator
> to compare. If a CPU is completely broken and you are on UP, then the
> game is over. Not even a reboot can help. On an SMP system, offline the
> CPU and inform Dom0.

Sorry, I didn't get the relationship between the flags and the comparison of the two examples :$

> Yes!! Solaris maintains a list of broken pages which is even persistent
> across reboots when the serial number of the DIMM didn't change.
> For doing page offlining properly, Sun should design a hypercall allowing
> Dom0 to give Xen this list as early as possible at boot time.

We have a patch to support page offline (sent as an RFC to the mailing list), and it already exports a hypercall for Dom0 to ask Xen to offline pages (this is for proactive action on CE errors from Dom0). Also, as Frank suggested, we will add a hypercall for Dom0 to query a page's offline status, so this should be OK.

> Further, with our Shanghai CPU, we can disable certain parts of its L3
> cache. Instead of offlining that broken CPU completely, just disable the
> broken part of it. The registers for this are in PCI config space.
> Since Xen delegates PCI access to Dom0, Dom0 can do that.

Sorry, I have no idea about Shanghai, but I am a bit surprised that when an error happens in the cache, we would transfer control to Dom0 and wait for Dom0's MCA handler to take the action of disabling the cache; that is a really long code path. Per my understanding, if there is an issue in the cache, we should clear/disable the cache ASAP to avoid a more severe result, and this is an extreme example of letting Xen handle the MCA. Or maybe I missed something important about this feature?

BTW, I want to clarify that this patch is for #MC handling (i.e. the "uncorrected" errors in your terms). For hardware-correctable errors (i.e. "correctable"), Xen will do nothing but pass them to Dom0 as a vIRQ, as our previous patch (http://lists.xensource.com/archives/html/xen-devel/2008-12/msg00970.html) shows, because a CE will not impact the system. So if the "cache index disable" is meant to disable part of the cache after too many CEs (correctable errors) as a proactive action, I think we are on the same page.

I attached two foils that are part of our Xen Summit presentation. Page 1 is mainly for #MC handling; page 2 is for CE handling (through CMCI or polling). Page 1 is described clearly in the patch. Page 2 is what our previous patch did.

Thanks
-- Yunhong Jiang
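A minimal, self-contained sketch, in plain C, of the corrected-error path just described (the "page 2" flow): the CMCI handler or the periodic poller walks the banks, queues any valid corrected-error telemetry, clears the bank, and notifies Dom0, which only logs it. The bank array, queue_telemetry() and notify_dom0() are stand-ins for the real hardware MSR accesses, the telemetry queue and the Dom0 MCA vIRQ; none of the names come from the patches.

#include <stdint.h>
#include <stdio.h>

#define MCi_STATUS_VAL (1ULL << 63)
#define MCi_STATUS_UC  (1ULL << 61)
#define NR_BANKS 6

static uint64_t bank_status[NR_BANKS];  /* stands in for the MCi_STATUS MSRs */

static void queue_telemetry(unsigned int bank, uint64_t status)
{
    printf("queue CE telemetry: bank %u status %#llx\n",
           bank, (unsigned long long)status);
}

static void notify_dom0(void)
{
    puts("raise the MCA vIRQ to Dom0 (logging only, no recovery)");
}

static void poll_corrected_errors(void)
{
    int found = 0;

    for (unsigned int bank = 0; bank < NR_BANKS; bank++) {
        uint64_t status = bank_status[bank];

        if (!(status & MCi_STATUS_VAL) || (status & MCi_STATUS_UC))
            continue;                   /* empty bank, or a UC error: skip  */

        queue_telemetry(bank, status);  /* save the log for Dom0 to fetch   */
        bank_status[bank] = 0;          /* clear/re-arm the bank            */
        found = 1;
    }

    if (found)
        notify_dom0();
}

int main(void)
{
    bank_status[2] = MCi_STATUS_VAL | 0x90;     /* pretend a corrected error */
    poll_corrected_errors();
    return 0;
}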
Frank van der Linden
2009-Feb-20 21:01 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
I had some time to look over the patches in more detail, and the previous discussions that were referenced. From your patches, what you write, and your slides, I gather the following:

* Corrected errors (found through polling and CMCI):
  1) Collect error data (telemetry)
  2) Inform dom0 through the VIRQ

* Uncorrected errors:
  1) See if any immediate action can be taken (CPU offline, page retire)
  2) Collect telemetry
  3) Deliver a vMCE to dom0 (and possibly a domU)

I think it's fine that the hypervisor takes some immediate action in some cases. It is good to do this as quickly as possible, and only the hypervisor has all the information immediately available.

What would be needed for the Solaris framework, however, is to provide information on what action was taken, along with the telemetry. As Christoph noted, the Solaris FMA code checks, at bootup, whether there were components that previously had errors, and if so, it disables them again to prevent further errors. To be able to do this, it needs full information not just on the error data, but also on any action taken by the hypervisor, so that it can repeat this action. It may take some modifications in the FMA code to account for the case where an action has already been taken (to avoid trying to take conflicting action), but I think that shouldn't be a big problem, although I don't know that part of our code very well.

The part that I still have doubts about is the vMCE code. As far as I can tell, it takes the information out of the MCA banks and stores it, per event, in a linked list. Per vMCE, the head of the list is taken and used as an MSR context. The rdmsr instruction is trapped and redirected to that information. It seems that the wrmsr instruction is accepted, but has no effect (except that if the trap handler writes a value and then reads it back immediately, the values will be the same).

The main argument for the vMCE code seems to be that it allows existing MCA handlers to be reused. However, I don't see the advantage in this. Basically, it allows the handler to retrieve the MCA banks through plain rdmsr instructions. Which is fine, but that's as far as it goes. Without any additional information, that feature does not seem useful; wrmsr instructions have no effect. To take further action, the MCA code in dom0 (or a domU) needs to know that it is running under Xen, and it needs detailed physical information on the system. In other words, the only existing code that can be reused is the code that gathers some information. So, the only thing that vMCE is good for is that you can run unmodified error-logging code. But you can't interpret any of the error information further without knowing more. Especially for a domU, which might not know anything, this doesn't seem useful. What would the user of a domU do with that information?

To recap, I think the part where Xen itself takes action is fine, with some modifications. But I don't see any advantages in vMCE delivery, unless I'm missing something of course.

- Frank
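A minimal sketch, in plain C, of the mechanism described in the paragraph above: per-event telemetry sits in a per-domain list, the entry at the head of the list is the "MSR context" that the guest's trapped rdmsr accesses see while it handles the current vMCE, and the entry is retired once the guest is done with it. All names (vmce_event, vmce_queue, vmce_retire) are invented for this illustration; they are not the identifiers in the patches.

#include <stdint.h>
#include <stdlib.h>

struct vmce_event {
    struct vmce_event *next;
    uint64_t mcg_status;
    uint64_t bank_status, bank_addr, bank_misc;
};

struct vmce_queue {
    struct vmce_event *head, *tail;    /* head = context of the current vMCE */
};

/* Append a new telemetry event for this domain. */
static void vmce_enqueue(struct vmce_queue *q, struct vmce_event *ev)
{
    ev->next = NULL;
    if (q->tail)
        q->tail->next = ev;
    else
        q->head = ev;
    q->tail = ev;
}

/* rdmsr traps are answered from the head entry, if any. */
static const struct vmce_event *vmce_current(const struct vmce_queue *q)
{
    return q->head;
}

/* When the guest handler finishes with the current event (e.g. clears the
 * virtual MCi_STATUS), retire it; a further vMCE# can then be injected for
 * the next pending entry. */
static void vmce_retire(struct vmce_queue *q)
{
    struct vmce_event *done = q->head;

    if (!done)
        return;
    q->head = done->next;
    if (!q->head)
        q->tail = NULL;
    free(done);
}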
Jiang, Yunhong
2009-Feb-23 09:01 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> I had some time to look over the patches in more detail and > the previous > discussions that were referenced. > > From your patches, what you write, and your slides, I gather > the following: > > * Corrected errors (found through polling and CMCI): > 1) Collected error data (telemetry) > 2) Inform dom0 through the VIRQ. > > * Uncorrected errors: > 1) See if any immediate action can be taken (CPU offline, page > retire) 2) Collect telemetry > 3) Deliver vMCE to dom0 (and possibly domU) One note: we deliver a vMCE to dom0/domU only when it is impacted. The idea behind this is that the MCE is handled entirely by the Xen HV, while a guest''s vMCE handler only works for the guest itself. For example, when a page is broken, Xen will first mark the page offline on the Xen side (i.e. take the recovery action); then it will inject a vMCE into the corresponding guest (dom0 or domU), and the guest will kill the application using the page, free the page, or take further action. And we always pass the vIRQ to dom0 for logging and telemetry; user space tools can take more proactive action based on this if needed. > > > I think it''s fine that the hypervisor takes some immediate action in > some cases. It is good to do this as quickly as possible, and only the > hypervisor has all the information immediately available. > > What would be needed for the Solaris framework, however, is to provide > information on what action was taken, along with the telemetry. As Agree that this modification is needed. Sorry, we didn''t realize the requirement from Dom0 after reboot. Either we can pass the action in the telemetry, or Dom0 can use an action-specific method, like retrieving the offlined pages from Xen before reboot. If we take the former, we may need an interface definition.> Christoph noted, the Solaris FMA code checks, at bootup, if there were > components that previously had errors, and if so, it disables > them again > to prevent further errors. To be able to do this, it needs the full > information not just on the error data, but also on any action > taken by > the hypervisor, so that it can repeat this action. It may take some > modifications in the FMA code to account for the case where an action > has already been taken (to avoid trying to take conflicting > action), but > I think that shouldn''t be a big problem. Although I don''t know > that part > of our code very well. > > The part that I still have doubts about, is the vMCE code. As far as I > can tell, it takes the information out of the MCA banks, and > stores it, > per event, in a linked list. Per vMCE, the head of the list is > taken and > used as an MSR context. The rdmsr instruction is trapped and redirected > to that information. It seems that the wrmsr instruction is accepted, > but has no effect (except that if the trap handler writes a value and > then reads it back again immediately, the values will be the same). > The main argument for the vMCE code seems to be that it allows existing > MCA handlers to be reused. However, I don''t see the advantage in this. > Basically, it allows the handler to retrieve the MCA banks > through plain > rdmsr instructions. Which is fine, but that''s as far as it > goes. Without > any additional information, that feature does not seem useful. wrmsr > instructions has no effect. What do you mean by the effect of the wrmsr instruction? We need to consider injecting #GP on an invalid wrmsr, or removing the event when the guest clears MCi_STATUS, if needed. 
We sent this RFC early to get feedback on the design idea first. Or do you mean more than this for the wrmsr?> > To take further action, the MCA code in dom0 (or a domU) needs to know > that it is running under Xen, and it needs to have detailed physical Our purpose is that the guest has no idea it is running under Xen, as described above. And what information do you think a normal guest''s MCA handler needs to know, and how would it use the detailed physical information? After all, a guest cares only about itself. Also, maybe we can''t provide a PV handler for all guests (like Windows). Dom0 is a special case: its vIRQ handler knows it is running under Xen, but that is for log/telemetry and for proactive action.> information on the system. In other words, the existing code > that can be What do you mean by "existing": our patch or the current Xen implementation?> used is only the code that gathers some information. So, the > only thing > that vMCE is good for, is that you can run unmodified error logging > code. But you can''t interpret any of the error information further > without knowing more. Especially for a domU, which might not know > anything, this doesn''t seem useful. What would the user of a domU do with > that information? > To recap, I think the part where Xen itself takes action is fine, with > some modifications. But I don''t see any advantages in vMCE delivery, > unless I''m missing something of course.. I think the main advantages are: a) We don''t need to maintain a PV MCA handler for the guest, especially for HVM guests. b) We can benefit from the guest''s MCA improvements/enhancements. c) Applying this to dom0, we don''t need a different mechanism for dom0/hvm. Thanks Yunhong Jiang> > - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
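To make the MSR virtualization discussed above concrete, here is a minimal sketch of how a trapped rdmsr on a machine-check bank MSR could be redirected to the stored virtual event. It is only an illustration of the idea under discussion; struct vmce_bank_ctx and vmce_find_pending() are hypothetical names, not code from the posted patches.

/* Sketch only: per-domain virtual bank context for the pending vMCE. */
struct vmce_bank_ctx {
    uint64_t status;    /* virtual MCi_STATUS */
    uint64_t addr;      /* virtual MCi_ADDR */
    uint64_t misc;      /* virtual MCi_MISC */
};

struct vmce_bank_ctx *vmce_find_pending(struct domain *d);  /* hypothetical lookup */

#define MSR_MC0_CTL 0x400   /* IA32_MC0_CTL; each bank uses 4 MSRs */

/* Called from the rdmsr intercept for the MCi_* range. */
static int vmce_rdmsr_sketch(struct domain *d, uint32_t msr, uint64_t *val)
{
    struct vmce_bank_ctx *ctx = vmce_find_pending(d);

    if ( ctx == NULL )
    {
        *val = 0;                       /* no pending event: bank reads clean */
        return 1;
    }

    switch ( (msr - MSR_MC0_CTL) % 4 )  /* CTL, STATUS, ADDR, MISC per bank */
    {
    case 1: *val = ctx->status; break;
    case 2: *val = ctx->addr;   break;
    case 3: *val = ctx->misc;   break;
    default: *val = 0;          break;  /* MCi_CTL: nothing to virtualize here */
    }
    return 1;                           /* handled; no #GP injected */
}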
Frank van der Linden
2009-Feb-24 18:53 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Thanks for your reply. Let me explain my comments a little: Jiang, Yunhong wrote:> > One notice is, we delieve vMCE to dom0/domU only when it is impacted. The idea behind this is, MCE is handled by Xen HV totally, while guest''s vMCE handler will only works for itself. For example, when a page broken, Xen will firstly mark the page offline in Xen side (i.e. take the recover action), then, it will inject a vMCE to guest corresponding (dom0 or domU), the guest will kill the application using the page, free the page, or do more action. > > And we always pass the vIRQ to dom0 for logging and telemetry, user space tools can take more proactive action for this if needed. I understand this part, and have no problems with the mechanism itself. I think it has advantages over the original concept, where dom0 informs domUs. My question is: what useful action can a domU take without fully knowing the physical system? I''ll go more into that below.>> What would be needed for the Solaris framework, however, is to provide >> information on what action was taken, along with the telemetry. As > > Agree that this modification is needed. Sorry we didn''t reliaze the requirement from Dom0 after reboot. > > Either we can pass the action in the telemetry, or Dom0 can take action specific method ,like retrieve the offlined page from Xen before reboot. If we take the former, we may need a interface definition. Passing the action along with the telemetry seems the best way to go to me. Since the telemetry is used to determine which action to take, any information on actions already taken should come at the same time. > > What do you mean of the effect of wrmsr instruction. We need considering inject #GP if invalid wrmsr , or remove the event when guest clear the MCi_STATUS_MCA if needed. We send this RFC early to get feedback firstly for the design idea. > Or you mean more than this for the wrmsr? > >> To take further action, the MCA code in dom0 (or a domU) needs to know >> that it is running under Xen, and it needs to have detailed physical > > Our purpose is guest has no idea it is running under xen as descripted above. And what information do you think a normal guest''s MCA handler needs to know, and use the detailed physical information? After all, a guest cares only itself. Also, maybe we can''t provide PV handler for all guest (like windows). > > Dom0 is a special case, it''s vIRQ handler knows it is running under Xen, but that is for log/telemetry and for proactive action. > >> information on the system. In other words, the existing code >> that can be > > What do you mean of "existing", our patch or current Xen implementation? > >> used is only the code that gathers some information. So, the >> only thing >> that vMCE is good for, is that you can run unmodified error logging >> code. But you can''t interpret any of the error information further >> without knowing more. Especially for a domU, which might not know >> anything, this doesn''t seem useful. What would the user of a domU do with >> that information? >> To recap, I think the part where Xen itself takes action is fine, with >> some modifications. But I don''t see any advantages in vMCE delivery, >> unless I''m missing something of course.. > > I think the main advantage are: > a) We don''t need maintain a PV MCA handler for guest, especially for HVM guest > b) We can get benifit from guest''s MCA improvement/enhancement . 
> c) Applying this to dom0, we don''t need different mechanism to dom0/hvm. Ok, my main issue here is: if you want to enable a guest to run unmodified MCA code (which you state as a goal, and as an advantage of the vMCE approach), then what can the guest actually do? Or the dom0, for that matter? MCA information is highly specific to the hardware. Without additional information on the hardware, it is hard, or even impossible, for the unmodified MCA handler in dom0 or a domU to do anything useful. It will interpret the information to fit the virtualized environment it is in, which doesn''t match the reality of the hardware at all. So what can it do? It can just read the MSRs and log the information, but even that information wouldn''t be useful; it is already available to dom0, where the code and/or person who can make sense of the data will see it. The unmodified MCA handler also can''t take any corrective action; it might think that it is taking action, but in fact, its wrmsr instructions have no effect (and they shouldn''t, guests should definitely not be able to do MSR writes). I only see one possible exception to this: if you translate the ADDR MSR of a bank to a guest address in the vmca info before delivering the vMCE, then the guest could do something useful, because its virtualized MSR reads would then produce a guest address, and it could do something useful with it. But currently, your code doesn''t seem to do this; the virtualized MSR will produce the machine address, which the guest can''t do anything with, unless it knows it''s running under Xen. So that''s my main problem here: there is a contradiction. The vMCE mechanism as you implement it enables guests to run an unmodified MCA handler, but there isn''t actually much that the guest can do with that, without knowing it runs under Xen. I see only one specific use for this: if you translate the ADDR info to a guest address, it could potentially try to do a "local" page retire. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
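If the translation Frank describes were added, it could be done at the point where Xen prepares the virtual event, before the guest ever reads the bank MSRs. A rough sketch, in which machine_to_guest_addr() and vmce_queue_event() are hypothetical placeholders rather than functions from the patch:

/* Sketch: translate the machine address from MCi_ADDR into the impacted
 * guest address space before queueing the virtual event, so that an
 * unmodified handler can act on an address it understands. */
static int vmce_prepare_event_sketch(struct domain *d, uint64_t mc_status,
                                     uint64_t machine_addr, uint64_t mc_misc)
{
    uint64_t guest_addr;

    if ( machine_to_guest_addr(d, machine_addr, &guest_addr) )
        return -EINVAL;     /* page is not mapped into this guest */

    /* The virtualized MCi_ADDR now holds a guest address, so the guest
     * could, for example, retire the page "locally". */
    return vmce_queue_event(d, mc_status, guest_addr, mc_misc);
}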
Frank van der Linden
2009-Feb-24 19:07 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Kleen, Andi wrote:>> MCA information is highly specific to the hardware. > > Actually Intel has architectural machine checks and except for > some optional addon information explicitely marked it''s all architectural > (as in defined to stay the same going forward)True, I probably expressed myself poorly here. I meant to say: it''s a physical hardware error, and in an unmodified virtualized environment the information about the physical hardware isn''t there.> For DomU translation of the address is needed, that''s correct. > For Dom0 logging physical is good because the logging tools > might need that.Right. As far as I understand it, this patch proposes to deliver the actual physical information to dom0 via the existing vIRQ mechanism, while the vMCE mechanism delivers virtualized info to any guest (both dom0 and domU). - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Feb-24 20:47 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Kleen, Andi wrote:>> Kleen, Andi wrote: > > So it''s generally better to inject generic events, not just blindly forward. >Agreed. I can see advantages to the vMCE code, but it has to deliver something to the domU that makes it do something reasonable. That''s why I have some doubts about the patch that was sent, it doesn''t quite seem to achieve that (certainly not without translating the address). - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-25 02:25 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Kleen, Andi wrote: >>> Kleen, Andi wrote: >> >> So it''s generally better to inject generic events, not just blindly >> forward. >> > > Agreed. I can see advantages to the vMCE code, but it has to deliver > something to the domU that makes it do something reasonable. > That''s why > I have some doubts about the patch that was sent, it doesn''t > quite seem > to achieve that (certainly not without translating the address). > > - Frank Yes, we should have included the translation. We didn''t do that when sending out the patch because we thought the PV guest has knowledge of the m2p translation. After more consideration we realized the translation is needed for PV guests too, since the unmodified #MC handler will use guest addresses. Of course we always need the translation for HVM guests, which however is not in that patch''s scope. Sorry for any confusion caused. One thing to note is that the information passed through the vIRQ is physical information, while dom0''s MCA handler will get guest information, so user space tools should be aware of this constraint. So, Frank/Egger, can I assume the following are the current consensus?
1) MCE is handled entirely by the Xen HV, while a guest''s vMCE handler only works for the guest itself.
2) Xen presents a virtual #MC to the guest through MSR access emulation (Xen will do the translation if needed).
3) The guest''s unmodified MCE handler will handle the injected vMCE.
4) Dom0 will get all logs/telemetry through a hypercall.
5) The action taken by Xen will be passed to dom0 through the telemetry mechanism.
_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
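Taken together, points 1)-5) suggest roughly the following shape for the Xen-side #MC path. This is only a sketch of the agreed division of work; every helper named here is hypothetical.

/* Sketch of the proposed split of responsibilities (all names hypothetical). */
static void mce_softirq_sketch(void)
{
    struct mc_telem *telem = mce_read_and_clear_banks();   /* Xen collects (1) */

    mce_take_recovery_action(telem);        /* e.g. page or cpu offline */
    mce_record_action(telem);               /* action travels with the log (5) */

    if ( telem->impacted_domain != NULL )
        vmce_inject(telem->impacted_domain, telem);  /* virtual #MC, (2) and (3) */

    mce_notify_dom0(telem);                 /* vIRQ; dom0 fetches the log via hypercall (4) */
}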
Jiang, Yunhong
2009-Feb-25 02:26 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> Right. As far as I understand it, this patch proposes to deliver the > actual physical information to dom0 via the existing vIRQ mechanism, > while the vMCE mechanism delivers virtualized info to any guest (both dom0 > and domU). Yes, exactly.> > - Frank > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-25 02:31 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> That''s needed anyways for example to support migration between different > types of CPUs. The DomU really cannot take a specific CPU type > for granted or rather has to assume some fallback CPU. Also > for virtualization > it''s a common case that guests run very old OS, so it''s better to give > them the oldest possible events too. > > So it''s generally better to inject generic events, not just > blindly forward. Andi, what''s the meaning of "generic event"? Do you mean option 3, i.e. some abstract event like a page-offline or kill-current-execution event? Or do you mean translating the physical MSR values to guest-aware MSR values? Thanks Yunhong Jiang> > Only for Dom0 which does logging the physical hardware needs > to be described > correctly. > > -Andi_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Feb-25 10:37 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Tuesday 24 February 2009 20:07:16 Frank van der Linden wrote:> Kleen, Andi wrote: > >> MCA information is highly specific to the hardware. > > > > Actually Intel has architectural machine checks and except for > > some optional addon information explicitely marked it''s all architectural > > (as in defined to stay the same going forward) > > True, I probably expressed myself poorly here. I meant to say: it''s a > physical hardware error, and in an unmodified virtualized environment > the information about the physical hardware isn''t there. > > > For DomU translation of the address is needed, that''s correct. > > For Dom0 logging physical is good because the logging tools > > might need that. > > Right. As far as I understand it, this patch proposes to deliver the > actual physical information to dom0 via the existing vIRQ mechanism, > while the vMCE mechanism delivers virtualized info to any guest (both > dom0 and domU). The translation is still problematic: what if an error occurred which impacts multiple contiguous physical pages? Translated into guest-physical address space, they may be non-contiguous. That''s why the original design does not support HVM guests unless they are aware of running in Xen via a PV machine check driver. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Feb-25 10:57 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Tuesday 24 February 2009 21:33:47 Kleen, Andi wrote:> >Kleen, Andi wrote: > >>> MCA information is highly specific to the hardware. > >> > >> Actually Intel has architectural machine checks and except for > >> some optional addon information explicitely marked it''s all > > > >architectural > > > >> (as in defined to stay the same going forward) > > > >True, I probably expressed myself poorly here. I meant to say: it''s a > >physical hardware error, and in an unmodified virtualized environment > >the information about the physical hardware isn''t there. > > In a DomU it''s not important that the physical hardware is correctly > described, the only thing that matters is that the event triggers > the DomU code to do the expected action. I agree with that. The DomU sees a hw environment which may only partially match the physical hardware. The physical machine check error must be translated in a way that fits into the guest''s hw environment. This is not just limited to the memory layout. An example to clarify the point (which actually won''t apply directly to Xen, but you should get the idea): The guest hw environment is an (emulated) sparc CPU, memory and PCI devices. The host''s hw environment is an x86 PC. Now a machine check error occurs. If you want to forward it into the guest, you must translate it into the form the guest OS would expect from a native sparc machine. -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Feb-25 12:19 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote:> Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: > > Kleen, Andi wrote: > >>> Kleen, Andi wrote: > >> > >> So it''s generally better to inject generic events, not just blindly > >> forward. > > > > Agreed. I can see advantages to the vMCE code, but it has to deliver > > something to the domU that makes it do something reasonable. > > That''s why > > I have some doubts about the patch that was sent, it doesn''t > > quite seem > > to achieve that (certainly not without translating the address). > > > > - Frank > > Yes, we should have include the translation. We didn''t do that when sending > out the patch because we thought the PV guest has idea of m2p translation. > Later we realized the translation is needed for PV guest after more > consideration, since the unmodified #MC handler will use guest address. Of > course we always need the translation for HVM guest, which however is not > in that patch''s scope . Sorry for any confusion caused. > > One thing need notice is, the information passed through vIRQ is physical > information while dom0s'' MCA handler will get guest information, so user > space tools should be aware of such constraints. > > So, Frank/Egger, can I assume followed are consensus currently? > > 1) MCE is handled by Xen HV totally, while guest''s vMCE handler will only > works for itself. > 2) Xen present a virtual #MC to guest through MSR access > emulation.(Xen will do the translation if needed). > 3) Guest''s unmodified > MCE handler will handle the vMCE injected. > 4) Dom0 will get all log/telemetry through hypercall. > 5) The action taken by xen will be passed to dom0 through the telemetry > mechanism. Mostly. Regarding 2), I would first like to discuss how to handle errors impacting multiple contiguous physical pages which are non-contiguous in guest physical space. And I also want to discuss how to do recovery actions requiring PCI access. One example for this is Shanghai''s "L3 Cache Index Disable"-Feature. Xen delegates PCI config space to Dom0 and via PCI passthrough partly to DomU. That means, if registers in PCI config space are independently accessible by Xen, Dom0 and/or DomU, they can interfere with each other. Therefore, we need to a) clearly define who handles what and b) define some rules based on a) c) discuss how to handle Dom0/DomU going wild and breaking the rules defined in b) Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Feb-25 17:32 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph Egger wrote:> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: > >> So, Frank/Egger, can I assume followed are consensus currently? >> >> 1) MCE is handled by Xen HV totally, while guest''s vMCE handler will only >> works for itself. >> 2) Xen present a virtual #MC to guest through MSR access >> emulation.(Xen will do the translation if needed). >> 3) Guest''s unmodified >> MCE handler will handle the vMCE injected. >> 4) Dom0 will get all log/telemetry through hypercall. >> 5) The action taken by xen will be passed to dom0 through the telemetry >> mechanism. > > Mostly. Regarding 2) I want like to discuss first how to handle errors > impacting multiple contiguous physical pages which are non-contigous > in guest physical space. > > And I also want to discuss about how to do recovery actions requiring > PCI access. One example for this is > Shanghai''s "L3 Cache Index Disable"-Feature. > Xen delegates PCI config space to Dom0 and > via PCI passthrough partly to DomU. > That means, if registers in PCI config space are independently > accessable by Xen, Dom0 and/or DomU, they can interfere with each other. > Therefore, we need to > a) clearly define who handles what and > b) define some rules based on a) > c) discuss how to handle Dom0/DomU going wild > and break the rules defined in b)I also agree on the approach in principle, but would like to see these points addressed. For non-contiguous pages, I suppose Xen could deliver multiple #vMCEs to the guest, split into contiguous parts. The vmce code seems to be set up to be able to do this. As for the Shanghai feature: Christoph, are there any documents available on that feature? What kind of errors are delivered (corrected/correctable)? - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
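A sketch of the splitting Frank suggests: walk the affected machine frames, look up each guest frame, and inject one vMCE per guest-contiguous run. mfn_to_gpfn() stands for the m2p lookup and vmce_inject_range() for the injection; both names are placeholders rather than existing functions.

/* Sketch: one vMCE per run of guest-contiguous frames (names are placeholders). */
static void vmce_inject_split_sketch(struct domain *d, uint64_t first_mfn,
                                     unsigned int nr_mfns)
{
    unsigned int i, run_start = 0;

    for ( i = 1; i <= nr_mfns; i++ )
    {
        /* Close the current run at the end, or when the guest frames stop
         * being contiguous. */
        if ( i == nr_mfns ||
             mfn_to_gpfn(d, first_mfn + i) !=
             mfn_to_gpfn(d, first_mfn + run_start) + (i - run_start) )
        {
            vmce_inject_range(d, mfn_to_gpfn(d, first_mfn + run_start),
                              i - run_start);
            run_start = i;
        }
    }
}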
Gavin Maltby
2009-Feb-25 22:30 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph Egger wrote:> Mostly. Regarding 2) I want like to discuss first how to handle errors > impacting multiple contiguous physical pages which are non-contigous > in guest physical space. I can''t think of any such error types. ECC checkwords don''t span page boundaries, so you only ever get an error at a time affecting one small part of one page. That physically adjacent pages have both had errors would come out in the wash, but they''d be processed and recognised individually. Gavin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Feb-26 02:16 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christopher/Egger, thanks very much for the reply; see comments below.>-----Original Message----- >From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] >Sent: 26 February 2009 1:33 >To: Christoph Egger >Cc: Jiang, Yunhong; Kleen, Andi; >xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby >Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN > >Christoph Egger wrote: >> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: >> >>> So, Frank/Egger, can I assume followed are consensus currently? >>> >>> 1) MCE is handled by Xen HV totally, while guest's vMCE >handler will only >>> works for itself. >>> 2) Xen present a virtual #MC to guest through MSR access >>> emulation.(Xen will do the translation if needed). >>> 3) Guest's unmodified >>> MCE handler will handle the vMCE injected. >>> 4) Dom0 will get all log/telemetry through hypercall. >>> 5) The action taken by xen will be passed to dom0 through >the telemetry >>> mechanism. >> >> Mostly. Regarding 2) I want like to discuss first how to >handle errors >> impacting multiple contiguous physical pages which are non-contigous >> in guest physical space.>> >> And I also want to discuss about how to do recovery actions requiring >> PCI access. One example for this is >> Shanghai's "L3 Cache Index Disable"-Feature. >> Xen delegates PCI config space to Dom0 and >> via PCI passthrough partly to DomU. >> That means, if registers in PCI config space are independently >> accessable by Xen, Dom0 and/or DomU, they can interfere with >each other. >> Therefore, we need to >> a) clearly define who handles what and >> b) define some rules based on a) >> c) discuss how to handle Dom0/DomU going wild >> and break the rules defined in b) > >I also agree on the approach in principle, but would like to see these >points addressed. For non-contiguous pages, I suppose Xen >could deliver >multiple #vMCEs to the guest, split into contiguous parts. The >vmce code >seems to be set up to be able to do this. For the contiguous pages, I agree with Gavin that such contiguous page errors should be triggered as multiple #MCs, and so this is ok. For the PCI config space issue, Christoph, can you please share more information on it (or provide some document as Frank suggested), like: is it for CE (correctable error) or UC (uncorrectable error), is it in the PCI range or the PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how is the device's BDF calculated, etc. Following is some of my understanding. Firstly, if it is CE, Xen will do nothing and dom0 will take the recovery action. If it is UC, Xen will take action when all CPUs are in softIRQ context, and dom0 will not take action, so it should be ok. Secondly, in the Xen environment, per my understanding, the CPU is owned by the Xen HV, so I'm not sure whether Xen should be aware when dom0 disables the L3 cache (if it is CE). That is, should dom0 disable the cache directly, or should it use a hypercall to ask Xen to do that? Keir can give us more suggestions. For item C, currently Xen/dom0 can both access configuration space, while domU will do that through PCI_frontend/backend. Because the PCI backend only covers devices assigned to domU, we don't need to worry about domU, and dom0 should be trusted. However, one thing left is that if this register is beyond offset 0x100 (i.e. in the PCI-E extended range), we need to add mmconfig support in Xen, although it can be added simply. Thanks -- Yunhong Jiang> >As for the Shanghai feature: Christoph, are there any documents >available on that feature? What kind of errors are delivered >(corrected/correctable)? 
> >- Frank >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
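For reference on the 0xCF8/CFC point above: the legacy mechanism only encodes an 8-bit register offset, which is why anything past 0xff (the PCI Express extended configuration space) needs MMCONFIG. A minimal sketch of the legacy access, assuming Xen-style outl()/inl() port I/O:

/* Legacy type-1 configuration read; reg is limited to 0x00-0xff. */
static uint32_t pci_conf_read32_legacy(unsigned int bus, unsigned int dev,
                                       unsigned int func, unsigned int reg)
{
    uint32_t addr = 0x80000000u | (bus << 16) | (dev << 11) |
                    (func << 8) | (reg & 0xfc);

    outl(addr, 0xcf8);
    return inl(0xcfc);
}
/* Registers at 0x100-0xfff are only reachable through the memory-mapped
 * MMCONFIG window, hence the note about adding mmconfig support to Xen. */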
Jiang, Yunhong
2009-Mar-02 05:51 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank/Christopher, can you please give more comments for it, or you are OK with this? For the action reporting mechanism, we will send out a proposal for review soon. Thanks Yunhong Jiang Jiang, Yunhong <> wrote:> Christopher/Frank, thanks for reply very much, see comments below. > >> -----Original Message----- >> From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] Sent: >> 2009年2月26日 1:33 To: Christoph Egger >> Cc: Jiang, Yunhong; Kleen, Andi; >> xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN >> >> Christoph Egger wrote: >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: >>> >>>> So, Frank/Egger, can I assume followed are consensus currently? >>>> >>>> 1) MCE is handled by Xen HV totally, while guest's vMCE handler will >>>> only works for itself. 2) Xen present a virtual #MC to guest through MSR >>>> access emulation.(Xen will do the translation if needed). >>>> 3) Guest's unmodified >>>> MCE handler will handle the vMCE injected. >>>> 4) Dom0 will get all log/telemetry through hypercall. >>>> 5) The action taken by xen will be passed to dom0 through the telemetry >>>> mechanism. >>> >>> Mostly. Regarding 2) I want like to discuss first how to handle errors >>> impacting multiple contiguous physical pages which are non-contigous >>> in guest physical space. > > >>> >>> And I also want to discuss about how to do recovery actions requiring >>> PCI access. One example for this is >>> Shanghai's "L3 Cache Index Disable"-Feature. >>> Xen delegates PCI config space to Dom0 and >>> via PCI passthrough partly to DomU. >>> That means, if registers in PCI config space are independently >>> accessable by Xen, Dom0 and/or DomU, they can interfere with each other. >>> Therefore, we need to a) clearly define who handles what and >>> b) define some rules based on a) >>> c) discuss how to handle Dom0/DomU going wild >>> and break the rules defined in b) >> >> I also agree on the approach in principle, but would like to see these >> points addressed. For non-contiguous pages, I suppose Xen >> could deliver >> multiple #vMCEs to the guest, split into contiguous parts. The >> vmce code >> seems to be set up to be able to do this. > > For the contigous pages, I agree with Gavin that such > contiguous page error should be triggered as multiple #MC and so is ok. > > For PCI config space issue, Christoph, can you please share > more information on it (or provide some document as Frank > suggested), like is it for CE (Correctable error or > UC(UnCorrectable error), is it in PCI range or PCI-E range > (i.e. through 0xCF8/CFC or through MMCONFIG), how the device's > BDF caculated etc. Followed is some of my understanding. > > Firstly, if it is CE, Xen will do nothing and dom0 will take > recovery action. If it is UC, Xen will take action when all > CPU is in SoftIRQ context, and dom0 will not take action, so > it should be ok. > > Secondly, in Xen environment, per my understanding, CPU is > owned by Xen HV, so I'm not sure when dom0 disable L3 cache > (if it is CE), should Xen be aware or not. That is, should > dom0 disable the cache directly, or it should user hypercall > to ask Xen do that. Keir can give us more suggestion. > > For item C, currently Xen/dom0 can both access configuration > space, while domU will do that through PCI_frontend/backend. > Because PCI backend only cover device assigned to domU, so we > don't need worry about domU and dom0 should be trusted. 
> However, one thing left is, if this range is beyond 0x100 > (i.e. in pci-e range), we need add mmconfig support in Xen, > although it can be added simply. > > Thanks > -- Yunhong Jiang > >> >> As for the Shanghai feature: Christoph, are there any documents >> available on that feature? What kind of errors are delivered >> (corrected/correctable)? >> >> - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Mar-02 14:51 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote:> Frank/Christopher, can you please give more comments for it, or you are OKSorry, for the delay. I''m also busy with other tasks.> with this? For the action reporting mechanism, we will send out a proposal > for review soon.I would like to see interface definition first, which covers all aspects we discussed.> > Thanks > Yunhong Jiang > > Jiang, Yunhong <> wrote: > > Christopher/Frank, thanks for reply very much, see comments below. > > > >> -----Original Message----- > >> From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] > >> Sent: 2009年2月26日 1:33 To: Christoph Egger > >> Cc: Jiang, Yunhong; Kleen, Andi; > >> xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby > >> Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN > >> > >> Christoph Egger wrote: > >>> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: > >>>> So, Frank/Egger, can I assume followed are consensus currently? > >>>> > >>>> 1) MCE is handled by Xen HV totally, while guest''s vMCE handler will > >>>> only works for itself. 2) Xen present a virtual #MC to guest through > >>>> MSR access emulation.(Xen will do the translation if needed). > >>>> 3) Guest''s unmodified > >>>> MCE handler will handle the vMCE injected. > >>>> 4) Dom0 will get all log/telemetry through hypercall. > >>>> 5) The action taken by xen will be passed to dom0 through the > >>>> telemetry mechanism. > >>> > >>> Mostly. Regarding 2) I want like to discuss first how to handle errors > >>> impacting multiple contiguous physical pages which are non-contigous > >>> in guest physical space. > >>> > >>> > >>> > >>> And I also want to discuss about how to do recovery actions requiring > >>> PCI access. One example for this is > >>> Shanghai''s "L3 Cache Index Disable"-Feature. > >>> Xen delegates PCI config space to Dom0 and > >>> via PCI passthrough partly to DomU. > >>> That means, if registers in PCI config space are independently > >>> accessable by Xen, Dom0 and/or DomU, they can interfere with each > >>> other. Therefore, we need to a) clearly define who handles what and > >>> b) define some rules based on a) > >>> c) discuss how to handle Dom0/DomU going wild > >>> and break the rules defined in b) > >> > >> I also agree on the approach in principle, but would like to see these > >> points addressed. For non-contiguous pages, I suppose Xen > >> could deliver > >> multiple #vMCEs to the guest, split into contiguous parts. The > >> vmce code > >> seems to be set up to be able to do this. > > > > For the contigous pages, I agree with Gavin that such > > contiguous page error should be triggered as multiple #MC and so is ok. > > > > For PCI config space issue, Christoph, can you please share > > more information on it (or provide some document as Frank > > suggested), like is it for CE (Correctable error or > > UC(UnCorrectable error), is it in PCI range or PCI-E range > > (i.e. through 0xCF8/CFC or through MMCONFIG), how the device''s > > BDF caculated etc. Followed is some of my understanding. > > > > Firstly, if it is CE, Xen will do nothing and dom0 will take > > recovery action. If it is UC, Xen will take action when all > > CPU is in SoftIRQ context, and dom0 will not take action, so > > it should be ok. > > > > Secondly, in Xen environment, per my understanding, CPU is > > owned by Xen HV, so I''m not sure when dom0 disable L3 cache > > (if it is CE), should Xen be aware or not. 
That is, should > > dom0 disable the cache directly, or it should user hypercall > > to ask Xen do that. Keir can give us more suggestion. > > > > For item C, currently Xen/dom0 can both access configuration > > space, while domU will do that through PCI_frontend/backend. > > Because PCI backend only cover device assigned to domU, so we > > don''t need worry about domU and dom0 should be trusted. > > However, one thing left is, if this range is beyond 0x100 > > (i.e. in pci-e range), we need add mmconfig support in Xen, > > although it can be added simply. > > > > Thanks > > -- Yunhong Jiang > > > >> As for the Shanghai feature: Christoph, are there any documents > >> available on that feature?Yes, our BKDG.> >> What kind of errors are delivered (corrected/correctable)?The error type can be both depending on whether correction via ECC was successful or not. -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Christoph Egger
2009-Mar-02 14:58 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
On Thursday 26 February 2009 03:16:29 Jiang, Yunhong wrote:> Christopher/Egger, thanks for reply very much, see comments below. > > >-----Original Message----- > >From: Frank.Vanderlinden@Sun.COM [mailto:Frank.Vanderlinden@Sun.COM] > >Sent: 2009年2月26日 1:33 > >To: Christoph Egger > >Cc: Jiang, Yunhong; Kleen, Andi; > >xen-devel@lists.xensource.com; Keir Fraser; Ke, Liping; Gavin Maltby > >Subject: Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN > > > >Christoph Egger wrote: > >> On Wednesday 25 February 2009 03:25:12 Jiang, Yunhong wrote: > >>> So, Frank/Egger, can I assume followed are consensus currently? > >>> > >>> 1) MCE is handled by Xen HV totally, while guest''s vMCE > > > >handler will only > > > >>> works for itself. > >>> 2) Xen present a virtual #MC to guest through MSR access > >>> emulation.(Xen will do the translation if needed). > >>> 3) Guest''s unmodified > >>> MCE handler will handle the vMCE injected. > >>> 4) Dom0 will get all log/telemetry through hypercall. > >>> 5) The action taken by xen will be passed to dom0 through > > > >the telemetry > > > >>> mechanism. > >> > >> Mostly. Regarding 2) I want like to discuss first how to > > > >handle errors > > > >> impacting multiple contiguous physical pages which are non-contigous > >> in guest physical space. > >> > >> > >> > >> And I also want to discuss about how to do recovery actions requiring > >> PCI access. One example for this is > >> Shanghai''s "L3 Cache Index Disable"-Feature. > >> Xen delegates PCI config space to Dom0 and > >> via PCI passthrough partly to DomU. > >> That means, if registers in PCI config space are independently > >> accessable by Xen, Dom0 and/or DomU, they can interfere with > > > >each other. > > > >> Therefore, we need to > >> a) clearly define who handles what and > >> b) define some rules based on a) > >> c) discuss how to handle Dom0/DomU going wild > >> and break the rules defined in b) > > > >I also agree on the approach in principle, but would like to see these > >points addressed. For non-contiguous pages, I suppose Xen > >could deliver > >multiple #vMCEs to the guest, split into contiguous parts. The > >vmce code > >seems to be set up to be able to do this.For virtual MCEs that is ok. But note, for unmodified guests, the MC handler is written with the assumption that the CPU powers off when an #MCE happens before the handler cleared the MCIP bit in the MCG_STATUS MSR.> > For the contigous pages, I agree with Gavin that such contiguous page error > should be triggered as multiple #MC and so is ok. > > For PCI config space issue, Christoph, can you please share more > information on it (or provide some document as Frank suggested), like is it > for CE (Correctable error or UC(UnCorrectable error), is it in PCI range or > PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how the device''s > BDF caculated etc. Followed is some of my understanding.I would like to see a generic solution that works with any feature requiring access to the pci space rather a per-feature solution.> Firstly, if it is CE, Xen will do nothing and dom0 will take recovery > action. If it is UC, Xen will take action when all CPU is in SoftIRQ > context, and dom0 will not take action, so it should be ok. > > Secondly, in Xen environment, per my understanding, CPU is owned by Xen HV, > so I''m not sure when dom0 disable L3 cache (if it is CE), should Xen be > aware or not. That is, should dom0 disable the cache directly, or it should > user hypercall to ask Xen do that. 
Keir can give us more suggestion. > > For item C, currently Xen/dom0 can both access configuration space, while > domU will do that through PCI_frontend/backend. Because PCI backend only > cover device assigned to domU, so we don''t need worry about domU and dom0 > should be trusted. However, one thing left is, if this range is beyond > 0x100 (i.e. in pci-e range), we need add mmconfig support in Xen, although > it can be added simply. > > Thanks > -- Yunhong Jiang > > >As for the Shanghai feature: Christoph, are there any documents > >available on that feature? What kind of errors are delivered > >(corrected/correctable)? > > > >- Frank-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-02 16:09 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
xen-devel-bounces@lists.xensource.com <> wrote:> On Monday 02 March 2009 06:51:22 Jiang, Yunhong wrote: >> Frank/Christopher, can you please give more comments for it, or you are OK > > Sorry, for the delay. I''m also busy with other tasks. > >> with this? For the action reporting mechanism, we will send out a proposal >> for review soon. > > I would like to see interface definition first, which covers > all aspects > we discussed. >>>>> As for the Shanghai feature: Christoph, are there any documents >>>> available on that feature? > > Yes, our BKDG. I checked the BKDG for both Family 10 and 11 (http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41256.pdf and http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.pdf) and didn''t find the related info. Can you share more info like the URL and the section number?> >>>> What kind of errors are delivered (corrected/correctable)? > > The error type can be both depending on whether correction > via ECC was successful or not. So you mean that if ECC correction fails in the L3 cache, Xen must do "L3 Cache Index Disable" immediately so that the failing part of the cache is not used anymore? Thanks Yunhong Jiang> > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-02 16:15 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
> > For virtual MCEs that is ok. But note, for unmodified guests, > the MC handler > is written with the assumption that the CPU powers off when an #MCE > happens before the handler cleared the MCIP bit in the MCG_STATUS MSR. That should depend on the implementation; for example, we can inject the vMCEs one by one, i.e. only inject the next one after the first has already been handled.> >> >> For the contigous pages, I agree with Gavin that such contiguous page error >> should be triggered as multiple #MC and so is ok. >> >> For PCI config space issue, Christoph, can you please share more >> information on it (or provide some document as Frank suggested), like is it >> for CE (Correctable error or UC(UnCorrectable error), is it in PCI range or >> PCI-E range (i.e. through 0xCF8/CFC or through MMCONFIG), how the device''s >> BDF caculated etc. Followed is some of my understanding. > > I would like to see a generic solution that works with any feature > requiring access to the pci space rather a per-feature solution. I think the solution is: Xen cares for the MCE while dom0 cares for CE errors. Or another solution is that all PCI access for CPU RAS is done by Xen, since Xen owns the CPU. Some information like how the PCI config space is arranged will be helpful, I think. Thanks Yunhong Jiang> > >> Firstly, if it is CE, Xen will do nothing and dom0 will take recovery >> action. If it is UC, Xen will take action when all CPU is in SoftIRQ >> context, and dom0 will not take action, so it should be ok. >> >> Secondly, in Xen environment, per my understanding, CPU is owned by Xen HV, >> so I''m not sure when dom0 disable L3 cache (if it is CE), should Xen be >> aware or not. That is, should dom0 disable the cache directly, or it should >> user hypercall to ask Xen do that. Keir can give us more suggestion. >> >> For item C, currently Xen/dom0 can both access configuration space, while >> domU will do that through PCI_frontend/backend. Because PCI backend only >> cover device assigned to domU, so we don''t need worry about domU and dom0 >> should be trusted. However, one thing left is, if this range is beyond >> 0x100 (i.e. in pci-e range), we need add mmconfig support in Xen, although >> it can be added simply. >> >> Thanks >> -- Yunhong Jiang >> >>> As for the Shanghai feature: Christoph, are there any documents >>> available on that feature? What kind of errors are delivered >>> (corrected/correctable)? >>> >>> - Frank > > > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
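A sketch of the "inject one by one" idea: track a virtual MCIP bit per domain and only deliver the next queued event after the guest has cleared its virtual MCG_STATUS, which keeps the unmodified handler's assumption intact. The structure and helper names here are hypothetical.

/* Sketch only; names are hypothetical. */
struct vmce_state {
    bool mcip_set;                    /* virtual MCG_STATUS.MCIP */
    unsigned int nr_pending;          /* queued virtual bank events */
};

static void vmce_try_deliver(struct domain *d, struct vmce_state *v)
{
    if ( v->mcip_set || v->nr_pending == 0 )
        return;                       /* guest is still handling the last one */
    v->mcip_set = true;
    v->nr_pending--;
    inject_vmce(d);                   /* deliver the next queued event */
}

/* Called from the wrmsr intercept when the guest clears MCG_STATUS. */
static void vmce_on_mcg_status_clear(struct domain *d, struct vmce_state *v)
{
    v->mcip_set = false;
    vmce_try_deliver(d, v);
}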
Frank van der Linden
2009-Mar-02 17:47 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Jiang, Yunhong wrote:> Frank/Christopher, can you please give more comments for it, or you are OK with this? > For the action reporting mechanism, we will send out a proposal for review soon. I''m ok with this. We need a little more information on the AMD mechanism, but it seems to me that we can fit this in. Sometime this week, I''ll also send out the last of our changes that haven''t been sent upstream to xen-unstable yet. Maybe we can combine some things into one patch, like the telemetry handling changes that Gavin did. The other changes are error injection (for debugging) and panic crash dump support for our FMA tools, but those are probably only interesting to us. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-05 04:45 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Jiang, Yunhong wrote: >> Frank/Christopher, can you please give more comments for it, or you are OK >> with this? For the action reporting mechanism, we will send out a proposal >> for review soon. > > I''m ok with this. We need a little more information on the AMD > mechanism, but it seems to me that we can fit this in. > > Sometime this week, I''ll also send out the last of our changes that > haven''t been sent upstream to xen-unstable yet. Maybe we can combine > some things in to one patch, like the telemetry handling changes that > Gavin did. The other changes are error injection (for debugging) and > panic crash dump support for our FMA tools, but those are probably only > interesting to us. > > - Frank Glad to know about the conclusion. See my reply to Christoph on the AMD mechanism; I am still waiting for a response. Thanks Yunhong Jiang _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jiang, Yunhong
2009-Mar-05 08:31 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
Christoph/Frank, Following is the interface definition; please have a look.

Thanks
Yunhong Jiang

1) Interface between Xen/dom0 for passing xen''s recovery action information to dom0. Usage model: After offlining a broken page, Xen might pass its page-offline recovery action result information to dom0. Dom0 will save the information in non-volatile memory for further proactive actions, such as offlining easily-broken pages early at the next reboot.

struct page_offline_action
{
    /* Params for passing the offlined page number to DOM0 */
    uint64_t mfn;
    uint64_t status; /* Similar to page offline hypercall */
};

struct cpu_offline_action
{
    /* Params for passing the identity of the offlined CPU to DOM0 */
    uint32_t mc_socketid;
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
};

struct cache_shrink_action
{
    /* TBD, Christoph, please fill it */
};

/* Recovery action flags, giving recovery result information to the guest */
/* Recovered successfully after taking certain recovery actions below */
#define REC_ACT_RECOVERED (0x1 << 0)
/* For solaris''s usage that dom0 will take ownership when crash */
#define REC_ACT_RESET (0x1 << 2)
/* No action is performed by XEN */
#define REC_ACT_INFO (0x1 << 3)

/* Recovery action type definition, valid only when flags & REC_ACT_RECOVERED */
#define MC_ACT_PAGE_OFFLINE 1
#define MC_ACT_CPU_OFFLINE 2
#define MC_ACT_CACHE_SHIRNK 3

struct recovery_action
{
    uint8_t flags;
    uint8_t action_type;
    union
    {
        struct page_offline_action page_retire;
        struct cpu_offline_action cpu_offline;
        struct cache_shrink_action cache_shrink;
        uint8_t pad[MAX_ACTION_SIZE];
    } action_info;
};

struct mcinfo_bank {
    struct mcinfo_common common;

    uint16_t mc_bank;   /* bank nr */
    uint16_t mc_domid;  /* Usecase 5: domain referenced by mc_addr on dom0
                         * and if mc_addr is valid. Never valid on DomU. */
    uint64_t mc_status; /* bank status */
    uint64_t mc_addr;   /* bank address, only valid
                         * if addr bit is set in mc_status */
    uint64_t mc_misc;
    uint64_t mc_ctrl2;
    uint64_t mc_tsc;
    /* Recovery action is performed per bank */
    struct recovery_action action;
};

2) The two interfaces below are for MCA processing internal use.
   a. pre_handler will be called early in MCA ISR context, mainly for early need_reset detection to avoid losing logs (flag MCA_RESET). Also, pre_handler might be able to find the impacted domain if possible.
   b. mca_error_handler is actually an (error_action_index, recovery_handler pointer) pair. The defined recovery_handler function performs the actual recovery operations in softIRQ context after the per-bank MCA error matches the corresponding mca_code index. If pre_handler can''t judge the impacted domain, recovery_handler must figure it out.

/* Error has been recovered successfully */
#define MCA_RECOVERED 0
/* Error impacts one guest as stated in the owner field */
#define MCA_OWNER 1
/* Error can''t be recovered and the system needs a reboot */
#define MCA_RESET 2
/* Error should be handled in softIRQ context */
#define MCA_MORE_ACTION 3

struct mca_handle_result
{
    uint32_t flags;
    /* Valid only when flags & MCA_OWNER */
    domid_t owner;
    /* Valid only when flags & MCA_RECOVERED */
    struct recovery_action *action;
};

struct mca_error_handler
{
    /*
     * Assume we will need only architecture defined code. If the index can''t be setup by
     * mca_code, we will add a function to do the (index, recovery_handler) mapping check.
     * This mca_code represents the recovery handler pointer index for identifying this
     * particular error''s corresponding recovery action
     */
    uint16_t mca_code;

    /* Handler to be called in softIRQ handler context */
    int (*recovery_handler)(struct mcinfo_bank *bank,
                            struct mcinfo_global *global,
                            struct mcinfo_extended *extension,
                            struct mca_handle_result *result);
};

struct mca_error_handler intel_mca_handler[] =
{
    ....
};

struct mca_error_handler amd_mca_handler[] =
{
    ....
};

/* Handlers to be called in the MCA ISR, in MCA context */
int intel_mca_pre_handler(struct cpu_user_regs *regs,
                          struct mca_handle_result *result);

int amd_mca_pre_handler(struct cpu_user_regs *regs,
                        struct mca_handle_result *result);

Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote:> Jiang, Yunhong wrote: >> Frank/Christopher, can you please give more comments for it, or you are OK >> with this? For the action reporting mechanism, we will send out a proposal >> for review soon. > > I''m ok with this. We need a little more information on the AMD > mechanism, but it seems to me that we can fit this in. > > Sometime this week, I''ll also send out the last of our changes that > haven''t been sent upstream to xen-unstable yet. Maybe we can combine > some things in to one patch, like the telemetry handling changes that > Gavin did. The other changes are error injection (for debugging) and > panic crash dump support for our FMA tools, but those are probably only > interesting to us. > > - Frank_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
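As an illustration of how dom0-side tooling might consume the per-bank action data proposed above (a sketch against the structures in this mail; log_info() is just a placeholder for whatever logging the tools use):

static void dom0_log_recovery(const struct mcinfo_bank *b)
{
    const struct recovery_action *a = &b->action;

    if ( !(a->flags & REC_ACT_RECOVERED) )
    {
        /* REC_ACT_INFO / REC_ACT_RESET: Xen recovered nothing itself. */
        log_info("bank %u: no recovery action, flags 0x%x", b->mc_bank, a->flags);
        return;
    }

    switch ( a->action_type )
    {
    case MC_ACT_PAGE_OFFLINE:
        log_info("page %llx offlined, status %llx",
                 (unsigned long long)a->action_info.page_retire.mfn,
                 (unsigned long long)a->action_info.page_retire.status);
        break;
    case MC_ACT_CPU_OFFLINE:
        log_info("cpu socket %u core %u thread %u offlined",
                 a->action_info.cpu_offline.mc_socketid,
                 a->action_info.cpu_offline.mc_coreid,
                 a->action_info.cpu_offline.mc_core_threadid);
        break;
    default:
        log_info("unhandled action type %u", a->action_type);
        break;
    }
}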
Christoph Egger
2009-Mar-05 14:53 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enalbing in XEN
MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK The L3 cache index disable feature works like this: You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) and write it into the index field. This MSR does not belong to the standard mc bank data and is therefore provided by mcinfo_extended. The index field are the bits 11:0 of the PCI function 3 register "L3 Cache Index Disable". Why is the recover action bound to the bank ? I would like to see a struct mcinfo_recover rather extending struct mcinfo_bank. That gives us flexibility. Christoph On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote:> Christoph/Frank, Followed is the interface definition, please have a look. > > Thanks > Yunhong Jiang > > 1) Interface between Xen/dom0 for passing xen''s recovery action information > to dom0. Usage model: After offlining broken page, Xen might pass its > page-offline recovery action result information to dom0. Dom0 will save the > information in non-volatile memory for further proactive actions, such as > offlining the easy-broken page early when doing next reboot. > > > struct page_offline_action > { > /* Params for passing the offlined page number to DOM0 */ > uint64_t mfn; > uint64_t status; /* Similar to page offline hypercall */ > }; > > struct cpu_offline_action > { > /* Params for passing the identity of the offlined CPU to DOM0 */ > uint32_t mc_socketid; > uint16_t mc_coreid; > uint16_t mc_core_threadid; > }; > > struct cache_shrink_action > { > /* TBD, Christoph, please fill it */ > }; > > /* Recover action flags, giving recovery result information to guest */ > /* Recovery successfully after taking certain recovery actions below */ > #define REC_ACT_RECOVERED (0x1 << 0) > /* For solaris''s usage that dom0 will take ownership when crash */ > #define REC_ACT_RESET (0x1 << 2) > /* No action is performed by XEN */ > #define REC_ACT_INFO (0x1 << 3) > > /* Recover action type definition, valid only when flags & > REC_ACT_RECOVERED */ > #define MC_ACT_PAGE_OFFLINE 1 > #define MC_ACT_CPU_OFFLINE 2 > #define MC_ACT_CACHE_SHIRNK 3 > > struct recovery_action > { > uint8_t flags; > uint8_t action_type; > union > { > struct page_offline_action page_retire; > struct cpu_offline_action cpu_offline; > struct cache_shrink_action cache_shrink; > uint8_t pad[MAX_ACTION_SIZE]; > } action_info; > } > > struct mcinfo_bank { > struct mcinfo_common common; > > uint16_t mc_bank; /* bank nr */ > uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0 > * and if mc_addr is valid. Never valid on DomU. */ > uint64_t mc_status; /* bank status */ > uint64_t mc_addr; /* bank address, only valid > * if addr bit is set in mc_status */ > uint64_t mc_misc; > uint64_t mc_ctrl2; > uint64_t mc_tsc; > /* Recovery action is performed per bank */ > struct recovery_action action; > }; > > 2) Below two interfaces are for MCA processing internal use. > a. pre_handler will be called earlier in MCA ISR context, mainly for > early need_reset detection for avoiding log missing (flag MCA_RESET). > Also, pre_handler might be able to find the impacted domain if possible. > b. mca_error_handler is actually a (error_action_index, > recovery_handler pointer) pair. The defined recovery_handler function > performs the actual recovery operations in softIrq context after the > per_bank MCA error matching the corresponding mca_code index. If > pre_handler can''t judge the impacted domain, recovery_handler must figure > it out. 
> > /* Error has been recovered successfully */ > #define MCA_RECOVERD 0 > /* Error impact one guest as stated in owner field */ > #define MCA_OWNER 1 > /* Error can''t be recovered and need reboot system */ > #define MCA_RESET 2 > /* Error should be handled in softIRQ context */ > #define MCA_MORE_ACTION 3 > > struct mca_handle_result > { > uint32_t flags; > /* Valid only when flags & MCA_OWNER */ > domid_d owner; > /* valid only when flags & MCA_RECOVERD */ > struct recovery_action *action; > }; > > struct mca_error_handler > { > /* > * Assume we will need only architecture defined code. If the index > can''t be setup by * mca_code, we will add a function to do the (index, > recovery_handler) mapping check. * This mca_code represents the recovery > handler pointer index for identifying this * particular error''s > corresponding recover action > */ > uint16_t mca_code; > > /* Handler to be called in softIRQ handler context */ > int recovery_handler(struct mcinfo_bank *bank, > struct mcinfo_global *global, > struct mcinfo_extended *extention, > struct mca_handle_result *result); > > }; > > struct mca_error_handler intel_mca_handler[] > { > .... > }; > > struct mca_error_handler amd_mca_handler[] > { > .... > }; > > > /* HandlVer to be called in MCA ISR in MCA context */ > int intel_mca_pre_handler(struct cpu_user_regs *regs, > struct mca_handle_result *result); > > int amd_mca_pre_handler(struct cpu_user_regs *regs, > struct mca_handle_result *result); > > Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: > > Jiang, Yunhong wrote: > >> Frank/Christopher, can you please give more comments for it, or you are > >> OK with this? For the action reporting mechanism, we will send out a > >> proposal for review soon. > > > > I''m ok with this. We need a little more information on the AMD > > mechanism, but it seems to me that we can fit this in. > > > > Sometime this week, I''ll also send out the last of our changes that > > haven''t been sent upstream to xen-unstable yet. Maybe we can combine > > some things in to one patch, like the telemetry handling changes that > > Gavin did. The other changes are error injection (for debugging) and > > panic crash dump support for our FMA tools, but those are probably only > > interesting to us. > > > > - Frank-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
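Following the description above literally, the recovery step would look roughly like this. The offset of the "L3 Cache Index Disable" register is deliberately left as a parameter (it is exactly what the next mail asks about), and pci_conf_write32() stands for whichever config-space write path Xen and Dom0 agree on:

/* Sketch of the L3 cache index disable sequence described above.
 * l3_reg is the (still unspecified) offset of the "L3 Cache Index
 * Disable" register in northbridge PCI function 3. */
static void l3_index_disable_sketch(unsigned int bus, unsigned int dev,
                                    unsigned int l3_reg)
{
    uint64_t misc1;
    uint32_t index;

    rdmsrl(0xC0000408, misc1);          /* MC4_MISC1 */
    index = (misc1 >> 6) & 0xfff;       /* bits 17:6 of MC4_MISC1 */

    /* Write the index into bits 11:0 of the function 3 register. */
    pci_conf_write32(bus, dev, 3, l3_reg, index);
}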
Jiang, Yunhong
2009-Mar-05 15:19 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph Egger <mailto:Christoph.Egger@amd.com> wrote:> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINKAhh, yes, I will fix it.> > The L3 cache index disable feature works like this: > > You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) > and write it into the index field. This MSR does not belong to > the standard > mc bank data and is therefore provided by mcinfo_extended. > The index field are the bits 11:0 of the PCI function 3 register "L3 Cache > Index Disable".So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K byte? For the PCI access, I''d prefer to have xen to control all these, i.e. even if dom0 want to disable the L3 cache, it is done through a hypercall. The reason is, Xen control the CPU, so keep it in Xen will make things simpler. Of course, it is ok for me too, if you want to keep Xen for #MC handler and Dom0 for CE handler.> > Why is the recover action bound to the bank ? > I would like to see a struct mcinfo_recover rather extending > struct mcinfo_bank. That gives us flexibility.I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has advantage of keep connection of the error source and the action, but it do make the mcinfo_bank more complex. Or we can keep the cpu/bank information in the mcinfo_recover also, so that we keep the flexibility and don''t lose the connection. Thanks Yunhong Jiang> > Christoph > > > On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >> Christoph/Frank, Followed is the interface definition, please have a look. >> >> Thanks >> Yunhong Jiang >> >> 1) Interface between Xen/dom0 for passing xen''s recovery action information >> to dom0. Usage model: After offlining broken page, Xen might pass its >> page-offline recovery action result information to dom0. Dom0 will save the >> information in non-volatile memory for further proactive actions, such as >> offlining the easy-broken page early when doing next reboot. >> >> >> struct page_offline_action >> { >> /* Params for passing the offlined page number to DOM0 */ uint64_t >> mfn; uint64_t status; /* Similar to page offline hypercall */ }; >> >> struct cpu_offline_action >> { >> /* Params for passing the identity of the offlined CPU to DOM0 */ >> uint32_t mc_socketid; uint16_t mc_coreid; >> uint16_t mc_core_threadid; >> }; >> >> struct cache_shrink_action >> { >> /* TBD, Christoph, please fill it */ >> }; >> >> /* Recover action flags, giving recovery result information to guest */ >> /* Recovery successfully after taking certain recovery actions below */ >> #define REC_ACT_RECOVERED (0x1 << 0) >> /* For solaris''s usage that dom0 will take ownership when crash */ >> #define REC_ACT_RESET (0x1 << 2) >> /* No action is performed by XEN */ >> #define REC_ACT_INFO (0x1 << 3) >> >> /* Recover action type definition, valid only when flags & >> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >> #define MC_ACT_CPU_OFFLINE 2 >> #define MC_ACT_CACHE_SHIRNK 3 >> >> struct recovery_action >> { >> uint8_t flags; >> uint8_t action_type; >> union >> { >> struct page_offline_action page_retire; >> struct cpu_offline_action cpu_offline; >> struct cache_shrink_action cache_shrink; >> uint8_t pad[MAX_ACTION_SIZE]; >> } action_info; >> } >> >> struct mcinfo_bank { >> struct mcinfo_common common; >> >> uint16_t mc_bank; /* bank nr */ >> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on dom0 >> * and if mc_addr is valid. Never valid on DomU. 
*/ >> uint64_t mc_status; /* bank status */ >> uint64_t mc_addr; /* bank address, only valid >> * if addr bit is set in mc_status */ uint64_t >> mc_misc; uint64_t mc_ctrl2; >> uint64_t mc_tsc; >> /* Recovery action is performed per bank */ >> struct recovery_action action; >> }; >> >> 2) Below two interfaces are for MCA processing internal use. >> a. pre_handler will be called earlier in MCA ISR context, mainly for >> early need_reset detection for avoiding log missing (flag MCA_RESET). >> Also, pre_handler might be able to find the impacted domain if possible. >> b. mca_error_handler is actually a (error_action_index, >> recovery_handler pointer) pair. The defined recovery_handler function >> performs the actual recovery operations in softIrq context after the >> per_bank MCA error matching the corresponding mca_code index. If >> pre_handler can''t judge the impacted domain, recovery_handler must figure >> it out. >> >> /* Error has been recovered successfully */ >> #define MCA_RECOVERD 0 >> /* Error impact one guest as stated in owner field */ #define MCA_OWNER >> 1 /* Error can''t be recovered and need reboot system */ #define MCA_RESET >> 2 /* Error should be handled in softIRQ context */ >> #define MCA_MORE_ACTION 3 >> >> struct mca_handle_result >> { >> uint32_t flags; >> /* Valid only when flags & MCA_OWNER */ >> domid_d owner; >> /* valid only when flags & MCA_RECOVERD */ >> struct recovery_action *action; >> }; >> >> struct mca_error_handler >> { >> /* >> * Assume we will need only architecture defined code. If the index >> can''t be setup by * mca_code, we will add a function to do the (index, >> recovery_handler) mapping check. * This mca_code represents the recovery >> handler pointer index for identifying this * particular error''s >> corresponding recover action */ >> uint16_t mca_code; >> >> /* Handler to be called in softIRQ handler context */ >> int recovery_handler(struct mcinfo_bank *bank, >> struct mcinfo_global *global, >> struct mcinfo_extended *extention, >> struct mca_handle_result *result); >> >> }; >> >> struct mca_error_handler intel_mca_handler[] >> { >> .... >> }; >> >> struct mca_error_handler amd_mca_handler[] >> { >> .... >> }; >> >> >> /* HandlVer to be called in MCA ISR in MCA context */ >> int intel_mca_pre_handler(struct cpu_user_regs *regs, >> struct mca_handle_result *result); >> >> int amd_mca_pre_handler(struct cpu_user_regs *regs, >> struct mca_handle_result *result); >> >> Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: >>> Jiang, Yunhong wrote: >>>> Frank/Christopher, can you please give more comments for it, or you are >>>> OK with this? For the action reporting mechanism, we will send out a >>>> proposal for review soon. >>> >>> I''m ok with this. We need a little more information on the AMD >>> mechanism, but it seems to me that we can fit this in. >>> >>> Sometime this week, I''ll also send out the last of our changes that >>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine >>> some things in to one patch, like the telemetry handling changes that >>> Gavin did. The other changes are error injection (for debugging) and >>> panic crash dump support for our FMA tools, but those are probably only >>> interesting to us. >>> >>> - Frank > > > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. 
McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
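A rough sketch of the alternative mentioned above: a standalone mcinfo_recover entry that still records which cpu/bank triggered the action, so the flexibility is gained without losing the connection to the error source. The struct mcinfo_common and struct recovery_action types are taken from the quoted proposal; all other field names are illustrative only, not a settled interface.

/*
 * Illustrative only: a separate recovery entry instead of growing
 * struct mcinfo_bank, while keeping the link to the reporting bank/cpu.
 */
struct mcinfo_recover
{
    struct mcinfo_common common;     /* type/size header, like other entries */
    uint16_t mc_bank;                /* bank that triggered this action */
    uint32_t mc_socketid;            /* physical package of that bank */
    uint16_t mc_coreid;
    uint16_t mc_core_threadid;
    struct recovery_action action;   /* flags, action_type, action_info */
};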
Christoph Egger
2009-Mar-05 17:28 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote:> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: > > MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK > > Ahh, yes, I will fix it. > > > The L3 cache index disable feature works like this: > > > > You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) > > and write it into the index field. This MSR does not belong to > > the standard > > mc bank data and is therefore provided by mcinfo_extended. > > The index field are the bits 11:0 of the PCI function 3 register "L3 > > Cache Index Disable". > > So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K > byte?Sorry, which offset do you mean ?> > For the PCI access, I''d prefer to have xen to control all these, i.e. even > if dom0 want to disable the L3 cache, it is done through a hypercall. The > reason is, Xen control the CPU, so keep it in Xen will make things simpler. > > Of course, it is ok for me too, if you want to keep Xen for #MC handler and > Dom0 for CE handler.We still need to define the rules to prevent interferes and clarify how to deal with Dom0/DomU going wild and breaking the rules.> > Why is the recover action bound to the bank ? > > I would like to see a struct mcinfo_recover rather extending > > struct mcinfo_bank. That gives us flexibility. > > I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has > advantage of keep connection of the error source and the action, but it do > make the mcinfo_bank more complex. Or we can keep the cpu/bank information > in the mcinfo_recover also, so that we keep the flexibility and don''t lose > the connection.From your suggestions I prefer the last one, but is still limited due to the assumption that each struct mcinfo_bank and each struct mcinfo_extended stands for exactly one error. This assumption doesn''t cover follow-up errors which may be needed to determine the real root cause. Some of them may even be ignored depending on what is going on. Christoph> > Thanks > Yunhong Jiang > > > Christoph > > > > On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: > >> Christoph/Frank, Followed is the interface definition, please have a > >> look. > >> > >> Thanks > >> Yunhong Jiang > >> > >> 1) Interface between Xen/dom0 for passing xen''s recovery action > >> information to dom0. Usage model: After offlining broken page, Xen might > >> pass its page-offline recovery action result information to dom0. Dom0 > >> will save the information in non-volatile memory for further proactive > >> actions, such as offlining the easy-broken page early when doing next > >> reboot. 
> >> > >> > >> struct page_offline_action > >> { > >> /* Params for passing the offlined page number to DOM0 */ > >> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall */ > >> }; > >> > >> struct cpu_offline_action > >> { > >> /* Params for passing the identity of the offlined CPU to DOM0 */ > >> uint32_t mc_socketid; uint16_t mc_coreid; > >> uint16_t mc_core_threadid; > >> }; > >> > >> struct cache_shrink_action > >> { > >> /* TBD, Christoph, please fill it */ > >> }; > >> > >> /* Recover action flags, giving recovery result information to guest */ > >> /* Recovery successfully after taking certain recovery actions below */ > >> #define REC_ACT_RECOVERED (0x1 << 0) > >> /* For solaris''s usage that dom0 will take ownership when crash */ > >> #define REC_ACT_RESET (0x1 << 2) > >> /* No action is performed by XEN */ > >> #define REC_ACT_INFO (0x1 << 3) > >> > >> /* Recover action type definition, valid only when flags & > >> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 > >> #define MC_ACT_CPU_OFFLINE 2 > >> #define MC_ACT_CACHE_SHIRNK 3 > >> > >> struct recovery_action > >> { > >> uint8_t flags; > >> uint8_t action_type; > >> union > >> { > >> struct page_offline_action page_retire; > >> struct cpu_offline_action cpu_offline; > >> struct cache_shrink_action cache_shrink; > >> uint8_t pad[MAX_ACTION_SIZE]; > >> } action_info; > >> } > >> > >> struct mcinfo_bank { > >> struct mcinfo_common common; > >> > >> uint16_t mc_bank; /* bank nr */ > >> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on > >> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t > >> mc_status; /* bank status */ > >> uint64_t mc_addr; /* bank address, only valid > >> * if addr bit is set in mc_status */ > >> uint64_t mc_misc; uint64_t mc_ctrl2; > >> uint64_t mc_tsc; > >> /* Recovery action is performed per bank */ > >> struct recovery_action action; > >> }; > >> > >> 2) Below two interfaces are for MCA processing internal use. > >> a. pre_handler will be called earlier in MCA ISR context, mainly for > >> early need_reset detection for avoiding log missing (flag MCA_RESET). > >> Also, pre_handler might be able to find the impacted domain if possible. > >> b. mca_error_handler is actually a (error_action_index, > >> recovery_handler pointer) pair. The defined recovery_handler function > >> performs the actual recovery operations in softIrq context after the > >> per_bank MCA error matching the corresponding mca_code index. If > >> pre_handler can''t judge the impacted domain, recovery_handler must > >> figure it out. > >> > >> /* Error has been recovered successfully */ > >> #define MCA_RECOVERD 0 > >> /* Error impact one guest as stated in owner field */ #define MCA_OWNER > >> 1 /* Error can''t be recovered and need reboot system */ #define > >> MCA_RESET 2 /* Error should be handled in softIRQ context */ > >> #define MCA_MORE_ACTION 3 > >> > >> struct mca_handle_result > >> { > >> uint32_t flags; > >> /* Valid only when flags & MCA_OWNER */ > >> domid_d owner; > >> /* valid only when flags & MCA_RECOVERD */ > >> struct recovery_action *action; > >> }; > >> > >> struct mca_error_handler > >> { > >> /* > >> * Assume we will need only architecture defined code. If the index > >> can''t be setup by * mca_code, we will add a function to do the (index, > >> recovery_handler) mapping check. 
* This mca_code represents the recovery > >> handler pointer index for identifying this * particular error''s > >> corresponding recover action */ > >> uint16_t mca_code; > >> > >> /* Handler to be called in softIRQ handler context */ > >> int recovery_handler(struct mcinfo_bank *bank, > >> struct mcinfo_global *global, > >> struct mcinfo_extended *extention, > >> struct mca_handle_result *result); > >> > >> }; > >> > >> struct mca_error_handler intel_mca_handler[] > >> { > >> .... > >> }; > >> > >> struct mca_error_handler amd_mca_handler[] > >> { > >> .... > >> }; > >> > >> > >> /* HandlVer to be called in MCA ISR in MCA context */ > >> int intel_mca_pre_handler(struct cpu_user_regs *regs, > >> struct mca_handle_result *result); > >> > >> int amd_mca_pre_handler(struct cpu_user_regs *regs, > >> struct mca_handle_result *result); > >> > >> Frank.Vanderlinden@Sun.COM <mailto:Frank.Vanderlinden@Sun.COM> wrote: > >>> Jiang, Yunhong wrote: > >>>> Frank/Christopher, can you please give more comments for it, or you > >>>> are OK with this? For the action reporting mechanism, we will send out > >>>> a proposal for review soon. > >>> > >>> I''m ok with this. We need a little more information on the AMD > >>> mechanism, but it seems to me that we can fit this in. > >>> > >>> Sometime this week, I''ll also send out the last of our changes that > >>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine > >>> some things in to one patch, like the telemetry handling changes that > >>> Gavin did. The other changes are error injection (for debugging) and > >>> panic crash dump support for our FMA tools, but those are probably only > >>> interesting to us. > >>> > >>> - Frank > > > > -- > > ---to satisfy European Law for business letters: > > Advanced Micro Devices GmbH > > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > > Registergericht Muenchen, HRB Nr. 43632-- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
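The quoted proposal pairs an mca_code with a recovery handler. A rough sketch of how that table might be consulted in softIRQ context follows; it assumes recovery_handler is a function-pointer member and that matching is a plain equality test against the architectural error code in MCi_STATUS bits 15:0. The proposal itself notes that a dedicated match function may be needed instead.

/*
 * Sketch only: walk the per-vendor handler table and invoke the matching
 * recovery handler for one reporting bank.
 */
static int run_recovery_handler(struct mca_error_handler *tbl, unsigned int n,
                                struct mcinfo_bank *bank,
                                struct mcinfo_global *global,
                                struct mcinfo_extended *ext,
                                struct mca_handle_result *result)
{
    unsigned int i;
    uint16_t code = (uint16_t)(bank->mc_status & 0xFFFF); /* MCA error code */

    for ( i = 0; i < n; i++ )
        if ( tbl[i].mca_code == code )
            return tbl[i].recovery_handler(bank, global, ext, result);

    return -1;   /* no recovery handler registered for this error code */
}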
Jiang, Yunhong
2009-Mar-06 02:11 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph Egger <mailto:Christoph.Egger@amd.com> wrote:> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote: >> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: >>> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK >> >> Ahh, yes, I will fix it. >> >>> The L3 cache index disable feature works like this: >>> >>> You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) >>> and write it into the index field. This MSR does not belong to >>> the standard >>> mc bank data and is therefore provided by mcinfo_extended. >>> The index field are the bits 11:0 of the PCI function 3 register "L3 >>> Cache Index Disable". >> >> So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K >> byte? > > Sorry, which offset do you mean ?I mean the offset of this register in the PCI function''s configuration space. You know for a PCI device, it has 256 byte configuration register while PCI-E device has 4K configuration register. Currently xen can access the 256 byte config register already, however, to support 4K range, it requires more stuff, like mmconfig sparse etc. That''s the reason I ask the offset of this register.> >> >> For the PCI access, I''d prefer to have xen to control all these, i.e. even >> if dom0 want to disable the L3 cache, it is done through a hypercall. The >> reason is, Xen control the CPU, so keep it in Xen will make things simpler. >> >> Of course, it is ok for me too, if you want to keep Xen for #MC handler and >> Dom0 for CE handler. > > We still need to define the rules to prevent interferes and > clarify how to > deal with Dom0/DomU going wild and breaking the rules.As discussed previously, we don''t need concern about DomU, all configuration space access from domU will be intercepted by dom0. For Dom0, since currently all PCI access to 0xcf8/cfc will be intercepted by Xen, so Xen can do checking. We can achieve same checking for mmconfig if remove that range from dom0. But I have to say I''m not sure if we do need concern too much what will happen when dom0 going wild ( after all, a crash in dom0 will lost everything), especially interfere on such access will not cause security issue (please correct me if I''m wrong ).> >>> Why is the recover action bound to the bank ? >>> I would like to see a struct mcinfo_recover rather extending >>> struct mcinfo_bank. That gives us flexibility. >> >> I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has >> advantage of keep connection of the error source and the action, but it do >> make the mcinfo_bank more complex. Or we can keep the cpu/bank information >> in the mcinfo_recover also, so that we keep the flexibility and don''t lose >> the connection. > > From your suggestions I prefer the last one, but is still limited due > to the assumption that each struct mcinfo_bank and each struct > mcinfo_extended stands for exactly one error. > > This assumption doesn''t cover follow-up errors which may be needed to > determine the real root cause. Some of them may even be ignored > depending on what is going on.I think the assumption here is a recover action will be triggered only by one bank. For example, we offline page because one MC bank tell us that page is broken. The "follow-up errors" is something interesting to me, do you have any example? It''s ok for us to not include the back information if there are such requirement. 
Thanks Yunhong Jiang> > Christoph > >> >> Thanks >> Yunhong Jiang >> >>> Christoph >>> >>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >>>> Christoph/Frank, Followed is the interface definition, please have a >>>> look. >>>> >>>> Thanks >>>> Yunhong Jiang >>>> >>>> 1) Interface between Xen/dom0 for passing xen''s recovery action >>>> information to dom0. Usage model: After offlining broken page, Xen might >>>> pass its page-offline recovery action result information to dom0. Dom0 >>>> will save the information in non-volatile memory for further proactive >>>> actions, such as offlining the easy-broken page early when doing next >>>> reboot. >>>> >>>> >>>> struct page_offline_action >>>> { >>>> /* Params for passing the offlined page number to DOM0 */ >>>> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall */ }; >>>> >>>> struct cpu_offline_action >>>> { >>>> /* Params for passing the identity of the offlined CPU to DOM0 */ >>>> uint32_t mc_socketid; uint16_t mc_coreid; >>>> uint16_t mc_core_threadid; >>>> }; >>>> >>>> struct cache_shrink_action >>>> { >>>> /* TBD, Christoph, please fill it */ >>>> }; >>>> >>>> /* Recover action flags, giving recovery result information to guest */ >>>> /* Recovery successfully after taking certain recovery actions below */ >>>> #define REC_ACT_RECOVERED (0x1 << 0) >>>> /* For solaris''s usage that dom0 will take ownership when crash */ >>>> #define REC_ACT_RESET (0x1 << 2) >>>> /* No action is performed by XEN */ >>>> #define REC_ACT_INFO (0x1 << 3) >>>> >>>> /* Recover action type definition, valid only when flags & >>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >>>> #define MC_ACT_CPU_OFFLINE 2 >>>> #define MC_ACT_CACHE_SHIRNK 3 >>>> >>>> struct recovery_action >>>> { >>>> uint8_t flags; >>>> uint8_t action_type; >>>> union >>>> { >>>> struct page_offline_action page_retire; >>>> struct cpu_offline_action cpu_offline; >>>> struct cache_shrink_action cache_shrink; >>>> uint8_t pad[MAX_ACTION_SIZE]; >>>> } action_info; >>>> } >>>> >>>> struct mcinfo_bank { >>>> struct mcinfo_common common; >>>> >>>> uint16_t mc_bank; /* bank nr */ >>>> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on >>>> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t >>>> mc_status; /* bank status */ uint64_t mc_addr; /* bank address, >>>> only valid * if addr bit is set in mc_status */ >>>> uint64_t mc_misc; uint64_t mc_ctrl2; >>>> uint64_t mc_tsc; >>>> /* Recovery action is performed per bank */ >>>> struct recovery_action action; >>>> }; >>>> >>>> 2) Below two interfaces are for MCA processing internal use. >>>> a. pre_handler will be called earlier in MCA ISR context, mainly for >>>> early need_reset detection for avoiding log missing (flag MCA_RESET). >>>> Also, pre_handler might be able to find the impacted domain if possible. >>>> b. mca_error_handler is actually a (error_action_index, >>>> recovery_handler pointer) pair. The defined recovery_handler function >>>> performs the actual recovery operations in softIrq context after the >>>> per_bank MCA error matching the corresponding mca_code index. If >>>> pre_handler can''t judge the impacted domain, recovery_handler must >>>> figure it out. 
>>>> >>>> /* Error has been recovered successfully */ >>>> #define MCA_RECOVERD 0 >>>> /* Error impact one guest as stated in owner field */ #define MCA_OWNER >>>> 1 /* Error can''t be recovered and need reboot system */ #define >>>> MCA_RESET 2 /* Error should be handled in softIRQ context */ #define >>>> MCA_MORE_ACTION 3 >>>> >>>> struct mca_handle_result >>>> { >>>> uint32_t flags; >>>> /* Valid only when flags & MCA_OWNER */ >>>> domid_d owner; >>>> /* valid only when flags & MCA_RECOVERD */ >>>> struct recovery_action *action; >>>> }; >>>> >>>> struct mca_error_handler >>>> { >>>> /* >>>> * Assume we will need only architecture defined code. If the index >>>> can''t be setup by * mca_code, we will add a function to do the (index, >>>> recovery_handler) mapping check. * This mca_code represents the recovery >>>> handler pointer index for identifying this * particular error''s >>>> corresponding recover action */ >>>> uint16_t mca_code; >>>> >>>> /* Handler to be called in softIRQ handler context */ >>>> int recovery_handler(struct mcinfo_bank *bank, >>>> struct mcinfo_global *global, >>>> struct mcinfo_extended *extention, >>>> struct mca_handle_result *result); >>>> >>>> }; >>>> >>>> struct mca_error_handler intel_mca_handler[] >>>> { >>>> .... >>>> }; >>>> >>>> struct mca_error_handler amd_mca_handler[] >>>> { >>>> .... >>>> }; >>>> >>>> >>>> /* HandlVer to be called in MCA ISR in MCA context */ >>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, >>>> struct mca_handle_result *result); >>>> >>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, >>>> struct mca_handle_result *result); >>>> >>>> Frank.Vanderlinden@Sun.COM > <mailto:Frank.Vanderlinden@Sun.COM> wrote: >>>>> Jiang, Yunhong wrote: >>>>>> Frank/Christopher, can you please give more comments for it, or you >>>>>> are OK with this? For the action reporting mechanism, we will send out >>>>>> a proposal for review soon. >>>>> >>>>> I''m ok with this. We need a little more information on the AMD >>>>> mechanism, but it seems to me that we can fit this in. >>>>> >>>>> Sometime this week, I''ll also send out the last of our changes that >>>>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine >>>>> some things in to one patch, like the telemetry handling changes that >>>>> Gavin did. The other changes are error injection (for debugging) and >>>>> panic crash dump support for our FMA tools, but those are probably only >>>>> interesting to us. >>>>> >>>>> - Frank >>> >>> -- >>> ---to satisfy European Law for business letters: >>> Advanced Micro Devices GmbH >>> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >>> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >>> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >>> Registergericht Muenchen, HRB Nr. 43632 > > > > -- > ---to satisfy European Law for business letters: > Advanced Micro Devices GmbH > Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen > Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni > Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen > Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
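For reference, the legacy type-1 configuration access that Xen intercepts at 0xcf8/0xcfc looks roughly like the sketch below. It only reaches the first 256 bytes of a function's config space, which is why the 4K PCI-E extended space needs mmconfig support; outl()/inl() are the usual port I/O helpers and error handling is omitted.

/*
 * Legacy type-1 PCI config read via ports 0xCF8/0xCFC.  Only the first
 * 256 bytes of a function's config space are reachable this way.
 */
static uint32_t cf8_conf_read32(unsigned int bus, unsigned int dev,
                                unsigned int func, unsigned int reg)
{
    uint32_t addr = 0x80000000u | (bus << 16) | (dev << 11) |
                    (func << 8) | (reg & 0xFC);

    outl(addr, 0xCF8);   /* select bus/device/function/register */
    return inl(0xCFC);   /* read the selected 32-bit register */
}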
Jiang, Yunhong
2009-Mar-10 01:19 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph/Frank, do you have any comments? Thanks Yunhong Jiang Jiang, Yunhong <> wrote:> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: >> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote: >>> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: >>>> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK >>> >>> Ahh, yes, I will fix it. >>> >>>> The L3 cache index disable feature works like this: >>>> >>>> You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) >>>> and write it into the index field. This MSR does not belong to >>>> the standard >>>> mc bank data and is therefore provided by mcinfo_extended. >>>> The index field are the bits 11:0 of the PCI function 3 register "L3 >>>> Cache Index Disable". >>> >>> So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or 4K >>> byte? >> >> Sorry, which offset do you mean ? > > I mean the offset of this register in the PCI function''s > configuration space. You know for a PCI device, it has 256 > byte configuration register while PCI-E device has 4K > configuration register. > Currently xen can access the 256 byte config register already, > however, to support 4K range, it requires more stuff, like > mmconfig sparse etc. That''s the reason I ask the offset of > this register. > >> >>> >>> For the PCI access, I''d prefer to have xen to control all these, i.e. even >>> if dom0 want to disable the L3 cache, it is done through a hypercall. The >>> reason is, Xen control the CPU, so keep it in Xen will make things >>> simpler. >>> >>> Of course, it is ok for me too, if you want to keep Xen for #MC handler >>> and Dom0 for CE handler. >> >> We still need to define the rules to prevent interferes and >> clarify how to >> deal with Dom0/DomU going wild and breaking the rules. > > As discussed previously, we don''t need concern about DomU, > all configuration space access from domU will be intercepted by dom0. > > For Dom0, since currently all PCI access to 0xcf8/cfc will be > intercepted by Xen, so Xen can do checking. We can achieve > same checking for mmconfig if remove that range from dom0. But > I have to say I''m not sure if we do need concern too much what > will happen when dom0 going wild ( after all, a crash in dom0 > will lost everything), especially interfere on such access > will not cause security issue (please correct me if I''m wrong ). > >> >>>> Why is the recover action bound to the bank ? >>>> I would like to see a struct mcinfo_recover rather extending >>>> struct mcinfo_bank. That gives us flexibility. >>> >>> I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back has >>> advantage of keep connection of the error source and the action, but it do >>> make the mcinfo_bank more complex. Or we can keep the cpu/bank information >>> in the mcinfo_recover also, so that we keep the flexibility and don''t lose >>> the connection. >> >> From your suggestions I prefer the last one, but is still limited due >> to the assumption that each struct mcinfo_bank and each struct >> mcinfo_extended stands for exactly one error. >> >> This assumption doesn''t cover follow-up errors which may be needed to >> determine the real root cause. Some of them may even be ignored >> depending on what is going on. > > I think the assumption here is a recover action will be > triggered only by one bank. For example, we offline page > because one MC bank tell us that page is broken. > > The "follow-up errors" is something interesting to me, do you > have any example? 
It''s ok for us to not include the back > information if there are such requirement. > > Thanks > Yunhong Jiang > >> >> Christoph >> >>> >>> Thanks >>> Yunhong Jiang >>> >>>> Christoph >>>> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >>>>> Christoph/Frank, Followed is the interface definition, please have a >>>>> look. >>>>> >>>>> Thanks >>>>> Yunhong Jiang >>>>> >>>>> 1) Interface between Xen/dom0 for passing xen''s recovery action >>>>> information to dom0. Usage model: After offlining broken page, Xen might >>>>> pass its page-offline recovery action result information to dom0. Dom0 >>>>> will save the information in non-volatile memory for further proactive >>>>> actions, such as offlining the easy-broken page early when doing next >>>>> reboot. >>>>> >>>>> >>>>> struct page_offline_action >>>>> { >>>>> /* Params for passing the offlined page number to DOM0 */ >>>>> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall */ >>>>> }; >>>>> >>>>> struct cpu_offline_action >>>>> { >>>>> /* Params for passing the identity of the offlined CPU to DOM0 */ >>>>> uint32_t mc_socketid; uint16_t mc_coreid; >>>>> uint16_t mc_core_threadid; >>>>> }; >>>>> >>>>> struct cache_shrink_action >>>>> { >>>>> /* TBD, Christoph, please fill it */ >>>>> }; >>>>> >>>>> /* Recover action flags, giving recovery result information to guest */ >>>>> /* Recovery successfully after taking certain recovery actions below */ >>>>> #define REC_ACT_RECOVERED (0x1 << 0) >>>>> /* For solaris''s usage that dom0 will take ownership when crash */ >>>>> #define REC_ACT_RESET (0x1 << 2) >>>>> /* No action is performed by XEN */ >>>>> #define REC_ACT_INFO (0x1 << 3) >>>>> >>>>> /* Recover action type definition, valid only when flags & >>>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >>>>> #define MC_ACT_CPU_OFFLINE 2 >>>>> #define MC_ACT_CACHE_SHIRNK 3 >>>>> >>>>> struct recovery_action >>>>> { >>>>> uint8_t flags; >>>>> uint8_t action_type; >>>>> union >>>>> { >>>>> struct page_offline_action page_retire; >>>>> struct cpu_offline_action cpu_offline; >>>>> struct cache_shrink_action cache_shrink; >>>>> uint8_t pad[MAX_ACTION_SIZE]; >>>>> } action_info; >>>>> } >>>>> >>>>> struct mcinfo_bank { >>>>> struct mcinfo_common common; >>>>> >>>>> uint16_t mc_bank; /* bank nr */ >>>>> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on >>>>> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t >>>>> mc_status; /* bank status */ uint64_t mc_addr; /* bank address, >>>>> only valid * if addr bit is set in mc_status */ >>>>> uint64_t mc_misc; uint64_t mc_ctrl2; >>>>> uint64_t mc_tsc; >>>>> /* Recovery action is performed per bank */ >>>>> struct recovery_action action; >>>>> }; >>>>> >>>>> 2) Below two interfaces are for MCA processing internal use. >>>>> a. pre_handler will be called earlier in MCA ISR context, mainly for >>>>> early need_reset detection for avoiding log missing (flag MCA_RESET). >>>>> Also, pre_handler might be able to find the impacted domain if possible. >>>>> b. mca_error_handler is actually a (error_action_index, >>>>> recovery_handler pointer) pair. The defined recovery_handler function >>>>> performs the actual recovery operations in softIrq context after the >>>>> per_bank MCA error matching the corresponding mca_code index. If >>>>> pre_handler can''t judge the impacted domain, recovery_handler must >>>>> figure it out. 
>>>>> >>>>> /* Error has been recovered successfully */ >>>>> #define MCA_RECOVERD 0 >>>>> /* Error impact one guest as stated in owner field */ #define MCA_OWNER >>>>> 1 /* Error can''t be recovered and need reboot system */ #define >>>>> MCA_RESET 2 /* Error should be handled in softIRQ context */ #define >>>>> MCA_MORE_ACTION 3 >>>>> >>>>> struct mca_handle_result >>>>> { >>>>> uint32_t flags; >>>>> /* Valid only when flags & MCA_OWNER */ >>>>> domid_d owner; >>>>> /* valid only when flags & MCA_RECOVERD */ >>>>> struct recovery_action *action; >>>>> }; >>>>> >>>>> struct mca_error_handler >>>>> { >>>>> /* >>>>> * Assume we will need only architecture defined code. If the index >>>>> can''t be setup by * mca_code, we will add a function to do the (index, >>>>> recovery_handler) mapping check. * This mca_code represents the recovery >>>>> handler pointer index for identifying this * particular error''s >>>>> corresponding recover action */ >>>>> uint16_t mca_code; >>>>> >>>>> /* Handler to be called in softIRQ handler context */ >>>>> int recovery_handler(struct mcinfo_bank *bank, >>>>> struct mcinfo_global *global, >>>>> struct mcinfo_extended *extention, >>>>> struct mca_handle_result *result); >>>>> >>>>> }; >>>>> >>>>> struct mca_error_handler intel_mca_handler[] >>>>> { >>>>> .... >>>>> }; >>>>> >>>>> struct mca_error_handler amd_mca_handler[] >>>>> { >>>>> .... >>>>> }; >>>>> >>>>> >>>>> /* HandlVer to be called in MCA ISR in MCA context */ >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, >>>>> struct mca_handle_result *result); >>>>> >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, >>>>> struct mca_handle_result *result); >>>>> >>>>> Frank.Vanderlinden@Sun.COM >> <mailto:Frank.Vanderlinden@Sun.COM> wrote: >>>>>> Jiang, Yunhong wrote: >>>>>>> Frank/Christopher, can you please give more comments for it, or you >>>>>>> are OK with this? For the action reporting mechanism, we will send out >>>>>>> a proposal for review soon. >>>>>> >>>>>> I''m ok with this. We need a little more information on the AMD >>>>>> mechanism, but it seems to me that we can fit this in. >>>>>> >>>>>> Sometime this week, I''ll also send out the last of our changes that >>>>>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine >>>>>> some things in to one patch, like the telemetry handling changes that >>>>>> Gavin did. The other changes are error injection (for debugging) and >>>>>> panic crash dump support for our FMA tools, but those are probably >>>>>> only interesting to us. >>>>>> >>>>>> - Frank >>>> >>>> -- >>>> ---to satisfy European Law for business letters: >>>> Advanced Micro Devices GmbH >>>> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >>>> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >>>> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >>>> Registergericht Muenchen, HRB Nr. 43632 >> >> >> >> -- >> ---to satisfy European Law for business letters: >> Advanced Micro Devices GmbH >> Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >> Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >> Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >> Registergericht Muenchen, HRB Nr. 43632_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
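To make the usage model in the quoted proposal concrete, a Dom0-side consumer of the recovery result might look roughly like this. persist_bad_page() is a hypothetical helper standing in for whatever non-volatile store Dom0 uses for pages to retire proactively on the next boot.

/*
 * Sketch of a Dom0 consumer of struct recovery_action as defined above.
 */
static void handle_recovery_result(const struct recovery_action *act)
{
    if ( !(act->flags & REC_ACT_RECOVERED) )
        return;   /* nothing was recovered; only log the telemetry */

    switch ( act->action_type )
    {
    case MC_ACT_PAGE_OFFLINE:
        /* remember the broken page for the next boot */
        persist_bad_page(act->action_info.page_retire.mfn,
                         act->action_info.page_retire.status);
        break;
    case MC_ACT_CPU_OFFLINE:
        /* record socket/core/thread of the offlined cpu */
        break;
    default:
        break;
    }
}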
Christoph Egger
2009-Mar-10 19:08 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
On Tuesday 10 March 2009 02:19:04 Jiang, Yunhong wrote:> Christoph/Frank, do you have any comments? > > Thanks > Yunhong Jiang > > Jiang, Yunhong <> wrote: > > Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: > >> On Thursday 05 March 2009 16:19:40 Jiang, Yunhong wrote: > >>> Christoph Egger <mailto:Christoph.Egger@amd.com> wrote: > >>>> MC_ACT_CACHE_SHIRNK <-- typo. should be MC_ACT_CACHE_SHRINK > >>> > >>> Ahh, yes, I will fix it. > >>> > >>>> The L3 cache index disable feature works like this: > >>>> > >>>> You read the bits 17:6 from the MSR 0xC0000408 (which is MC4_MISC1) > >>>> and write it into the index field. This MSR does not belong to > >>>> the standard > >>>> mc bank data and is therefore provided by mcinfo_extended. > >>>> The index field are the bits 11:0 of the PCI function 3 register "L3 > >>>> Cache Index Disable". > >>> > >>> So what''s the offset of "L3 Cache Index Disable"? Is it in 256 byte or > >>> 4K byte? > >> > >> Sorry, which offset do you mean ? > > > > I mean the offset of this register in the PCI function''s > > configuration space. You know for a PCI device, it has 256 > > byte configuration register while PCI-E device has 4K > > configuration register. > > Currently xen can access the 256 byte config register already, > > however, to support 4K range, it requires more stuff, like > > mmconfig sparse etc. That''s the reason I ask the offset of > > this register.Ah, I see. The registers of our memory controller are in the PCI config space. It''s no PCI-E device.> >>> For the PCI access, I''d prefer to have xen to control all these, i.e. > >>> even if dom0 want to disable the L3 cache, it is done through a > >>> hypercall. The reason is, Xen control the CPU, so keep it in Xen will > >>> make things simpler. > >>> > >>> Of course, it is ok for me too, if you want to keep Xen for #MC handler > >>> and Dom0 for CE handler. > >> > >> We still need to define the rules to prevent interferes and > >> clarify how to > >> deal with Dom0/DomU going wild and breaking the rules. > > > > As discussed previously, we don''t need concern about DomU, > > all configuration space access from domU will be intercepted by dom0. > > > > For Dom0, since currently all PCI access to 0xcf8/cfc will be > > intercepted by Xen, so Xen can do checking. We can achieve > > same checking for mmconfig if remove that range from dom0. But > > I have to say I''m not sure if we do need concern too much what > > will happen when dom0 going wild ( after all, a crash in dom0 > > will lost everything), especially interfere on such access > > will not cause security issue (please correct me if I''m wrong ).This sounds like an assumption that an IOMMU is always available.> >>>> Why is the recover action bound to the bank ? > >>>> I would like to see a struct mcinfo_recover rather extending > >>>> struct mcinfo_bank. That gives us flexibility. > >>> > >>> I''d get input from Frank or Gavin. Place mcinfo_recover in mcinfo_back > >>> has advantage of keep connection of the error source and the action, > >>> but it do make the mcinfo_bank more complex. Or we can keep the > >>> cpu/bank information in the mcinfo_recover also, so that we keep the > >>> flexibility and don''t lose the connection. > >> > >> From your suggestions I prefer the last one, but is still limited due > >> to the assumption that each struct mcinfo_bank and each struct > >> mcinfo_extended stands for exactly one error. 
> >> > >> This assumption doesn''t cover follow-up errors which may be needed to > >> determine the real root cause. Some of them may even be ignored > >> depending on what is going on. > > > > I think the assumption here is a recover action will be > > triggered only by one bank. For example, we offline page > > because one MC bank tell us that page is broken.Only if the bank is the one from the memory controller. What if the bank is the Data or Instruction Cache ?> > The "follow-up errors" is something interesting to me, do you > > have any example? It''s ok for us to not include the back > > information if there are such requirement.An error in the Bus Unit can trigger a watchdog timeout and cause a Load-Store error as a "follow-up error". This in turn may trigger another "follow-up error" in the memory controller or in the Data or Instruction Cache depending on what the CPU tries to do. I think, we should mark the ''struct mcinfo_global'' as a kind of header for each error. All following information describe the error (including the follow-up errors) and all recover actions. This gives us the flexibility to get as many information as possible and allows to do as many recover actions as necessary instead of just one. Christoph> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: > >>>>> Christoph/Frank, Followed is the interface definition, please have a > >>>>> look. > >>>>> > >>>>> Thanks > >>>>> Yunhong Jiang > >>>>> > >>>>> 1) Interface between Xen/dom0 for passing xen''s recovery action > >>>>> information to dom0. Usage model: After offlining broken page, Xen > >>>>> might pass its page-offline recovery action result information to > >>>>> dom0. Dom0 will save the information in non-volatile memory for > >>>>> further proactive actions, such as offlining the easy-broken page > >>>>> early when doing next reboot. 
> >>>>> > >>>>> > >>>>> struct page_offline_action > >>>>> { > >>>>> /* Params for passing the offlined page number to DOM0 */ > >>>>> uint64_t mfn; uint64_t status; /* Similar to page offline hypercall > >>>>> */ }; > >>>>> > >>>>> struct cpu_offline_action > >>>>> { > >>>>> /* Params for passing the identity of the offlined CPU to DOM0 */ > >>>>> uint32_t mc_socketid; uint16_t mc_coreid; > >>>>> uint16_t mc_core_threadid; > >>>>> }; > >>>>> > >>>>> struct cache_shrink_action > >>>>> { > >>>>> /* TBD, Christoph, please fill it */ > >>>>> }; > >>>>> > >>>>> /* Recover action flags, giving recovery result information to guest > >>>>> */ /* Recovery successfully after taking certain recovery actions > >>>>> below */ #define REC_ACT_RECOVERED (0x1 << 0) > >>>>> /* For solaris''s usage that dom0 will take ownership when crash */ > >>>>> #define REC_ACT_RESET (0x1 << 2) > >>>>> /* No action is performed by XEN */ > >>>>> #define REC_ACT_INFO (0x1 << 3) > >>>>> > >>>>> /* Recover action type definition, valid only when flags & > >>>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 > >>>>> #define MC_ACT_CPU_OFFLINE 2 > >>>>> #define MC_ACT_CACHE_SHIRNK 3 > >>>>> > >>>>> struct recovery_action > >>>>> { > >>>>> uint8_t flags; > >>>>> uint8_t action_type; > >>>>> union > >>>>> { > >>>>> struct page_offline_action page_retire; > >>>>> struct cpu_offline_action cpu_offline; > >>>>> struct cache_shrink_action cache_shrink; > >>>>> uint8_t pad[MAX_ACTION_SIZE]; > >>>>> } action_info; > >>>>> } > >>>>> > >>>>> struct mcinfo_bank { > >>>>> struct mcinfo_common common; > >>>>> > >>>>> uint16_t mc_bank; /* bank nr */ > >>>>> uint16_t mc_domid; /* Usecase 5: domain referenced by mc_addr on > >>>>> dom0 * and if mc_addr is valid. Never valid on DomU. */ uint64_t > >>>>> mc_status; /* bank status */ uint64_t mc_addr; /* bank address, > >>>>> only valid * if addr bit is set in mc_status > >>>>> */ uint64_t mc_misc; uint64_t mc_ctrl2; > >>>>> uint64_t mc_tsc; > >>>>> /* Recovery action is performed per bank */ > >>>>> struct recovery_action action; > >>>>> }; > >>>>> > >>>>> 2) Below two interfaces are for MCA processing internal use. > >>>>> a. pre_handler will be called earlier in MCA ISR context, mainly > >>>>> for early need_reset detection for avoiding log missing (flag > >>>>> MCA_RESET). Also, pre_handler might be able to find the impacted > >>>>> domain if possible. b. mca_error_handler is actually a > >>>>> (error_action_index, > >>>>> recovery_handler pointer) pair. The defined recovery_handler function > >>>>> performs the actual recovery operations in softIrq context after the > >>>>> per_bank MCA error matching the corresponding mca_code index. If > >>>>> pre_handler can''t judge the impacted domain, recovery_handler must > >>>>> figure it out. > >>>>> > >>>>> /* Error has been recovered successfully */ > >>>>> #define MCA_RECOVERD 0 > >>>>> /* Error impact one guest as stated in owner field */ #define > >>>>> MCA_OWNER 1 /* Error can''t be recovered and need reboot system */ > >>>>> #define MCA_RESET 2 /* Error should be handled in softIRQ context */ > >>>>> #define MCA_MORE_ACTION 3 > >>>>> > >>>>> struct mca_handle_result > >>>>> { > >>>>> uint32_t flags; > >>>>> /* Valid only when flags & MCA_OWNER */ > >>>>> domid_d owner; > >>>>> /* valid only when flags & MCA_RECOVERD */ > >>>>> struct recovery_action *action; > >>>>> }; > >>>>> > >>>>> struct mca_error_handler > >>>>> { > >>>>> /* > >>>>> * Assume we will need only architecture defined code. 
If the > >>>>> index can''t be setup by * mca_code, we will add a function to do the > >>>>> (index, recovery_handler) mapping check. * This mca_code represents > >>>>> the recovery handler pointer index for identifying this * particular > >>>>> error''s corresponding recover action */ > >>>>> uint16_t mca_code; > >>>>> > >>>>> /* Handler to be called in softIRQ handler context */ > >>>>> int recovery_handler(struct mcinfo_bank *bank, > >>>>> struct mcinfo_global *global, > >>>>> struct mcinfo_extended *extention, > >>>>> struct mca_handle_result *result); > >>>>> > >>>>> }; > >>>>> > >>>>> struct mca_error_handler intel_mca_handler[] > >>>>> { > >>>>> .... > >>>>> }; > >>>>> > >>>>> struct mca_error_handler amd_mca_handler[] > >>>>> { > >>>>> .... > >>>>> }; > >>>>> > >>>>> > >>>>> /* HandlVer to be called in MCA ISR in MCA context */ > >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, > >>>>> struct mca_handle_result *result); > >>>>> > >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, > >>>>> struct mca_handle_result *result); > >>>>> > >>>>> Frank.Vanderlinden@Sun.COM > >> > >> <mailto:Frank.Vanderlinden@Sun.COM> wrote: > >>>>>> Jiang, Yunhong wrote: > >>>>>>> Frank/Christopher, can you please give more comments for it, or you > >>>>>>> are OK with this? For the action reporting mechanism, we will send > >>>>>>> out a proposal for review soon. > >>>>>> > >>>>>> I''m ok with this. We need a little more information on the AMD > >>>>>> mechanism, but it seems to me that we can fit this in. > >>>>>> > >>>>>> Sometime this week, I''ll also send out the last of our changes that > >>>>>> haven''t been sent upstream to xen-unstable yet. Maybe we can combine > >>>>>> some things in to one patch, like the telemetry handling changes > >>>>>> that Gavin did. The other changes are error injection (for > >>>>>> debugging) and panic crash dump support for our FMA tools, but those > >>>>>> are probably only interesting to us. > >>>>>> > >>>>>> - Frank > >>>> > >>>> ---- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
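The layout proposed here, one mcinfo_global acting as the header of a record followed by bank, extended and recovery entries, could be walked on the consumer side roughly as below. The x86_mcinfo_first()/x86_mcinfo_next() walkers, mi_nentries and the MC_TYPE_GLOBAL/BANK/EXTENDED constants are assumed to follow the existing xen-mca.h conventions; MC_TYPE_RECOVERY is hypothetical, standing for the new entry type under discussion.

/*
 * Sketch of walking one telemetry record: an mcinfo_global header followed
 * by bank entries (including follow-up errors), optional extended data and
 * zero or more recovery entries.
 */
static void walk_mcinfo(struct mc_info *mi)
{
    struct mcinfo_common *mic = x86_mcinfo_first(mi);
    unsigned int i;

    for ( i = 0; i < mi->mi_nentries; i++, mic = x86_mcinfo_next(mic) )
    {
        switch ( mic->type )
        {
        case MC_TYPE_GLOBAL:     /* record header: one per error event */
            break;
        case MC_TYPE_BANK:       /* per-bank data, incl. follow-up errors */
            break;
        case MC_TYPE_EXTENDED:   /* vendor-specific extra registers */
            break;
        case MC_TYPE_RECOVERY:   /* recovery action(s), possibly several */
            break;
        }
    }
}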
Jiang, Yunhong
2009-Mar-12 15:52 UTC
RE: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Christoph, sorry for later response. Please see inline reply.>Ah, I see. The registers of our memory controller are in the >PCI config space. It''s no PCI-E device.That''s great.> >> >>> For the PCI access, I''d prefer to have xen to control >all these, i.e. >> >>> even if dom0 want to disable the L3 cache, it is done through a >> >>> hypercall. The reason is, Xen control the CPU, so keep >it in Xen will >> >>> make things simpler. >> >>> >> >>> Of course, it is ok for me too, if you want to keep Xen >for #MC handler >> >>> and Dom0 for CE handler. >> >> >> >> We still need to define the rules to prevent interferes and >> >> clarify how to >> >> deal with Dom0/DomU going wild and breaking the rules. >> > >> > As discussed previously, we don''t need concern about DomU, >> > all configuration space access from domU will be >intercepted by dom0. >> > >> > For Dom0, since currently all PCI access to 0xcf8/cfc will be >> > intercepted by Xen, so Xen can do checking. We can achieve >> > same checking for mmconfig if remove that range from dom0. But >> > I have to say I''m not sure if we do need concern too much what >> > will happen when dom0 going wild ( after all, a crash in dom0 >> > will lost everything), especially interfere on such access >> > will not cause security issue (please correct me if I''m wrong ). > >This sounds like an assumption that an IOMMU is always available.Xen''s PCI access does not requires IOMMU, it is in arch/x86/pci.c .>> > I think the assumption here is a recover action will be >> > triggered only by one bank. For example, we offline page >> > because one MC bank tell us that page is broken. > >Only if the bank is the one from the memory controller. >What if the bank is the Data or Instruction Cache ? > >> > The "follow-up errors" is something interesting to me, do you >> > have any example? It''s ok for us to not include the back >> > information if there are such requirement. > >An error in the Bus Unit can trigger a watchdog timeout >and cause a Load-Store error as a "follow-up error". This in turn >may trigger another "follow-up error" in the memory controller >or in the Data or Instruction Cache depending on what the CPU >tries to do.Hmm, so will these follow-up error in the same bank or different bank? If in different bank, how can MCE handler knows they are related, or even should MCE handler knows about the relationship (I didn''t find such code in current implementation). Or you mean we need give the relationship because Dom0 need such information?> >I think, we should mark the ''struct mcinfo_global'' as a kind >of header for >each error. All following information describe the error >(including the >follow-up errors) and all recover actions. This gives us the >flexibility >to get as many information as possible and allows to do >as many recover actions as necessary instead of just one.I think your original proposal can also meet such purpose, i.e. include the mc_recover_info and we still need pass all mc_bacnk infor to dom0 for telemetry. If you prefer this one, can you please define the interface? Gavin/Frank, do you have any idea for this changes? Thanks -- Yunhong Jiang> >Christoph > > >> >>>> On Thursday 05 March 2009 09:31:27 Jiang, Yunhong wrote: >> >>>>> Christoph/Frank, Followed is the interface definition, >please have a >> >>>>> look. >> >>>>> >> >>>>> Thanks >> >>>>> Yunhong Jiang >> >>>>> >> >>>>> 1) Interface between Xen/dom0 for passing xen''s recovery action >> >>>>> information to dom0. 
Usage model: After offlining >broken page, Xen >> >>>>> might pass its page-offline recovery action result >information to >> >>>>> dom0. Dom0 will save the information in non-volatile memory for >> >>>>> further proactive actions, such as offlining the >easy-broken page >> >>>>> early when doing next reboot. >> >>>>> >> >>>>> >> >>>>> struct page_offline_action >> >>>>> { >> >>>>> /* Params for passing the offlined page number to DOM0 */ >> >>>>> uint64_t mfn; uint64_t status; /* Similar to page >offline hypercall >> >>>>> */ }; >> >>>>> >> >>>>> struct cpu_offline_action >> >>>>> { >> >>>>> /* Params for passing the identity of the offlined >CPU to DOM0 */ >> >>>>> uint32_t mc_socketid; uint16_t mc_coreid; >> >>>>> uint16_t mc_core_threadid; >> >>>>> }; >> >>>>> >> >>>>> struct cache_shrink_action >> >>>>> { >> >>>>> /* TBD, Christoph, please fill it */ >> >>>>> }; >> >>>>> >> >>>>> /* Recover action flags, giving recovery result >information to guest >> >>>>> */ /* Recovery successfully after taking certain >recovery actions >> >>>>> below */ #define REC_ACT_RECOVERED (0x1 << 0) >> >>>>> /* For solaris''s usage that dom0 will take ownership >when crash */ >> >>>>> #define REC_ACT_RESET (0x1 << 2) >> >>>>> /* No action is performed by XEN */ >> >>>>> #define REC_ACT_INFO (0x1 << 3) >> >>>>> >> >>>>> /* Recover action type definition, valid only when flags & >> >>>>> REC_ACT_RECOVERED */ #define MC_ACT_PAGE_OFFLINE 1 >> >>>>> #define MC_ACT_CPU_OFFLINE 2 >> >>>>> #define MC_ACT_CACHE_SHIRNK 3 >> >>>>> >> >>>>> struct recovery_action >> >>>>> { >> >>>>> uint8_t flags; >> >>>>> uint8_t action_type; >> >>>>> union >> >>>>> { >> >>>>> struct page_offline_action page_retire; >> >>>>> struct cpu_offline_action cpu_offline; >> >>>>> struct cache_shrink_action cache_shrink; >> >>>>> uint8_t pad[MAX_ACTION_SIZE]; >> >>>>> } action_info; >> >>>>> } >> >>>>> >> >>>>> struct mcinfo_bank { >> >>>>> struct mcinfo_common common; >> >>>>> >> >>>>> uint16_t mc_bank; /* bank nr */ >> >>>>> uint16_t mc_domid; /* Usecase 5: domain referenced >by mc_addr on >> >>>>> dom0 * and if mc_addr is valid. Never valid on DomU. >*/ uint64_t >> >>>>> mc_status; /* bank status */ uint64_t mc_addr; >/* bank address, >> >>>>> only valid * if addr bit is >set in mc_status >> >>>>> */ uint64_t mc_misc; uint64_t mc_ctrl2; >> >>>>> uint64_t mc_tsc; >> >>>>> /* Recovery action is performed per bank */ >> >>>>> struct recovery_action action; >> >>>>> }; >> >>>>> >> >>>>> 2) Below two interfaces are for MCA processing internal use. >> >>>>> a. pre_handler will be called earlier in MCA ISR >context, mainly >> >>>>> for early need_reset detection for avoiding log missing (flag >> >>>>> MCA_RESET). Also, pre_handler might be able to find >the impacted >> >>>>> domain if possible. b. mca_error_handler is actually a >> >>>>> (error_action_index, >> >>>>> recovery_handler pointer) pair. The defined >recovery_handler function >> >>>>> performs the actual recovery operations in softIrq >context after the >> >>>>> per_bank MCA error matching the corresponding mca_code >index. If >> >>>>> pre_handler can''t judge the impacted domain, >recovery_handler must >> >>>>> figure it out. 
>> >>>>> >> >>>>> /* Error has been recovered successfully */ >> >>>>> #define MCA_RECOVERD 0 >> >>>>> /* Error impact one guest as stated in owner field */ #define >> >>>>> MCA_OWNER 1 /* Error can''t be recovered and need >reboot system */ >> >>>>> #define MCA_RESET 2 /* Error should be handled in >softIRQ context */ >> >>>>> #define MCA_MORE_ACTION 3 >> >>>>> >> >>>>> struct mca_handle_result >> >>>>> { >> >>>>> uint32_t flags; >> >>>>> /* Valid only when flags & MCA_OWNER */ >> >>>>> domid_d owner; >> >>>>> /* valid only when flags & MCA_RECOVERD */ >> >>>>> struct recovery_action *action; >> >>>>> }; >> >>>>> >> >>>>> struct mca_error_handler >> >>>>> { >> >>>>> /* >> >>>>> * Assume we will need only architecture defined >code. If the >> >>>>> index can''t be setup by * mca_code, we will add a >function to do the >> >>>>> (index, recovery_handler) mapping check. * This >mca_code represents >> >>>>> the recovery handler pointer index for identifying >this * particular >> >>>>> error''s corresponding recover action */ >> >>>>> uint16_t mca_code; >> >>>>> >> >>>>> /* Handler to be called in softIRQ handler context */ >> >>>>> int recovery_handler(struct mcinfo_bank *bank, >> >>>>> struct mcinfo_global *global, >> >>>>> struct mcinfo_extended *extention, >> >>>>> struct mca_handle_result *result); >> >>>>> >> >>>>> }; >> >>>>> >> >>>>> struct mca_error_handler intel_mca_handler[] >> >>>>> { >> >>>>> .... >> >>>>> }; >> >>>>> >> >>>>> struct mca_error_handler amd_mca_handler[] >> >>>>> { >> >>>>> .... >> >>>>> }; >> >>>>> >> >>>>> >> >>>>> /* HandlVer to be called in MCA ISR in MCA context */ >> >>>>> int intel_mca_pre_handler(struct cpu_user_regs *regs, >> >>>>> struct >mca_handle_result *result); >> >>>>> >> >>>>> int amd_mca_pre_handler(struct cpu_user_regs *regs, >> >>>>> struct mca_handle_result *result); >> >>>>> >> >>>>> Frank.Vanderlinden@Sun.COM >> >> >> >> <mailto:Frank.Vanderlinden@Sun.COM> wrote: >> >>>>>> Jiang, Yunhong wrote: >> >>>>>>> Frank/Christopher, can you please give more comments >for it, or you >> >>>>>>> are OK with this? For the action reporting >mechanism, we will send >> >>>>>>> out a proposal for review soon. >> >>>>>> >> >>>>>> I''m ok with this. We need a little more information on the AMD >> >>>>>> mechanism, but it seems to me that we can fit this in. >> >>>>>> >> >>>>>> Sometime this week, I''ll also send out the last of >our changes that >> >>>>>> haven''t been sent upstream to xen-unstable yet. Maybe >we can combine >> >>>>>> some things in to one patch, like the telemetry >handling changes >> >>>>>> that Gavin did. The other changes are error injection (for >> >>>>>> debugging) and panic crash dump support for our FMA >tools, but those >> >>>>>> are probably only interesting to us. >> >>>>>> >> >>>>>> - Frank >> >>>> >> >>>> -- > > >-- >---to satisfy European Law for business letters: >Advanced Micro Devices GmbH >Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen >Geschaeftsfuehrer: Jochen Polster, Thomas M. McCoy, Giuliano Meroni >Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen >Registergericht Muenchen, HRB Nr. 43632 > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Frank van der Linden
2009-Mar-16 16:27 UTC
Re: [Xen-devel] Re: [RFC] RAS(Part II)--MCA enabling in XEN
Jiang, Yunhong wrote: > Christoph Egger wrote: >> I think, we should mark the 'struct mcinfo_global' as a kind >> of header for >> each error. All following information describe the error >> (including the >> follow-up errors) and all recover actions. This gives us the >> flexibility >> to get as many information as possible and allows to do >> as many recover actions as necessary instead of just one. > > I think your original proposal can also meet such purpose, i.e. include the mc_recover_info and we still need pass all mc_bacnk infor to dom0 for telemetry. If you prefer this one, can you please define the interface? Gavin/Frank, do you have any idea for this changes? Sorry about the slow reply. Our changes to the MCE code (to combine the AMD and Intel code as much as possible, and use a transactional approach to the telemetry) already pretty much use mc_global as a header. With our code, dom0 retrieves one mcinfo structure, with one global structure (which always comes first, but that's not required). In other words, using mc_global as a kind of header to the mcinfo data is fine, since we're already doing that. And, since we're talking about transactions with one mcinfo structure at a time (with one mc_global structure), the recover_info structures can be separate from the bank structures. - Frank _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel