Hello!

The current MCA/MCE support in Xen is that it dumps the error and panics.
In the concept I propose here, there are two places where Xen has to react:

 I)  Xen receives an MCE from the CPU
 II) Xen receives Dom0 instructions via hypercall

The term "self-healing" below is used in the sense of using the most appropriate
technique(s) to handle an error, such as MPR
(http://www.opensparc.net/pubs/papers/MPR_DSN06.pdf), online-spare RAM, or
killing/restarting impacted processes to prevent crashes of whole guests or of
the whole machine.

case I) - Xen receives an MCE from the CPU

1)  Xen MCE handler figures out whether the error is a correctable error (CE)
    or an uncorrectable error (UE)
2a) error == CE:
    Xen notifies Dom0 if Dom0 installed an MCA event handler,
    for statistical purposes
2b) error == UE and UE impacts Xen or Dom0:
    Xen does some self-healing
    and notifies Dom0 on success if Dom0 installed an MCA event handler,
    or Xen panics on failure
2c) error == UE and UE impacts DomU:
    In case Dom0 installed an MCA event handler:
        Xen notifies Dom0, and Dom0 tells Xen whether to also notify DomU
        and/or performs some operations on the DomU (case II)
    In case Dom0 did not install an MCA event handler,
        Xen notifies DomU
3a) DomU is a PV guest:
    if DomU installed an MCA event handler, it gets notified to perform
    self-healing
    if DomU did not install an MCA event handler, notify Dom0 to perform
    some operations on DomU (case II)
    if neither DomU nor Dom0 installed MCA event handlers,
    then Xen kills DomU
3b) DomU is an HVM guest:
    if DomU features a PV driver, then behave as in 3a)
    if DomU enabled MCA/MCE via MSR, inject the MCE into the guest
    if DomU did not enable MCA/MCE via MSR, notify Dom0
    to perform some operations on DomU (case II)
    if neither DomU enabled MCA/MCE nor Dom0 installed an
    MCA event handler, Xen kills DomU

case II) - Xen receives Dom0 instructions via hypercall

There are different reasons why Xen should do something:

 - Dom0 saw enough CEs that UEs are very likely to happen, and wants to
   "circumvent" the UEs.
 - Possible operations on a DomU:
   - save/restore DomU
   - (live-)migrate DomU to a different physical machine
   - etc.

Some details

MCE
When an MCE occurs, none of the above should happen within the handler itself,
because when an MCE happens within the MCE handler, the CPU enters shutdown
state. So the mail topic "NMI deferral on i386" may be related here.

Notifying guests
Above I am talking about an MCA event handler. What I actually mean is a way to
inform the guest that something happened. I chose the term "MCA event handler"
because I think using the event mechanism fits best for this purpose.

HVM guests with no "MCA PV driver" can enable/disable certain types of errors.
They can even control whether they want to get an exception or to poll. I would
prefer to always inject exceptions into the HVM guest. An HVM guest can't
prevent always seeing exceptions, but I don't know whether guests behave
correctly when they assume they get all or certain errors via polling.

Guests which already feature fault management to a certain level when running
non-virtualized can easily re-use this capability to decode the error telemetry
and handle the error in the virtualized case. Thus forwarding/injecting the
error into a guest only requires translating the physical/virtual address
reported by the HW into guest physical/guest virtual addresses. The error code
itself needs no translation/abstraction.
self-healing
IMO, only Xen should use HW features such as online-spare RAM, which has been
introduced with AMD K8 RevF. The HW features should never be visible to any
DomU, in order to reduce complexity in Xen. Software-only techniques such as
MPR are OK in all guests. Only Dom0 can tell Xen to do something using HW
features.

Christoph
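To make the case I) decision tree above concrete, the following C sketch shows
the intended flow. It is an illustration only: every helper it calls
(mca_is_correctable(), mca_error_owner(), notify_dom0() and so on) is a made-up
placeholder rather than an existing Xen interface, and the PV/HVM distinction
of 3a)/3b) is folded into a single "has the guest registered a handler" check
for brevity.

    /*
     * Illustrative sketch only -- not Xen code.  All helper names are
     * hypothetical placeholders for whatever the real implementation
     * would provide.
     */
    #include <stdbool.h>

    struct mca_error;                 /* decoded telemetry from the MCA banks */
    enum owner { OWNER_XEN, OWNER_DOM0, OWNER_DOMU };

    extern bool mca_is_correctable(const struct mca_error *e);
    extern enum owner mca_error_owner(const struct mca_error *e, int *domU_id);
    extern bool dom0_has_mca_handler(void);
    extern bool domU_has_mca_handler(int domU_id);
    extern bool try_self_heal(const struct mca_error *e); /* MPR, spare RAM, ... */
    extern void notify_dom0(const struct mca_error *e);
    extern void notify_domU(int domU_id, const struct mca_error *e);
    extern void kill_domU(int domU_id);
    extern void panic(const char *why);

    /* Case I: Xen received an MCE (run deferred, outside the #MC handler). */
    static void handle_machine_check(const struct mca_error *e)
    {
        int domU_id;

        if (mca_is_correctable(e)) {                    /* 2a) CE */
            if (dom0_has_mca_handler())
                notify_dom0(e);                         /* statistics only */
            return;
        }

        switch (mca_error_owner(e, &domU_id)) {
        case OWNER_XEN:
        case OWNER_DOM0:                                /* 2b) UE hits Xen/Dom0 */
            if (!try_self_heal(e))
                panic("uncorrectable error in Xen/Dom0");
            if (dom0_has_mca_handler())
                notify_dom0(e);
            break;

        case OWNER_DOMU:                                /* 2c) and 3a)/3b) */
            if (dom0_has_mca_handler())
                notify_dom0(e);      /* Dom0 decides what to do (case II) */
            else if (domU_has_mca_handler(domU_id))
                notify_domU(domU_id, e); /* let the guest's fault mgmt react */
            else
                kill_domU(domU_id);
            break;
        }
    }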
>case I) - Xen receives an MCE from the CPU
>
>1)  Xen MCE handler figures out whether the error is a correctable error (CE)
>    or an uncorrectable error (UE)
>2a) error == CE:
>    Xen notifies Dom0 if Dom0 installed an MCA event handler,
>    for statistical purposes
>2b) error == UE and UE impacts Xen or Dom0:

A very important aspect here is how you want to classify what impact an
uncorrectable error has - generally, I can see very few situations where you
could confine the impact to a sub-portion of the system (i.e. a single domU,
dom0, or Xen). The general rule in my opinion must be to halt the system; the
question just is how likely it is that you can get a meaningful message out
(to screen, serial, or logs) that can help analyze the problem afterwards. If
it is somewhat likely, then dom0 should be involved, otherwise Xen should just
shut down the system.

>    Xen does some self-healing
>    and notifies Dom0 on success if Dom0 installed an MCA event handler,
>    or Xen panics on failure
>2c) error == UE and UE impacts DomU:
>    In case Dom0 installed an MCA event handler:
>        Xen notifies Dom0, and Dom0 tells Xen whether to also notify DomU
>        and/or performs some operations on the DomU (case II)
>    In case Dom0 did not install an MCA event handler,
>        Xen notifies DomU
>3a) DomU is a PV guest:
>    if DomU installed an MCA event handler, it gets notified to perform
>    self-healing
>    if DomU did not install an MCA event handler, notify Dom0 to perform
>    some operations on DomU (case II)
>    if neither DomU nor Dom0 installed MCA event handlers,
>    then Xen kills DomU
>3b) DomU is an HVM guest:
>    if DomU features a PV driver, then behave as in 3a)

What significance do pv drivers have here? Or do you mean a pv MCA driver?

>    if DomU enabled MCA/MCE via MSR, inject the MCE into the guest
>    if DomU did not enable MCA/MCE via MSR, notify Dom0
>    to perform some operations on DomU (case II)
>    if neither DomU enabled MCA/MCE nor Dom0 installed an
>    MCA event handler, Xen kills DomU

Injecting an MCE into an hvm guest seems at least questionable. It can't really
do anything about it (it doesn't even know the real topology of the system it's
running on, so addresses stored in MSRs are meaningless - either you allow them
to be read untranslated [in which case the guest cannot make sense of them] or
you do translation for the guest [in which case it might make assumptions about
co-locality of other nearby pages which will be wrong]). Doing this to a pv
domU for purely notification purposes (where the guest knows it's running
virtualized) is clearly a different matter.

>case II) - Xen receives Dom0 instructions via hypercall
>
>There are different reasons why Xen should do something:
>
> - Dom0 saw enough CEs that UEs are very likely to happen, and wants to
>   "circumvent" the UEs.
> - Possible operations on a DomU:
>   - save/restore DomU
>   - (live-)migrate DomU to a different physical machine
>   - etc.

Very heavy-weight operations, which I think are unlikely to succeed if you
already suspect the system's going to suffer a UE soon.

Jan
On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
> >case I) - Xen receives an MCE from the CPU
> >
> >1)  Xen MCE handler figures out whether the error is a correctable error (CE)
> >    or an uncorrectable error (UE)
> >2a) error == CE:
> >    Xen notifies Dom0 if Dom0 installed an MCA event handler,
> >    for statistical purposes
> >2b) error == UE and UE impacts Xen or Dom0:
>
> A very important aspect here is how you want to classify what impact an
> uncorrectable error has - generally, I can see very few situations where you
> could confine the impact to a sub-portion of the system (i.e. a single domU,
> dom0, or Xen). The general rule in my opinion must be to halt the system; the
> question just is how likely it is that you can get a meaningful message out
> (to screen, serial, or logs) that can help analyze the problem afterwards.
> If it is somewhat likely, then dom0 should be involved, otherwise Xen should
> just shut down the system.

Here you can best help out using HW features to handle errors.
AMD CPUs feature online-spare RAM and Chipkill since K8 RevF.

CPUs such as the Sparc feature data poisoning. That would be the most handy
technique that can be used here.

Maybe this line:

> >    Xen does some self-healing

should be this:

       Xen *tries* to do some self-healing

> >    and notifies Dom0 on success if Dom0 installed an MCA event handler,
> >    or Xen panics on failure

The first implementation can just panic here. The self-healing will be
implemented and improved over time.

> >2c) error == UE and UE impacts DomU:
> >    In case Dom0 installed an MCA event handler:
> >        Xen notifies Dom0, and Dom0 tells Xen whether to also notify DomU
> >        and/or performs some operations on the DomU (case II)
> >    In case Dom0 did not install an MCA event handler,
> >        Xen notifies DomU
> >3a) DomU is a PV guest:
> >    if DomU installed an MCA event handler, it gets notified to perform
> >    self-healing
> >    if DomU did not install an MCA event handler, notify Dom0 to perform
> >    some operations on DomU (case II)
> >    if neither DomU nor Dom0 installed MCA event handlers,
> >    then Xen kills DomU
> >3b) DomU is an HVM guest:
> >    if DomU features a PV driver, then behave as in 3a)
>
> What significance do pv drivers have here? Or do you mean a pv MCA driver?

Yes, I mean the pv MCA driver.

> >    if DomU enabled MCA/MCE via MSR, inject the MCE into the guest
> >    if DomU did not enable MCA/MCE via MSR, notify Dom0
> >    to perform some operations on DomU (case II)
> >    if neither DomU enabled MCA/MCE nor Dom0 installed an
> >    MCA event handler, Xen kills DomU
>
> Injecting an MCE into an hvm guest seems at least questionable. It can't
> really do anything about it (it doesn't even know the real topology of the
> system it's running on, so addresses stored in MSRs are meaningless - either
> you allow them to be read untranslated [in which case the guest cannot make
> sense of them] or you do translation for the guest [in which case it might
> make assumptions about co-locality of other nearby pages which will be
> wrong]).

Yes, Xen should do the translation for the guest. The assumptions must be
fixed then. I know that's easier said than done.

> Doing this to a pv domU for purely notification purposes (where the guest
> knows it's running virtualized) is clearly a different matter.

Yes, I agree with you here. The general idea behind informing a DomU is to let
its own fault management handle the error. It is always better to let it kill
a screen saver process and keep the word processor running than to kill the
whole guest. The DomU should crash itself if it thinks that's best.

> >case II) - Xen receives Dom0 instructions via hypercall
> >
> >There are different reasons why Xen should do something:
> >
> > - Dom0 saw enough CEs that UEs are very likely to happen, and wants to
> >   "circumvent" the UEs.
> > - Possible operations on a DomU:
> >   - save/restore DomU
> >   - (live-)migrate DomU to a different physical machine
> >   - etc.
>
> Very heavy-weight operations, which I think are unlikely to succeed if
> you already suspect the system's going to suffer a UE soon.

Yes, they are heavy-weight operations. Do you have some ideas what a Dom0 can
do? The idea here is that Dom0's fault management helps guests to survive as
best as possible.

Christoph
>>> "Christoph Egger" <Christoph.Egger@amd.com> 30.05.07 09:45 >>>
>On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
>> >case I) - Xen receives an MCE from the CPU
>> >
>> >1)  Xen MCE handler figures out whether the error is a correctable error (CE)
>> >    or an uncorrectable error (UE)
>> >2a) error == CE:
>> >    Xen notifies Dom0 if Dom0 installed an MCA event handler,
>> >    for statistical purposes
>> >2b) error == UE and UE impacts Xen or Dom0:
>>
>> A very important aspect here is how you want to classify what impact an
>> uncorrectable error has - generally, I can see very few situations where you
>> could confine the impact to a sub-portion of the system (i.e. a single domU,
>> dom0, or Xen). The general rule in my opinion must be to halt the system;
>> the question just is how likely it is that you can get a meaningful message
>> out (to screen, serial, or logs) that can help analyze the problem
>> afterwards. If it is somewhat likely, then dom0 should be involved,
>> otherwise Xen should just shut down the system.
>
>Here you can best help out using HW features to handle errors.
>AMD CPUs feature online-spare RAM and Chipkill since K8 RevF.
>
>CPUs such as the Sparc feature data poisoning. That would be the most handy
>technique that can be used here.

But that assumes the error is recoverable (i.e. no other data got corrupted).
You still didn't clarify how you intend to determine the impact an
uncorrectable error had.

>> >3a) DomU is a PV guest:
>> >    if DomU installed an MCA event handler, it gets notified to perform
>> >    self-healing
>> >    if DomU did not install an MCA event handler, notify Dom0 to perform
>> >    some operations on DomU (case II)
>> >    if neither DomU nor Dom0 installed MCA event handlers,
>> >    then Xen kills DomU
>> >3b) DomU is an HVM guest:
>> >    if DomU features a PV driver, then behave as in 3a)
>>
>> What significance do pv drivers have here? Or do you mean a pv MCA driver?
>
>Yes, I mean the pv MCA driver.
>
>> >    if DomU enabled MCA/MCE via MSR, inject the MCE into the guest
>> >    if DomU did not enable MCA/MCE via MSR, notify Dom0
>> >    to perform some operations on DomU (case II)
>> >    if neither DomU enabled MCA/MCE nor Dom0 installed an
>> >    MCA event handler, Xen kills DomU
>>
>> Injecting an MCE into an hvm guest seems at least questionable. It can't
>> really do anything about it (it doesn't even know the real topology of the
>> system it's running on, so addresses stored in MSRs are meaningless - either
>> you allow them to be read untranslated [in which case the guest cannot make
>> sense of them] or you do translation for the guest [in which case it might
>> make assumptions about co-locality of other nearby pages which will be
>> wrong]).
>
>Yes, Xen should do the translation for the guest. The assumptions must be
>fixed then. I know that's easier said than done.

Exactly - you are proposing to fix all possible OSes, including sufficiently
old ones. That's impossible. And I can't even see why an OS intended to run on
native hardware would care to try to deal with virtualization aspects like
this.

Jan
On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote:
> >>> "Christoph Egger" <Christoph.Egger@amd.com> 30.05.07 09:45 >>>
> >
> >On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
> >> >case I) - Xen receives an MCE from the CPU
> >> >
> >> >1)  Xen MCE handler figures out whether the error is a correctable
> >> >    error (CE) or an uncorrectable error (UE)
> >> >2a) error == CE:
> >> >    Xen notifies Dom0 if Dom0 installed an MCA event handler,
> >> >    for statistical purposes
> >> >2b) error == UE and UE impacts Xen or Dom0:
> >>
> >> A very important aspect here is how you want to classify what impact an
> >> uncorrectable error has - generally, I can see very few situations where
> >> you could confine the impact to a sub-portion of the system (i.e. a
> >> single domU, dom0, or Xen). The general rule in my opinion must be to
> >> halt the system; the question just is how likely it is that you can get
> >> a meaningful message out (to screen, serial, or logs) that can help
> >> analyze the problem afterwards. If it is somewhat likely, then dom0
> >> should be involved, otherwise Xen should just shut down the system.
> >
> >Here you can best help out using HW features to handle errors.
> >AMD CPUs feature online-spare RAM and Chipkill since K8 RevF.
> >
> >CPUs such as the Sparc feature data poisoning. That would be the most
> >handy technique that can be used here.
>
> But that assumes the error is recoverable (i.e. no other data got
> corrupted). You still didn't clarify how you intend to determine the
> impact an uncorrectable error had.

I know. I am lacking a sudden inspiration here.
That's why I discuss this here before writing code that goes nowhere.
Anyone here with a flash of genius? :-)

> >> >3a) DomU is a PV guest:
> >> >    if DomU installed an MCA event handler, it gets notified to perform
> >> >    self-healing
> >> >    if DomU did not install an MCA event handler, notify Dom0 to perform
> >> >    some operations on DomU (case II)
> >> >    if neither DomU nor Dom0 installed MCA event handlers,
> >> >    then Xen kills DomU
> >> >3b) DomU is an HVM guest:
> >> >    if DomU features a PV driver, then behave as in 3a)
> >>
> >> What significance do pv drivers have here? Or do you mean a pv MCA
> >> driver?
> >
> >Yes, I mean the pv MCA driver.
> >
> >> >    if DomU enabled MCA/MCE via MSR, inject the MCE into the guest
> >> >    if DomU did not enable MCA/MCE via MSR, notify Dom0
> >> >    to perform some operations on DomU (case II)
> >> >    if neither DomU enabled MCA/MCE nor Dom0 installed an
> >> >    MCA event handler, Xen kills DomU
> >>
> >> Injecting an MCE into an hvm guest seems at least questionable. It can't
> >> really do anything about it (it doesn't even know the real topology of
> >> the system it's running on, so addresses stored in MSRs are meaningless
> >> - either you allow them to be read untranslated [in which case the guest
> >> cannot make sense of them] or you do translation for the guest [in which
> >> case it might make assumptions about co-locality of other nearby pages
> >> which will be wrong]).
> >
> >Yes, Xen should do the translation for the guest. The assumptions must be
> >fixed then. I know that's easier said than done.
>
> Exactly - you are proposing to fix all possible OSes, including
> sufficiently old ones. That's impossible. And I can't even see why an OS
> intended to run on native hardware would care to try to deal with
> virtualization aspects like this.

I think it was not obvious that Xen should not inject failures into DomUs that
don't feature fault management. In this case, either Dom0 tells Xen what to do
with the DomU or Xen just kills the DomU.

<snippet from above>

> >> >3a) DomU is a PV guest:
....
> >> >    if DomU did not install an MCA event handler, notify Dom0 to perform
> >> >    some operations on DomU (case II)
> >> >    if neither DomU nor Dom0 installed MCA event handlers,
> >> >    then Xen kills DomU

> >> >3b) DomU is an HVM guest:
....
> >> >    if DomU did not enable MCA/MCE via MSR, notify Dom0
> >> >    to perform some operations on DomU (case II)
> >> >    if neither DomU enabled MCA/MCE nor Dom0 installed an
> >> >    MCA event handler, Xen kills DomU

</snippet>

Christoph
>> >> Injecting an MCE into an hvm guest seems at least questionable. It can't
>> >> really do anything about it (it doesn't even know the real topology of
>> >> the system it's running on, so addresses stored in MSRs are meaningless
>> >> - either you allow them to be read untranslated [in which case the guest
>> >> cannot make sense of them] or you do translation for the guest [in which
>> >> case it might make assumptions about co-locality of other nearby pages
>> >> which will be wrong]).
>> >
>> >Yes, Xen should do the translation for the guest. The assumptions must be
>> >fixed then. I know that's easier said than done.
>>
>> Exactly - you are proposing to fix all possible OSes, including
>> sufficiently old ones. That's impossible. And I can't even see why an OS
>> intended to run on native hardware would care to try to deal with
>> virtualization aspects like this.
>
>I think it was not obvious that Xen should not inject failures into DomUs
>that don't feature fault management. In this case, either Dom0 tells Xen
>what to do with the DomU or Xen just kills the DomU.

You apparently didn't get my point - even if the guest set up MCE properly (by
setting CR4.MCE and possibly writing some MSRs), you cannot conclude that it is
aware of the fact that it is running in a virtualized environment and that
guest physical address relations do not map to machine physical address
relations (i.e. a set of pages contiguous in gpa space is almost guaranteed to
be discontiguous in mpa space). Hence if it is more than a single byte/cache
line/page that is affected, any such assumptions made in the guest will be
wrong.

Jan
On Wednesday 30 May 2007 11:59:31 Jan Beulich wrote:
> >> >> Injecting an MCE into an hvm guest seems at least questionable. It
> >> >> can't really do anything about it (it doesn't even know the real
> >> >> topology of the system it's running on, so addresses stored in MSRs
> >> >> are meaningless - either you allow them to be read untranslated [in
> >> >> which case the guest cannot make sense of them] or you do translation
> >> >> for the guest [in which case it might make assumptions about
> >> >> co-locality of other nearby pages which will be wrong]).
> >> >
> >> >Yes, Xen should do the translation for the guest. The assumptions must
> >> >be fixed then. I know that's easier said than done.
> >>
> >> Exactly - you are proposing to fix all possible OSes, including
> >> sufficiently old ones. That's impossible. And I can't even see why an OS
> >> intended to run on native hardware would care to try to deal with
> >> virtualization aspects like this.
> >
> >I think it was not obvious that Xen should not inject failures into DomUs
> >that don't feature fault management. In this case, either Dom0 tells Xen
> >what to do with the DomU or Xen just kills the DomU.
>
> You apparently didn't get my point - even if the guest set up MCE properly
> (by setting CR4.MCE and possibly writing some MSRs), you cannot conclude
> that it is aware of the fact that it is running in a virtualized environment
> and that guest physical address relations do not map to machine physical
> address relations (i.e. a set of pages contiguous in gpa space is almost
> guaranteed to be discontiguous in mpa space). Hence if it is more than a
> single byte/cache line/page that is affected, any such assumptions made in
> the guest will be wrong.

Ah, I see. So HVM guests can only handle those errors where this assumption is
guaranteed to be correct. Xen needs to check the error type, the address and
the size (= address range) for this. If Xen can't determine for sure that the
guest can handle this correctly, then Xen has to treat the DomU as a guest
which does not feature fault management.

Christoph
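A sketch of such a confinement check might look like the following. The helpers
page_owner() and page_is_shared_or_granted() are hypothetical stand-ins for
whatever ownership/grant bookkeeping a real implementation would consult; the
point is only that the error must fit in a single page owned exclusively by one
DomU before forwarding is even considered, otherwise the DomU is treated as not
having fault management.

    /*
     * Illustrative sketch only -- the helpers below are hypothetical,
     * not existing Xen interfaces.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12

    struct domain;
    extern struct domain *page_owner(uint64_t maddr);          /* hypothetical */
    extern bool page_is_shared_or_granted(uint64_t maddr);     /* hypothetical */

    /* Return the owning DomU if the error is confined to it, NULL otherwise. */
    static struct domain *confine_ue(uint64_t maddr, uint64_t len)
    {
        struct domain *d;

        /* Must not straddle a page boundary: one page, one owner. */
        if (len == 0 ||
            (maddr >> PAGE_SHIFT) != ((maddr + len - 1) >> PAGE_SHIFT))
            return NULL;

        d = page_owner(maddr);
        if (d == NULL)              /* Xen's own memory, or unassigned */
            return NULL;

        /* A page visible to more than one domain cannot be confined. */
        if (page_is_shared_or_granted(maddr))
            return NULL;

        return d;                   /* caller still has to exclude Dom0 etc. */
    }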
Hi,

On 05/30/07 10:10, Christoph Egger wrote:
[cut]
>>>>> 2b) error == UE and UE impacts Xen or Dom0:
>>>> A very important aspect here is how you want to classify what impact an
>>>> uncorrectable error has - generally, I can see very few situations where
>>>> you could confine the impact to a sub-portion of the system (i.e. a
>>>> single domU, dom0, or Xen). The general rule in my opinion must be to
>>>> halt the system; the question just is how likely it is that you can get
>>>> a meaningful message out (to screen, serial, or logs) that can help
>>>> analyze the problem afterwards. If it is somewhat likely, then dom0
>>>> should be involved, otherwise Xen should just shut down the system.
>>> Here you can best help out using HW features to handle errors.
>>> AMD CPUs feature online-spare RAM and Chipkill since K8 RevF.
>>>
>>> CPUs such as the Sparc feature data poisoning. That would be the most
>>> handy technique that can be used here.
>> But that assumes the error is recoverable (i.e. no other data got
>> corrupted). You still didn't clarify how you intend to determine the
>> impact an uncorrectable error had.
>
> I know. I am lacking a sudden inspiration here.
> That's why I discuss this here before writing code that goes nowhere.
> Anyone here with a flash of genius? :-)

For a first phase I'd suggest that treating an uncorrectable error as terminal
to the entire system (e.g., panic the hypervisor or set up a hardware reset
mechanism such as sync flood) is practical and safe, and allows us to
concentrate on getting some more basic elements in place. As Christoph says,
we really need some form of data poisoning supported on the platform to really
be able to isolate the impact of an uncorrectable error. In the absence of
such support I think some fancy heuristics could work in some limited cases
(e.g., a memory uncorrectable on a page that only a domU has a mapping to and
which is not shared with any other domain, not even via a front/backend
driver), but the penalty for bugs in those heuristics is silent data
corruption, which is the ultimate crime.

>>>>> 3a) DomU is a PV guest:
>>>>>     if DomU installed an MCA event handler, it gets notified to perform
>>>>>     self-healing
>>>>>     if DomU did not install an MCA event handler, notify Dom0 to perform
>>>>>     some operations on DomU (case II)
>>>>>     if neither DomU nor Dom0 installed MCA event handlers,
>>>>>     then Xen kills DomU
>>>>> 3b) DomU is an HVM guest:
>>>>>     if DomU features a PV driver, then behave as in 3a)
>>>> What significance do pv drivers have here? Or do you mean a pv MCA
>>>> driver?
[cut]

My feeling is that the hypervisor and dom0 own the hardware and as such all
hardware fault management should reside there. So we should never deliver any
form of #MC to a domU, nor should a poll of MCA state from a domU ever observe
valid state (e.g., make the RDMSR return 0). So all handling, logging and
diagnosis as well as hardware response actions (such as deploying an online
spare chip-select) are controlled in the hypervisor/dom0 combination. That
seems a consistent model - e.g., if a domU is migrated to another system it
should not carry the diagnosis state of the original system across etc., since
that belongs with the one domain that cannot migrate.

But that is not to say that (I think at a future phase) domU should not
participate in a higher-level fault management function, at the direction of
the hypervisor/dom0 combo. For example, if/when we can isolate an uncorrectable
error to a single domU we could forward such an event to the affected domU if
it has registered its ability/interest in such events. These won't be in the
form of a faked #MC or anything; instead they'd be some form of synchronous
trap experienced when next the affected domU context resumes on CPU. The
intelligent domU handler can then decide whether the domU must panic, whether
it could simply kill the affected process, etc. Those details are clearly
sketchy, but the idea is to up-level the communication to a domU to be more
like "you're broken" rather than "here's a machine-level hardware error for you
to interpret and decide what to do with".

Gavin
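One way to picture that "synchronous trap on next resume" idea is the sketch
below. The vcpu fields and helper functions are invented for illustration and
do not reflect Xen's real data structures; the error path merely marks the
vcpu, and the disposition (deliver a high-level event or crash the domain) is
decided on the guest's own resume path.

    /* Illustrative sketch only -- hypothetical fields and helpers. */
    #include <stdbool.h>

    struct vcpu {
        bool mca_event_pending;      /* set by the error handling path       */
        bool mca_handler_registered; /* guest said it can deal with this     */
    };

    extern void deliver_guest_fault_event(struct vcpu *v);  /* hypothetical */
    extern void crash_owning_domain(struct vcpu *v);        /* hypothetical */

    /* Called from the error path, possibly while the vcpu is not running. */
    static void mark_vcpu_broken(struct vcpu *v)
    {
        v->mca_event_pending = true;
    }

    /* Called just before the vcpu re-enters guest context. */
    static void check_pending_mca_event(struct vcpu *v)
    {
        if (!v->mca_event_pending)
            return;
        v->mca_event_pending = false;

        if (v->mca_handler_registered)
            deliver_guest_fault_event(v); /* high-level "you're broken" trap */
        else
            crash_owning_domain(v);       /* no handler: certain, instant death */
    }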
Hi,

Apologies for the screwy quoting below - I did not receive the first half of
this thread, so it's been forwarded to me.

>>> - Dom0 saw enough CEs that UEs are very likely to happen, and wants to
>>>   "circumvent" the UEs.

The greatest rewards here are in syndrome/row/column/bank analysis of the error
stream. Where something like a bad pin produces tonnes of CEs, they are always
on the same bit, and your chance of a UE is that of a random radiation-type CE
colliding within the set of ECC checkwords being undermined by that pin - not
very high. On the other hand, if we're seeing repeated distinct syndromes from
the same chip-select (or chip-select pair) then there is a good chance they
could collide "soon" - our data is that this combination predicts a UE within
hours to a few days. If you have row/column/bank decoding you can also perform
further analysis of the error source and assess the chances of a collision that
would produce a UE. That example has DIMM memory in mind, but similar
approaches apply to cache memory where it is ECC protected, and so on.

>>> - Possible operations on a DomU:
>>>   - save/restore DomU
>>>   - (live-)migrate DomU to a different physical machine
>>>   - etc.
>> Very heavy-weight operations, which I think are unlikely to succeed if
>> you already suspect the system's going to suffer a UE soon.

As above, some predictors can give you hours to a few days warning of a UE.

Gavin
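As a toy illustration of that kind of CE-stream analysis, the sketch below
counts distinct ECC syndromes per chip-select and flags the chip-select once
several different syndromes have been seen there. The threshold and data layout
are invented for the example; a real diagnosis engine adds error rates,
row/column/bank decoding and ageing of old events.

    /* Illustrative sketch only -- thresholds and layout are made up. */
    #include <stdbool.h>
    #include <stdint.h>

    #define MAX_CHIP_SELECTS                8
    #define SYNDROMES_TRACKED               8
    #define DISTINCT_SYNDROME_UE_THRESHOLD  3   /* invented threshold */

    struct cs_state {
        uint16_t syndromes[SYNDROMES_TRACKED];
        unsigned int nsyndromes;
    };

    static struct cs_state cs_state[MAX_CHIP_SELECTS];

    /* Feed one correctable error; returns true if a UE now looks likely. */
    static bool record_ce(unsigned int chip_select, uint16_t syndrome)
    {
        struct cs_state *cs;
        unsigned int i;

        if (chip_select >= MAX_CHIP_SELECTS)
            return false;
        cs = &cs_state[chip_select];

        /* The same bad pin/bit produces the same syndrome over and over:
         * that pattern on its own is a poor UE predictor. */
        for (i = 0; i < cs->nsyndromes; i++)
            if (cs->syndromes[i] == syndrome)
                return false;

        if (cs->nsyndromes < SYNDROMES_TRACKED)
            cs->syndromes[cs->nsyndromes++] = syndrome;

        /* Several distinct syndromes on one chip-select: collisions possible. */
        return cs->nsyndromes >= DISTINCT_SYNDROME_UE_THRESHOLD;
    }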
[snip]
> My feeling is that the hypervisor and dom0 own the hardware and as such
> all hardware fault management should reside there. So we should never
> deliver any form of #MC to a domU, nor should a poll of MCA state from
> a domU ever observe valid state (e.g., make the RDMSR return 0).
> So all handling, logging and diagnosis as well as hardware response actions
> (such as deploying an online spare chip-select) are controlled in the
> hypervisor/dom0 combination. That seems a consistent model - e.g., if a
> domU is migrated to another system it should not carry the diagnosis state
> of the original system across etc., since that belongs with the one domain
> that cannot migrate.

I agree entirely with this.

> But that is not to say that (I think at a future phase) domU should not
> participate in a higher-level fault management function, at the direction
> of the hypervisor/dom0 combo. For example, if/when we can isolate an
> uncorrectable error to a single domU we could forward such an event to the
> affected domU if it has registered its ability/interest in such events.
> These won't be in the form of a faked #MC or anything; instead they'd be
> some form of synchronous trap experienced when next the affected domU
> context resumes on CPU. The intelligent domU handler can then decide
> whether the domU must panic, whether it could simply kill the affected
> process, etc. Those details are clearly sketchy, but the idea is to
> up-level the communication to a domU to be more like "you're broken"
> rather than "here's a machine-level hardware error for you to interpret
> and decide what to do with".

Yes, this makes much more sense than forwarding #MC, as the guest would have a
hard time actually doing anything really useful with it. As far as I know, most
uncorrectable errors are near enough entirely fatal in most commercial
non-Enterprise OSs anyway - e.g. in Windows XP or Server 2K3, it always ends in
a blue screen - which is hardly any better than the guest being "humanely
euthanized" by Dom0.

I take it this would be some sort of hypercall (available through the regular
PV-driver interface for HVM guests) to say "Let me know if I'm broken - trap on
vector X".

--
Mats
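Such a registration could look roughly like the sketch below, seen from the
guest's side. The hypercall wrapper, its argument structure and the flag/vector
values are all made up here purely to show the shape of the interface: the
guest opts in once and thereafter receives a high-level "you're broken" event
instead of raw #MC state.

    /* Illustrative sketch only -- hypothetical interface, not a real ABI. */
    #include <stdint.h>

    struct mca_register_args {
        uint32_t flags;          /* e.g. "I can survive a UE notification" */
        uint32_t event_vector;   /* vector/event channel to deliver it on  */
    };

    /* Hypothetical wrapper around a made-up hypercall exposed through the
     * PV-driver interface (also usable from an HVM guest's PV drivers). */
    extern long hypervisor_mca_register(struct mca_register_args *args);

    static int register_mca_handler(void)
    {
        struct mca_register_args args = {
            .flags        = 1,      /* willing to handle guest-impact UEs */
            .event_vector = 0x80,   /* arbitrary example vector           */
        };

        /* If this fails (older hypervisor), the guest is simply killed on UE. */
        return hypervisor_mca_register(&args) == 0 ? 0 : -1;
    }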
On Wednesday 30 May 2007 17:03:55 Petersson, Mats wrote:
> [snip]
>
> > My feeling is that the hypervisor and dom0 own the hardware and as such
> > all hardware fault management should reside there. So we should never
> > deliver any form of #MC to a domU, nor should a poll of MCA state from
> > a domU ever observe valid state (e.g., make the RDMSR return 0).
> > So all handling, logging and diagnosis as well as hardware response
> > actions (such as deploying an online spare chip-select) are controlled in
> > the hypervisor/dom0 combination. That seems a consistent model - e.g., if
> > a domU is migrated to another system it should not carry the diagnosis
> > state of the original system across etc., since that belongs with the one
> > domain that cannot migrate.
>
> I agree entirely with this.
>
> > But that is not to say that (I think at a future phase) domU should not
> > participate in a higher-level fault management function, at the direction
> > of the hypervisor/dom0 combo. For example, if/when we can isolate an
> > uncorrectable error to a single domU we could forward such an event to
> > the affected domU if it has registered its ability/interest in such
> > events. These won't be in the form of a faked #MC or anything; instead
> > they'd be some form of synchronous trap experienced when next the
> > affected domU context resumes on CPU. The intelligent domU handler can
> > then decide whether the domU must panic, whether it could simply kill the
> > affected process, etc. Those details are clearly sketchy, but the idea is
> > to up-level the communication to a domU to be more like "you're broken"
> > rather than "here's a machine-level hardware error for you to interpret
> > and decide what to do with".
>
> Yes, this makes much more sense than forwarding #MC, as the guest would
> have a hard time actually doing anything really useful with it. As far as
> I know, most uncorrectable errors are near enough entirely fatal in most
> commercial non-Enterprise OSs anyway - e.g. in Windows XP or Server 2K3,
> it always ends in a blue screen - which is hardly any better than the
> guest being "humanely euthanized" by Dom0.
>
> I take it this would be some sort of hypercall (available through the
> regular PV-driver interface for HVM guests) to say "Let me know if I'm
> broken - trap on vector X".

For short, guests with a PV MCA driver will see a certain event (assuming the
event mechanism will be used for the notification), and guests without a PV
MCA driver will see a "General Protection Fault". Is that right?

Christoph
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Christoph Egger
> Sent: 01 June 2007 09:12
> To: xen-devel@lists.xensource.com
> Cc: Gavin Maltby
> Subject: Re: [Xen-devel] RFC: MCA/MCE concept
>
> For short, guests with a PV MCA driver will see a certain event (assuming
> the event mechanism will be used for the notification), and guests without
> a PV MCA driver will see a "General Protection Fault". Is that right?

Not sure if a GP fault is the right thing for non-"MCA PV driver" domains. I
think "just killing" the domain is the right thing to do.

We can't guarantee that a GP fault is actually going to "kill" the guest.
Let's assume the code that ran on the guest was something along the lines of:

int some_function(...)
{
    ...

    __try {
        ...
        /* Some code that does quite a lot of "random" processing that
           may cause, for example, a GP fault */
        ...
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        ...
        /* handles the GP fault within the kernel code */
        ...
    }
}

Note that Windows kernel drivers are allowed to use the kernel (structured)
exception handling, and ARE allowed to "allow" GP faults if they wish to do so.
[Don't ask me why MS allows this, but that's the case, so we have to live with
it].

I'm not sure if Linux, Solaris, *BSD, OS/2 or other OSs will allow "catching"
a kernel GP fault in a non-precise fashion (I know Linux has exception handling
for EXACT positions in the code). But since at least one kernel DOES allow
this, we can't be sure that a GPF will destroy the guest.

Second point to note is of course that if the guest is in user mode when the
GPF happens, then almost all OSs will just kill the application - and there's
absolutely no reason to believe that the application running is necessarily
where the actual memory problem is - it may be caused by memory scrubbing, for
example.

Whatever we do to the guest, it should be a "certain death", unless the kernel
has told us "I can handle MCEs".

--
Mats
On Friday 01 June 2007 10:55:28 Petersson, Mats wrote:

[snip]

> > For short, guests with a PV MCA driver will see a certain event (assuming
> > the event mechanism will be used for the notification), and guests
> > without a PV MCA driver will see a "General Protection Fault".
> > Is that right?
>
> Not sure if a GP fault is the right thing for non-"MCA PV driver" domains.
> I think "just killing" the domain is the right thing to do.
>
> We can't guarantee that a GP fault is actually going to "kill" the guest.
> Let's assume the code that ran on the guest was something along the lines
> of:
>
> int some_function(...)
> {
>     ...
>
>     __try {
>         ...
>         /* Some code that does quite a lot of "random" processing that
>            may cause, for example, a GP fault */
>         ...
>     } __except (EXCEPTION_EXECUTE_HANDLER) {
>         ...
>         /* handles the GP fault within the kernel code */
>         ...
>     }
> }
>
> Note that Windows kernel drivers are allowed to use the kernel (structured)
> exception handling, and ARE allowed to "allow" GP faults if they wish to do
> so. [Don't ask me why MS allows this, but that's the case, so we have to
> live with it].

In that case, it will die sooner or later *after* consuming the data in error.
That means the guest continues to live for an unknown time...

> I'm not sure if Linux, Solaris, *BSD, OS/2 or other OSs will allow
> "catching" a kernel GP fault in a non-precise fashion (I know Linux has
> exception handling for EXACT positions in the code). But since at least one
> kernel DOES allow this, we can't be sure that a GPF will destroy the guest.

When Linux and *BSD see a GPF while they are in userspace, they kill the
process with a SIGSEGV. If they are in kernelspace, they panic.

> Second point to note is of course that if the guest is in user mode when
> the GPF happens, then almost all OSs will just kill the application - and
> there's absolutely no reason to believe that the application running is
> necessarily where the actual memory problem is - it may be caused by memory
> scrubbing, for example.
>
> Whatever we do to the guest, it should be a "certain death", unless the
> kernel has told us "I can handle MCEs".

It is obvious that there is no absolutely generic way to handle all sorts of
buggy guests. I vote for:

If the DomU has a PV MCA driver, use this; otherwise inject a GPF.
Multiplexing all the MSRs related to emulating MCA/MCE for the guests is much
more complex than just injecting a GPF - and slower.

Keir, what are your opinions on this thread?


Christoph
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Christoph Egger
> Sent: 01 June 2007 10:28
> To: xen-devel@lists.xensource.com
> Cc: Gavin Maltby; Keir Fraser
> Subject: Re: [Xen-devel] RFC: MCA/MCE concept
>
> On Friday 01 June 2007 10:55:28 Petersson, Mats wrote:
>
> [snip]
>
> > > For short, guests with a PV MCA driver will see a certain event
> > > (assuming the event mechanism will be used for the notification), and
> > > guests without a PV MCA driver will see a "General Protection Fault".
> > > Is that right?
> >
> > Not sure if a GP fault is the right thing for non-"MCA PV driver"
> > domains. I think "just killing" the domain is the right thing to do.
> >
> > We can't guarantee that a GP fault is actually going to "kill" the guest.
> > Let's assume the code that ran on the guest was something along the lines
> > of:
> >
> > int some_function(...)
> > {
> >     ...
> >
> >     __try {
> >         ...
> >         /* Some code that does quite a lot of "random" processing that
> >            may cause, for example, a GP fault */
> >         ...
> >     } __except (EXCEPTION_EXECUTE_HANDLER) {
> >         ...
> >         /* handles the GP fault within the kernel code */
> >         ...
> >     }
> > }
> >
> > Note that Windows kernel drivers are allowed to use the kernel
> > (structured) exception handling, and ARE allowed to "allow" GP faults if
> > they wish to do so. [Don't ask me why MS allows this, but that's the
> > case, so we have to live with it].
>
> In that case, it will die sooner or later *after* consuming the data in
> error. That means the guest continues to live for an unknown time...

Yes. What I'm worried about is that if you have a "transient" or "few-bit"
error in a rarely used area, the guest may well live a LONG time with
incorrect data and potentially not get it detected for quite some time (say
two bits have stuck to 0, and the data is then written back with the zeros
there - next time we read it, no error, since the data has zeros in that
location). Also consider the case where one cell (or small block of cells)
has gone bad, but it's only used by one single piece of code that is using
this __try/__except code? I know, this is probably relatively rare, but I'm
still worried that it will "break" things...

> > I'm not sure if Linux, Solaris, *BSD, OS/2 or other OSs will allow
> > "catching" a kernel GP fault in a non-precise fashion (I know Linux has
> > exception handling for EXACT positions in the code). But since at least
> > one kernel DOES allow this, we can't be sure that a GPF will destroy the
> > guest.
>
> When Linux and *BSD see a GPF while they are in userspace, they kill the
> process with a SIGSEGV. If they are in kernelspace, they panic.
>
> > Second point to note is of course that if the guest is in user mode when
> > the GPF happens, then almost all OSs will just kill the application - and
> > there's absolutely no reason to believe that the application running is
> > necessarily where the actual memory problem is - it may be caused by
> > memory scrubbing, for example.
> >
> > Whatever we do to the guest, it should be a "certain death", unless the
> > kernel has told us "I can handle MCEs".
>
> It is obvious that there is no absolutely generic way to handle all sorts
> of buggy guests. I vote for:
>
> If the DomU has a PV MCA driver, use this; otherwise inject a GPF.
> Multiplexing all the MSRs related to emulating MCA/MCE for the guests is
> much more complex than just injecting a GPF - and slower.

Emulating MCE to the guest wasn't my intended alternative suggestion. Instead,
my idea was that if the guest hasn't registered a "PV MCE handler", we just
immediately kill the domain as such - e.g. similar to
"domain_crash_synchronous()". Don't let the guest have any chance to "do
something wrong" in the process - it's already broken, and letting it run any
further will almost certainly not help matters. This may not be the prettiest
solution, but then on the other hand, a "Windows blue screen" or a Linux
"oops" saying a GP fault happened at some random place in the guest isn't
exactly helping the sysadmin understand the problem either.

--
Mats
Hi

On 06/01/07 10:48, Petersson, Mats wrote:
[cut]
>>> Note that Windows kernel drivers are allowed to use the kernel
>>> (structured) exception handling, and ARE allowed to "allow" GP faults if
>>> they wish to do so. [Don't ask me why MS allows this, but that's the
>>> case, so we have to live with it].
>> In that case, it will die sooner or later *after* consuming the data in
>> error. That means the guest continues to live for an unknown time...
>
> Yes. What I'm worried about is that if you have a "transient" or "few-bit"
> error in a rarely used area, the guest may well live a LONG time with
> incorrect data and potentially not get it detected for quite some time (say
> two bits have stuck to 0, and the data is then written back with the zeros
> there - next time we read it, no error, since the data has zeros in that
> location).

I don't believe GP faults and uncorrectable errors really overlap that much.
In a GP fault the extent of the damage is known - you tried to read from an
address not in your address space, you lacked permissions for an operation,
etc. In an uncorrected error situation it is difficult to understand the
bounds of the problem in that way - unless the hardware assists with data
poisoning etc., such errors may well be unconstrained and affect a wider area
than just the bracket of code that caught a GP fault.

You can often ring-fence critical code sequences by inserting error barrier
instructions before and after them. Those operations are usually very
expensive (they drain the pipeline or similar) and are suitable only in
special places.

When running natively it is usually the "owner" of affected data that sees it
bad in memory, e.g. from a read it made. In those cases we have the owner on
CPU and can kill/signal it synchronously. There are times when the kernel may
be shifting some data on behalf of the application owner (e.g.
copyin/copyout, shifting network data, etc.) in which case we still have a
handle on the real owner. If the access is from a scrub then we should not
panic - just wait and see if the owner does indeed use the bad data, at which
time we take appropriate action.

With the virtualisation layer there is the additional case of the HV or dom0
performing operations on behalf of a guest, i.e. the HV may make the access
that traps but its own state is not affected.

CPU errors get still trickier. For example, what do we do when we're told
that while running guest A we displaced modified data from the L2 cache that
had an uncorrectable ECC error? We have a physical address only, and no idea
who the data belongs to (guest A, a recently scheduled guest, or the HV?).
Where cachelines are tagged with some form of context or guest ID you have a
chance, provided that is reported in the error state.

> Also consider the case where one cell (or small block of cells) has gone
> bad, but it's only used by one single piece of code that is using this
> __try/__except code? I know, this is probably relatively rare, but I'm
> still worried that it will "break" things...

>>> I'm not sure if Linux, Solaris, *BSD, OS/2 or other OSs will allow
>>> "catching" a kernel GP fault in a non-precise fashion (I know Linux has
>>> exception handling for EXACT positions in the code). But since at least
>>> one kernel DOES allow this, we can't be sure that a GPF will destroy the
>>> guest.
>>
>> When Linux and *BSD see a GPF while they are in userspace, they kill the
>> process with a SIGSEGV. If they are in kernelspace, they panic.

Solaris has some wrappers that can be applied, maybe at some expense to
performance, to make protected accesses that will catch and survive various
types of error, including hardware errors, wild pointers, etc.

>>> Second point to note is of course that if the guest is in user mode when
>>> the GPF happens, then almost all OSs will just kill the application - and
>>> there's absolutely no reason to believe that the application running is
>>> necessarily where the actual memory problem is - it may be caused by
>>> memory scrubbing, for example.

Yes, these are the myriad permutations I was alluding to above.

>>> Whatever we do to the guest, it should be a "certain death", unless the
>>> kernel has told us "I can handle MCEs".

Yes, certain and instant death unless it is a PV guest that has registered
the ability to deal with these more elegantly.

>> It is obvious that there is no absolutely generic way to handle all sorts
>> of buggy guests. I vote for:
>>
>> If the DomU has a PV MCA driver, use this; otherwise inject a GPF.
>> Multiplexing all the MSRs related to emulating MCA/MCE for the guests is
>> much more complex than just injecting a GPF - and slower.

Do we need to send the non-PV guest a signal of any kind to kill it? After
all, we can stop it running any further instructions (and perhaps avoid the
use of bad data) by deciding within the HV or dom0 simply to abort that
guest. There is no loss to diagnosability since the HV/dom0 combination is
doing that anyway.

> Emulating MCE to the guest wasn't my intended alternative suggestion.
> Instead, my idea was that if the guest hasn't registered a "PV MCE
> handler", we just immediately kill the domain as such - e.g. similar to
> "domain_crash_synchronous()". Don't let the guest have any chance to "do
> something wrong" in the process - it's already broken, and letting it run
> any further will almost certainly not help matters. This may not be the
> prettiest solution, but then on the other hand, a "Windows blue screen" or
> a Linux "oops" saying a GP fault happened at some random place in the guest
> isn't exactly helping the sysadmin understand the problem either.

Agreed - don't let the affected guest run one more instruction if we can.
Sysadmins will learn to consult dom0 diagnostics to see if they explain any
sudden guest deaths - no need, as you say, to splurge any raw error data to
them.

Gavin

--
Gavin Maltby, Solaris Kernel Development.
> -----Original Message----- > From: Gavin.Maltby@Sun.COM [mailto:Gavin.Maltby@Sun.COM] > Sent: 01 June 2007 11:57 > To: Petersson, Mats > Cc: Egger, Christoph; xen-devel@lists.xensource.com; Keir Fraser > Subject: Re: [Xen-devel] RFC: MCA/MCE concept > > Hi > > On 06/01/07 10:48, Petersson, Mats wrote: > [cut] > > >>> Note that Windows kernel drivers are allowed to use the > >> kernel exception > >>> handling, and ARE allowed to "allow" GP faults if they wish > >> to do so. > >>> [Don''t ask me why MS allows this, but that''s the case, so > >> we have to live > >>> with it]. > >> In that case, it will die sooner or later *after* consuming > >> the data in error. > >> That means, the guest continues to live for an unknown time... > > > > Yes. What I''m worried about is that if you have a > "transient" or "few-bit" > > error in a rarely used, the guest may well live a LONG > time with incorrect > > data and potentially not get it detected for quite some > time again (say it''s > > two bits have stuck to 0, and the data is then written back > with the zero''s there > > - next time we read it, no error, since the data has > zero''s in that location. > > I don''t believe GP faults and uncorrectable errors really > overlap that much. > In a GP fault the extent of the damage is known - you tried > to read from > an address not in your address space, you lacked permissions > for an operation > etc. In an uncorrected error situation it is difficult to > understand the > bounds of the problem in that way - unless the hardware assists with > data poisoning etc such errors may well be unconstrained and affect > a wider area than just the bracket of code that caught a GP fault. > > You can often ring-fence critical code sequences by inserting error > barrier instructions before and after it. Those operations are > usually very expensive (drain the pipeline or similar) and are > suitable only in special places. > > When running natively it is usually the "owner" of affected data that > sees it bad in memory, eg from a read it made. In those cases we > have the owner on cpu and can kill/signal it synchronously. > There are times when the kernel may be shifting some data > on behalf of the application owner (eg, copyin/copyout, shift > network data etc) in which case we still have a handle on the > real owner. If the access is from a scrub then we should not > panic - just wait and see if the owner does indeed use the bad data > at which time we take appropriate action. > > With the virtualisation layer there is the additional case of > the HV or > dom0 performing operations on behalf of a guest, ie the HV > may make the > access that traps but it''s own state is not affected. > > CPU errors get still trickier. For example what do we do > when we''re told that > while running guest A we displaced modified data from l2cache that had > uncorrectable ECC? We have a physical address only, and no > idea of who the > data belongs to (guest A, a recently scheduled guest, or the > HV?). Where > cachelines are tagged with some form of context or guest ID you have > a chance, provided that is reported in the error state. > > > Also consider the case where one cell (or small block of > cells) has gone bad, > > but it''s only used by one single piece of code that is > using this try/catch code? > > I know, this is probably relatively rare, but I''m still > worried that it will "break" things... 
> > >>> I''m not sure if Linux, Solaris, *BSD, OS/2 or other OS''s > will allow > >>> "catching" a Kernel GP fault in a non-precise fashion (I > >> know Linux has > >>> exception handling for EXACT positions in the code). But > >> since at least one > >>> kernel DOES allow this, we can''t be sure that a GPF will > >> destroy the guest. > >> > >> When Linux and *BSD see a GPF while they are in userspace, > >> then they kill > >> the process with a SIGSEGV. If they are in kernelspace, then > >> they panic. > > Solaris has some wrappers that can be applied, maybe at some > expense to > performance, to make protected accesses that will catch and > survive various types of error including hardware errors, > wild pointers etc. > > >>> Second point to note is of course that if the guest is in > >> user-mode when > >>> the GPF happens, then almost all OS''s will just kill the > >> application - and > >>> there''s absolutely no reason to believe that the > >> application running is > >>> necessarily where the actual memory problem is - it may be > >> caused by memory > >>> scrubbing for example. > > Yes, these are the myriad permutations I was alluding to above. > > >>> Whatever we do to the guest, it should be a "certain > >> death", unless the > > Yes, certain and instant death unless it is a PV guest that > has registered > the ability to deal with these more elegantly. > > >>> kernel has told us "I can handle MCE''s". > >> It is obvious that there is no absolute generic way to handle > >> all sort of > >> buggy guests. I vote for: > >> > >> If DomU has a PV MCA driver use this or inject a GPF. > >> Multiplexing all the MSR''s related to emulate MCA/MCE for the > >> guests is much > >> more complex than just injecting a GPF - and slower. > > Do we need to send the non-PV guest a signal of any kind to kill it? > After all, we can stop it running any further instructions > (and perhaps > avoid the use of bad data) by deciding within the HV or dom0 simply > to abort that guest. There is no loss to diagnosability since the > HV/dom0 combination is doing that, anyway.No, HV can "kill" a guest without notifying the guest - in worst case, it may need to pause the physical CPU that may still be running on the guest (e.g. we have multiple CPU''s, one of them got "bad data", but other CPU''s are still processing stuff). But the pausing of the CPU is a "in hypervisor" function, so still no need to tell the guest anything - just "Bang, you''re dead" type thing.> > > Emulating MCE to the guest wasn''t my intended alternative > suggestion. Instead, > > my idea was that if the guest hasn''t registered a "PV MCE > handler", we just > > immediately kill the domain as such - e.g similar to > "domain_crash_synchronous()". > > Don''t let the guest have any chance to "do something > wrong" in the process - it''s > > already broken, and letting it run any further will almost > certainly not help matters. > > This may not be the prettiest solution, but then on the > other hand, a "Windows blue-screen" > > or Linux "oops" saying GP fault happened at some random > place in the guest isn''t exactly > > helping the SysAdmin understand the problem either. > > Agreed - don''t let the affected guest run one more > instruction if we can. Sysadmins > will learn to consult dom0 diagnostics to see if they explain > any sudden guest deaths - > no need, as you say, to splurge any raw error data to them.Exactly, particularly when it''s bogus raw error data, that isn''t actually caused by the original problem. 
--
Mats
Hi,

On 05/30/07 10:10, Christoph Egger wrote:
> On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote:
>>>>> "Christoph Egger" <Christoph.Egger@amd.com> 30.05.07 09:45 >>>
>>> On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote:
>>>>> case I) - Xen reveives a MCE from the CPU
>>>>>
>>>>> 1) Xen MCE handler figures out if error is an correctable error (CE)
>>>>>    or uncorrectable error (UE)
>>>>> 2a) error == CE:
>>>>>     Xen notifies Dom0 if Dom0 installed an MCA event handler
>>>>>     for statistical purpose

[rest cut]

For the hypervisor to dom0 communication that 2a) above refers to I think
we need to agree on two aspects: what form the notification event will
take, and what error telemetry data and additional information will
be provided by the hypervisor for dom0 to chew on for statistical
and diagnosis purposes.

For the first I've assumed so far that an event channel notification
of the MCA event will suffice; as long as the hypervisor only polls
for correctable MCA errors at a low-frequency rate (currently 15s interval)
there is no danger of spamming that single notification.

On receipt of the notification the event handler will need to suck
some event data out of somewhere - uncertain which somewhere would
be best?

We should standardize both the format and the content of this event
data. The following is just to get the conversation started in this
area.

Content first. Obviously we need the raw MCA register content -
MCi_STATUS, MCi_ADDR, MCi_MISC. We also need to know which
MCA detector bank made the observation, so we need to include
some indication of which chip (where I use "chip" to coincide
with "socket"), core on that chip, and MCA bank number
the telemetry came from. I think I am correct in saying that
hyperthreaded CPUs do not have any MCA banks per-thread, but we
may want to allow for that future possibility (I know, for instance,
that some SPARC cpus have error state for each hardware thread).

Such specification of error detector information clearly requires some
namespace specification. For example if the detector identifier could
naturally come out of Xen as a (chip, core, thread, bank) there needs to
be a clear understanding of how chips, cores etc are numbered in Xen and
how that matches with how the dom0 OS has numbered these things. If
instead the detector identifier were something like a (physical-cpu, bank)
using the Xen physical-cpu enumeration then dom0 may need a mechanism to
resolve this into chip etc info - you can't just work with physical cpus
since, for example, a chip-shared L3 cache spans multiple physical cpus.

We should also allow for additional model-specific error telemetry
that may be available and relevant - I know that will be necessary
for some upcoming x86 cpu models. We should probably avoid adding
"cooked" content to this error event payload - such cooking of the
raw data is much more easily performed in dom0 (the example I'm
thinking of here is physical address to memory location translation).

In terms of the form of the error event data, the simplest but also
the dumbest would be a binary structure passed from hypervisor
to dom0:

    struct mca_error_data_ver1 {
        uint8_t version;        /* structure version */
        uint64_t status;
        uint64_t addr;
        uint64_t misc;
        uint16_t chip;
        uint16_t core;
        uint16_t bank;
        ...
    };

That is easily passed around and can be extended by versioning.
A more self-describing and naturally extensible approach would be
to parcel the error data in some form of name-type-value list.
That's what we do in the corresponding kernel->userland error code in
Solaris; the downside is that the supporting libnvpair library is not
tiny and likely not the sort of footprint to include in a hypervisor.
Perhaps some cut-down form would do.

Thoughts?

Gavin
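To make the name-type-value alternative concrete, the following is a minimal
sketch of what a cut-down, fixed-footprint encoding might look like; the type
tags, field names and sizes are illustrative assumptions, not an existing Xen
or Solaris interface.

    #include <stdint.h>
    #include <string.h>

    /* A deliberately dumb name-type-value list: short fixed-size names,
     * a couple of scalar value types, and a flat fixed-size buffer so it
     * can cross the hypervisor/dom0 boundary without library support. */
    enum mca_nv_type { MCA_NV_UINT16 = 1, MCA_NV_UINT64 = 2 };

    struct mca_nv_pair {
        char     name[16];      /* e.g. "mc_status", "chipid" */
        uint8_t  type;          /* enum mca_nv_type */
        uint64_t value;         /* widest scalar supported */
    };

    struct mca_nv_list {
        uint16_t npairs;
        struct mca_nv_pair pair[8];     /* worst-case fixed size */
    };

    /* Append a pair; returns 0 on success, -1 if the list is full. */
    static int mca_nv_add(struct mca_nv_list *l, const char *name,
                          uint8_t type, uint64_t value)
    {
        struct mca_nv_pair *p;

        if (l->npairs >= sizeof(l->pair) / sizeof(l->pair[0]))
            return -1;
        p = &l->pair[l->npairs++];
        strncpy(p->name, name, sizeof(p->name) - 1);
        p->name[sizeof(p->name) - 1] = '\0';
        p->type = type;
        p->value = value;
        return 0;
    }

A dom0 consumer would walk pair[0..npairs-1] and match on name, which keeps
the payload self-describing and extensible without pulling anything like
libnvpair into the hypervisor.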
On Monday 04 June 2007 18:16:56 Gavin Maltby wrote:> Hi, > > On 05/30/07 10:10, Christoph Egger wrote: > > On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote: > >>>>> "Christoph Egger" <Christoph.Egger@amd.com> 30.05.07 09:45 >>> > >>> > >>> On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote: > >>>>> case I) - Xen reveives a MCE from the CPU > >>>>> > >>>>> 1) Xen MCE handler figures out if error is an correctable error (CE) > >>>>> or uncorrectable error (UE) > >>>>> 2a) error == CE: > >>>>> Xen notifies Dom0 if Dom0 installed an MCA event handler > >>>>> for statistical purpose > > [rest cut] > > For the hypervisor to dom0 communication that 2a) above refers to I think > we need to agree on two aspects: what form the notification event will > take, and what error telemetry data and additional information will > be provided by the hypervisor for dom0 to chew on for statistical > and diagnosis purposes.Additionally, the hypervisor must be able to notify domU that has a PV MCA driver.> For the first I''ve assumed so far that an event channel notification > of the MCA event will suffice; as long as the hypervisor only polls > for correctable MCA errors at a low-frequency rate (currently 15s interval) > there is no danger of spamming that single notification.Why polling?> On receipt of the notification the event handler will need to suck > some event data out of somewhere - uncertain which somewhere would > be best? > > We should standardize both the format and the content of this event > data. The following is just to get the conversation started in this > area. > > Content first. Obviously we need the raw MCA register content - > MCi_STATUS, MCi_ADDR, MCi_MISC. We also need know which > MCA detector bank made the observation, so we need to include > some indication of which chip (where I use "chip" to coincide > with "socket"), core on that chip, and MCA bank number > the telemetry came from. I think I am correct in saying that > hyperthreaded CPUs do not have any MCA banks per-thread, but we > may want to allow for that future possibility (I know, for instance, > that some SPARC cpus have error state for each hardware thread).And we need the domain and the domain''s vcpu to identify who is impacted.> We should also allow for additional model-specific error telemetry > that may be available and relevant - I know that will be necessary > for some upcoming x86 cpu models. We should probably avoid adding > "cooked" content to this error event payload - such cooking of the > raw data is much more easily performed in dom0 (the example I''m > thinking of here is physical address to memory location translation). > > In terms of the form of the error event data, the simplest but also > the dumbest would be a binary structure passed from hypervisor > to dom0: >struct mca_error_data_ver1 { uint8_t version; /* structure version */ uint64_t mc_status; uint64_t mc_addr; uint64_t mc_misc; uint16_t mc_chip; uint16_t mc_core; uint16_t mc_bank; uint16_t domid; uint16_t vcpu_id; ... };> That is easily passed around and can be extended by versioning. > A more self-describing and naturally extensible approach would be > to parcel the error data in some form of name-type-value list. > That''s what we do in the corresponding kernel->userland error > code in Solaris; the downside is that the supporting libnvpair > library is not tiny and likely not the sort of footprint to > include in a hypervisor. 
> Perhaps some cut-down form would do.

In the public xen.h header a VIRQ_DOM_EXC is defined, which seems
to be appropriate for an NMI event. There are two functions to send
VIRQs: send_guest_vcpu_virq() and send_guest_global_virq().

However, VIRQ_DOM_EXC is not properly implemented: all virtual
interrupts are maskable. We definitely need an event that is guaranteed
to interrupt the guest immediately, no matter whether this is Dom0 or
DomU and whatever they are doing.

And VIRQ_DOM_EXC is explicitly reserved for Dom0. Maybe we should
introduce a VIRQ_MCA as a special NMI event for both Dom0 and DomU?

Christoph
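For illustration only, the VIRQ_MCA idea might start out looking like the
sketch below; the VIRQ number is picked arbitrarily and the
send_guest_global_virq() shape is assumed from its existing global-VIRQ
users, so treat both as assumptions rather than a worked-out patch.

    /* Hypothetical addition to the public interface - name taken from the
     * proposal above, number chosen arbitrarily for this sketch. */
    #define VIRQ_MCA    8   /* machine-check telemetry pending */

    /* Hypervisor side: after queueing telemetry for dom0, kick it.
     * Assumes send_guest_global_virq(struct domain *, int virq) as used
     * for other global VIRQs; the NMI-like delivery semantics discussed
     * above would still have to be added separately. */
    static void mca_notify_dom0(void)
    {
        send_guest_global_virq(dom0, VIRQ_MCA);
    }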
Hi, On 06/06/07 10:28, Christoph Egger wrote:> On Monday 04 June 2007 18:16:56 Gavin Maltby wrote: >> Hi, >> >> On 05/30/07 10:10, Christoph Egger wrote: >>> On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote: >>>>>>> "Christoph Egger" <Christoph.Egger@amd.com> 30.05.07 09:45 >>> >>>>> On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote: >>>>>>> case I) - Xen reveives a MCE from the CPU >>>>>>> >>>>>>> 1) Xen MCE handler figures out if error is an correctable error (CE) >>>>>>> or uncorrectable error (UE) >>>>>>> 2a) error == CE: >>>>>>> Xen notifies Dom0 if Dom0 installed an MCA event handler >>>>>>> for statistical purpose >> [rest cut] >> >> For the hypervisor to dom0 communication that 2a) above refers to I think >> we need to agree on two aspects: what form the notification event will >> take, and what error telemetry data and additional information will >> be provided by the hypervisor for dom0 to chew on for statistical >> and diagnosis purposes. > > Additionally, the hypervisor must be able to notify domU that has > a PV MCA driver.Yes, forgot that; although I guess I view that most likely as a future phase.>> For the first I''ve assumed so far that an event channel notification >> of the MCA event will suffice; as long as the hypervisor only polls >> for correctable MCA errors at a low-frequency rate (currently 15s interval) >> there is no danger of spamming that single notification. > > Why polling?Polling for correctable errors, but #MC as usual for others. Setting MCi_CTL bits for correctable errors does not produce a machine check, so polling is the only approach unless one sets additional (and undocumented, certainly for AMD chips) config bits. What I was getting at here is that polling at largish intervals for correctables is the correct approach - trapping for them or polling at a high-frequency is bad because in cases where you have some form of solid correctable error (say a single bad pin in a dimm socket affecting one or two ranks of that dimm but never able to produce a UE) the trap handling and diagnosis software consume the machine and things make little useful forward progress.>> On receipt of the notification the event handler will need to suck >> some event data out of somewhere - uncertain which somewhere would >> be best? >> >> We should standardize both the format and the content of this event >> data. The following is just to get the conversation started in this >> area. >> >> Content first. Obviously we need the raw MCA register content - >> MCi_STATUS, MCi_ADDR, MCi_MISC. We also need know which >> MCA detector bank made the observation, so we need to include >> some indication of which chip (where I use "chip" to coincide >> with "socket"), core on that chip, and MCA bank number >> the telemetry came from. I think I am correct in saying that >> hyperthreaded CPUs do not have any MCA banks per-thread, but we >> may want to allow for that future possibility (I know, for instance, >> that some SPARC cpus have error state for each hardware thread). > > And we need the domain and the domain''s vcpu to identify > who is impacted.Yes, the domain ID. I''m not sure we need the vcpu id if we instead present some physical identifiers such as chip, core number etc (and have the namespaces well-defined). If we don''t present those the vcpu in the payload and some external method to resolve that to physical components. Since errors correlate to physical components it would, I think, be nicer to report detector info in some physical sense. 
As regards a vcpu to physical translation, I didn't think there was any
fixed mapping (or certainly any mapping that a dom0 should interpret
and rely on). For example if we have two physical cores but choose
to present 32 vcpus to a domain I don't believe there is anything to
say that vcpus 0-15 always run on physical core 0?

>> We should also allow for additional model-specific error telemetry
>> that may be available and relevant - I know that will be necessary
>> for some upcoming x86 cpu models. We should probably avoid adding
>> "cooked" content to this error event payload - such cooking of the
>> raw data is much more easily performed in dom0 (the example I'm
>> thinking of here is physical address to memory location translation).
>>
>> In terms of the form of the error event data, the simplest but also
>> the dumbest would be a binary structure passed from hypervisor
>> to dom0:
>
> struct mca_error_data_ver1 {
>     uint8_t version;        /* structure version */
>     uint64_t mc_status;
>     uint64_t mc_addr;
>     uint64_t mc_misc;
>     uint16_t mc_chip;
>     uint16_t mc_core;
>     uint16_t mc_bank;
>     uint16_t domid;
>     uint16_t vcpu_id;
>     ...
> };
>
>> That is easily passed around and can be extended by versioning.
>> A more self-describing and naturally extensible approach would be
>> to parcel the error data in some form of name-type-value list.
>> That's what we do in the corresponding kernel->userland error
>> code in Solaris; the downside is that the supporting libnvpair
>> library is not tiny and likely not the sort of footprint to
>> include in a hypervisor. Perhaps some cut-down form would do.
>
> In the public xen.h header is a VIRQ_DOM_EXC defined, which seems
> to be appropriate for an NMI event.
> There are two functions to send VIRQs: send_guest_vcpu_virq() and
> send_guest_global_virq().
>
> However, VIRQ_DOM_EXC is not properly implemented:
> All virtual interrupts are maskable. We definitely need
> an event that guarantees to immediately interrupt the guest, no matter
> if this is Dom0 or DomU and whatever they are doing.
>
> And VIRQ_DOM_EXC is explicitly reserved for Dom0. Maybe
> we should introduce a VIRQ_MCA as a special NMI event for both Dom0 and DomU?

Sounds like it may be necessary. I don't know this mechanism very well
so I'll go and do some reading (after a big long unrelated code review).

Cheers

Gavin
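On the polling side, the low-frequency correctable-error scan described above
would reduce to roughly the following, assuming the architectural
MCi_STATUS/MCi_ADDR/MCi_MISC MSR layout; rdmsr(), wrmsr() and
queue_ce_telemetry() stand in for whatever the hypervisor actually provides,
so this is only a sketch of the idea.

    #include <stdint.h>

    #define MSR_MC_STATUS(i)   (0x401 + 4*(i))
    #define MSR_MC_ADDR(i)     (0x402 + 4*(i))
    #define MSR_MC_MISC(i)     (0x403 + 4*(i))
    #define MC_STATUS_VAL      (1ULL << 63)   /* bank holds a valid error */
    #define MC_STATUS_UC       (1ULL << 61)   /* error was uncorrected    */

    /* Called from a slow periodic timer (e.g. every 15s).  rdmsr()/wrmsr()
     * and queue_ce_telemetry() are assumed helpers, not real Xen symbols. */
    static void mce_poll_banks(unsigned int nbanks)
    {
        unsigned int i;

        for (i = 0; i < nbanks; i++) {
            uint64_t status = rdmsr(MSR_MC_STATUS(i));

            if (!(status & MC_STATUS_VAL) || (status & MC_STATUS_UC))
                continue;               /* nothing logged, or not a CE */

            /* ADDR/MISC validity bits ignored here for brevity. */
            queue_ce_telemetry(i, status,
                               rdmsr(MSR_MC_ADDR(i)),
                               rdmsr(MSR_MC_MISC(i)));
            wrmsr(MSR_MC_STATUS(i), 0); /* rearm the bank */
        }
    }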
On Wednesday 06 June 2007 12:35:15 Gavin Maltby wrote:> Hi, > > On 06/06/07 10:28, Christoph Egger wrote: > > On Monday 04 June 2007 18:16:56 Gavin Maltby wrote: > >> Hi, > >> > >> On 05/30/07 10:10, Christoph Egger wrote: > >>> On Wednesday 30 May 2007 10:49:40 Jan Beulich wrote: > >>>>>>> "Christoph Egger" <Christoph.Egger@amd.com> 30.05.07 09:45 >>> > >>>>> > >>>>> On Wednesday 30 May 2007 09:19:12 Jan Beulich wrote: > >>>>>>> case I) - Xen reveives a MCE from the CPU > >>>>>>> > >>>>>>> 1) Xen MCE handler figures out if error is an correctable error > >>>>>>> (CE) or uncorrectable error (UE) > >>>>>>> 2a) error == CE: > >>>>>>> Xen notifies Dom0 if Dom0 installed an MCA event handler > >>>>>>> for statistical purpose > >> > >> [rest cut] > >> > >> For the hypervisor to dom0 communication that 2a) above refers to I > >> think we need to agree on two aspects: what form the notification event > >> will take, and what error telemetry data and additional information will > >> be provided by the hypervisor for dom0 to chew on for statistical and > >> diagnosis purposes. > > > > Additionally, the hypervisor must be able to notify domU that has > > a PV MCA driver. > > Yes, forgot that; although I guess I view that most likely as a future > phase.Yes, but ignoring this can lead to a design that is bad for DomU and requires a re-design in the worst case.> >> For the first I''ve assumed so far that an event channel notification > >> of the MCA event will suffice; as long as the hypervisor only polls > >> for correctable MCA errors at a low-frequency rate (currently 15s > >> interval) there is no danger of spamming that single notification. > > > > Why polling? > > Polling for correctable errors, but #MC as usual for others. Setting > MCi_CTL bits for correctable errors does not produce a machine check, > so polling is the only approach unless one sets additional (and > undocumented, certainly for AMD chips) config bits. What I was getting > at here is that polling at largish intervals for correctables is > the correct approach - trapping for them or polling at a high-frequency > is bad because in cases where you have some form of solid correctable > error (say a single bad pin in a dimm socket affecting one or two ranks > of that dimm but never able to produce a UE) the trap handling and > diagnosis software consume the machine and things make little useful > forward progress.I still don''t see, why #MC for all kind of errors is bad.> >> On receipt of the notification the event handler will need to suck > >> some event data out of somewhere - uncertain which somewhere would > >> be best? > >> > >> We should standardize both the format and the content of this event > >> data. The following is just to get the conversation started in this > >> area. > >> > >> Content first. Obviously we need the raw MCA register content - > >> MCi_STATUS, MCi_ADDR, MCi_MISC. We also need know which > >> MCA detector bank made the observation, so we need to include > >> some indication of which chip (where I use "chip" to coincide > >> with "socket"), core on that chip, and MCA bank number > >> the telemetry came from. I think I am correct in saying that > >> hyperthreaded CPUs do not have any MCA banks per-thread, but we > >> may want to allow for that future possibility (I know, for instance, > >> that some SPARC cpus have error state for each hardware thread). > > > > And we need the domain and the domain''s vcpu to identify > > who is impacted. > > Yes, the domain ID. 
I''m not sure we need the vcpu id if we instead > present some physical identifiers such as chip, core number etc > (and have the namespaces well-defined). If we don''t present those > the vcpu in the payload and some external method to resolve that to > physical components. Since errors correlate to physical components it > would, I think, be nicer to report detector info in some physical sense.The vcpu is more interesting for the domU than for dom0. See below.> As regards a vcpu to physical translation, I didn''t think there was any > fixed mapping (or certainly any mapping that a dom0 should interpret > and rely on). For example if we have two physical cores but choose > to present 32 vcpus to domain I don''t believe there is anything to > say that 0-15 map always run on physical core 0? > > >> We should also allow for additional model-specific error telemetry > >> that may be available and relevant - I know that will be necessary > >> for some upcoming x86 cpu models. We should probably avoid adding > >> "cooked" content to this error event payload - such cooking of the > >> raw data is much more easily performed in dom0 (the example I''m > >> thinking of here is physical address to memory location translation). > >> > >> In terms of the form of the error event data, the simplest but also > >> the dumbest would be a binary structure passed from hypervisor > >> to dom0: > > > > struct mca_error_data_ver1 { > > uint8_t version; /* structure version */ > > uint64_t mc_status; > > uint64_t mc_addr; > > uint64_t mc_misc; > > uint16_t mc_chip; > > uint16_t mc_core; > > uint16_t mc_bank; > > uint16_t domid; > > uint16_t vcpu_id; > > ... > > }; > > > >> That is easily passed around and can be extended by versioning. > >> A more self-describing and naturally extensible approach would be > >> to parcel the error data in some form of name-type-value list. > >> That''s what we do in the corresponding kernel->userland error > >> code in Solaris; the downside is that the supporting libnvpair > >> library is not tiny and likely not the sort of footprint to > >> include in a hypervisor. Perhaps some cut-down form would do. > > > > In the public xen.h header is a VIRQ_DOM_EXC defined, which seems > > to be appropriate for an NMI event. > > There are two functions to send VIRQs: send_guest_vcpu_virq() and > > send_guest_global_virq(). > > > > However, VIRQ_DOM_EXC is not properly implemented: > > All virtual interrupts are maskable. We definitely need > > an event that guarantees to immediately interrupts the guest, no matter > > if this is Dom0 or DomU and whatever they are doing. > > > > And VIRQ_DOM_EXC is explicitely reserved for Dom0. Maybe > > we should introduce a VIRQ_MCA as a special NMI event for both Dom0 and > > DomU? > > Sounds like it may be necessary. I don''t know this mechanism very well > so I''ll go and do some reading (after a big long unrelated codereview).After some code reading I found a nmi_pending, nmi_masked and nmi_addr in struct vcpu in xen/include/xen/sched.h. xen/include/xen/nmi.h is also of interest. The implementation is in xen/common/kernel.c. There is only one callback per vcpu allowed and only Dom0 can register an NMI. So the guests NMI handler must multiplex several nmi handlers - at least for Dom0 (MCA + watchdog timer). It''s fine with me to allow DomUs to only register the MCA NMI. To inform domU (having a PV MCA driver), they must be able to register an NMI callback as well. To allow this, struct vcpu_info in the PUBLIC xen.h also needs nmi_pending and nmi_addr. 
Keir: How do you feel about all this? Is this the right way or do you
see things that should be done in a different way?

Christoph
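To make the NMI multiplexing point above concrete, a guest-side dispatch
could look roughly like this; nmi_reason_is_mca() and the two sub-handlers
are invented placeholders, since the thread only establishes that there is a
single NMI callback per vcpu to share between MCA and the watchdog.

    /* Guest-side multiplexing of the single registered NMI callback. */
    typedef void (*nmi_subhandler_t)(void);

    static nmi_subhandler_t mca_handler;        /* PV MCA driver, if any */
    static nmi_subhandler_t watchdog_handler;   /* Dom0 only */

    static void guest_nmi_dispatch(void)
    {
        /* nmi_reason_is_mca() is a placeholder for however the guest
         * decides which sub-handler the NMI is for. */
        if (nmi_reason_is_mca() && mca_handler != NULL)
            mca_handler();
        else if (watchdog_handler != NULL)
            watchdog_handler();
    }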
Hi,

On 06/06/07 12:57, Christoph Egger wrote:
>>>> For the first I've assumed so far that an event channel notification
>>>> of the MCA event will suffice; as long as the hypervisor only polls
>>>> for correctable MCA errors at a low-frequency rate (currently 15s
>>>> interval) there is no danger of spamming that single notification.
>>> Why polling?
>> Polling for correctable errors, but #MC as usual for others. Setting
>> MCi_CTL bits for correctable errors does not produce a machine check,
>> so polling is the only approach unless one sets additional (and
>> undocumented, certainly for AMD chips) config bits. What I was getting
>> at here is that polling at largish intervals for correctables is
>> the correct approach - trapping for them or polling at a high-frequency
>> is bad because in cases where you have some form of solid correctable
>> error (say a single bad pin in a dimm socket affecting one or two ranks
>> of that dimm but never able to produce a UE) the trap handling and
>> diagnosis software consume the machine and things make little useful
>> forward progress.
>
> I still don't see, why #MC for all kind of errors is bad.

I'm talking about whether the hypervisor takes a machine check
for an error or polls for it. We do not want #MC for correctable
errors stopping the hypervisor from making progress. And if the
hypervisor poll interval was too small a solid error would again
keep the hypervisor busy producing (mostly/all duplicate)
error telemetry and the diagnosis code in dom0 would burn
cpu cycles, too.

How errors observed by the hypervisor, be they from #MC or from
a poll, are propagated to the domains is unimportant from this
point of view - e.g., if we decide to take error telemetry
discovered via a poll in the hypervisor and propagate it
to the domain pretending it is indistinguishable from a machine
check that will not hurt or limit the domain processing.

An untested design I had in mind, unashamedly influenced by what
we do in Solaris, was to have some common memory shared between
hypervisor and domain into which the hypervisor produces
error telemetry and the domain consumes that telemetry.
Producing and consuming is lockless using compare-and-swap.
There are two queues in this shared memory - one for uncorrectable
error telemetry and one for correctable error telemetry. When the
domain gets whatever event to notify it of telemetry for processing
it processes the queues; the event would be synchronous for
uncorrectable errors (ie, domain must process the telemetry
right now) or asynchronous in the case of correctable errors
(process when convenient). The separation of CE and UE queues
stops CEs from flooding the more important UE events (you can
always drop CEs if there is no more space, but you can never
drop UEs).

[cut]

> After some code reading I found a nmi_pending, nmi_masked and nmi_addr in

[cut]

Still chewing on that ...

Cheers

Gavin
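A rough sketch of one such shared-memory queue, under the assumptions that
the hypervisor is the only producer class (possibly from several CPUs, hence
the compare-and-swap) and the owning domain is the single consumer; the entry
layout and the GCC __sync builtins are placeholders for whatever primitives
Xen would really use. Two instances - one for CEs, one for UEs - would give
the split described above.

    #include <stdint.h>

    #define MCA_QLEN 32     /* entries per queue; illustrative */

    struct mca_entry {
        volatile uint32_t valid;        /* set last by the producer */
        uint16_t bank, domid;
        uint64_t mc_status, mc_addr, mc_misc;
    };

    struct mca_queue {
        volatile uint32_t prod;         /* next slot to reserve (Xen)    */
        volatile uint32_t cons;         /* next slot to consume (domain) */
        struct mca_entry  ring[MCA_QLEN];
    };

    /* Hypervisor side: reserve a slot with compare-and-swap so several
     * CPUs can log concurrently.  Returns -1 if the queue is full, which
     * is acceptable for CEs but must never be allowed to happen for UEs. */
    static int mca_queue_put(struct mca_queue *q, const struct mca_entry *e)
    {
        struct mca_entry *dst;
        uint32_t slot;

        do {
            slot = q->prod;
            if (slot - q->cons >= MCA_QLEN)
                return -1;
        } while (!__sync_bool_compare_and_swap(&q->prod, slot, slot + 1));

        dst = &q->ring[slot % MCA_QLEN];
        dst->bank = e->bank;
        dst->domid = e->domid;
        dst->mc_status = e->mc_status;
        dst->mc_addr = e->mc_addr;
        dst->mc_misc = e->mc_misc;
        __sync_synchronize();           /* publish data before valid flag */
        dst->valid = 1;
        return 0;
    }

    /* Domain side: single consumer, so no CAS needed.  Returns 0 if an
     * entry was copied out, -1 if the queue is empty or the head entry
     * has been reserved but not yet published. */
    static int mca_queue_get(struct mca_queue *q, struct mca_entry *out)
    {
        struct mca_entry *src = &q->ring[q->cons % MCA_QLEN];

        if (q->cons == q->prod || !src->valid)
            return -1;
        *out = *src;
        src->valid = 0;
        __sync_synchronize();           /* release slot before advancing */
        q->cons++;
        return 0;
    }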
On Wednesday 06 June 2007 14:25:26 Gavin Maltby wrote:> Hi, > > On 06/06/07 12:57, Christoph Egger wrote: > >>>> For the first I''ve assumed so far that an event channel notification > >>>> of the MCA event will suffice; as long as the hypervisor only polls > >>>> for correctable MCA errors at a low-frequency rate (currently 15s > >>>> interval) there is no danger of spamming that single notification. > >>> > >>> Why polling? > >> > >> Polling for correctable errors, but #MC as usual for others. Setting > >> MCi_CTL bits for correctable errors does not produce a machine check, > >> so polling is the only approach unless one sets additional (and > >> undocumented, certainly for AMD chips) config bits. What I was getting > >> at here is that polling at largish intervals for correctables is > >> the correct approach - trapping for them or polling at a high-frequency > >> is bad because in cases where you have some form of solid correctable > >> error (say a single bad pin in a dimm socket affecting one or two ranks > >> of that dimm but never able to produce a UE) the trap handling and > >> diagnosis software consume the machine and things make little useful > >> forward progress. > > > > I still don''t see, why #MC for all kind of errors is bad. > > I''m talking about whether the hypervisor takes a machine check > for an error or polls for it. We do not want #MC for correctable > errors stopping the hypervisor from making progress. And if the > hypervisor poll interval was to small a solid error would again > keep the hypervisor busy producing (mostly/all duplicate) > error telemetry and the diagnosis code in dom0 would burn > cpu cycles, too. > > How errors observed by the hypervisor, be they from #MC or from > a poll, are propogated to the domains is unimportant from this > point of view - e.g., if we decide to take error telemetry > discovered via a poll in the hypervisor and propogate it > to the domain pretending it is undistinguishable from a machine > check that will not hurt or limit the domain processing. > > An untested design I had in mind, unashamedly influenced by what > we do in Solaris, was to have some common memory shared between > hypervisor and domain into which the hypervisor produces > error telemetry and the domain consumes that telemetry.That is the struct vcpu_info in the PUBLIC xen.h. It is accessable in the hypervisor as well as in the guest.> Producing and consuming is lockless using compare-and-swap. > There are two queues in this shared memory - one for uncorrectable > error telemetry and one for correctable error telemetry. When the > domain gets whatever event to notify it of telemetry for processing > it processes the queues; the event would be synchronous for > uncorrectable errors (ie, domain must process the telemetry > right now) or asynchronous in the case of correctable errors > (process when convenient). The separation of CE and UE queues > stops CEs from flooding the more important UE events (you can > always drop CEs if there is no more space, but you can never > drop UEs).So we use the asynchronous event mechanism VIRQ_DOM_EXC to report correctable errors to the Dom0 and the nmi stuff for uncorrectable errors to Dom0 and DomU, right? 
The fact that VIRQ_DOM_EXC is for Dom0 only doesn't hurt here, since we
never report CEs to DomUs.

> [cut]
>
> > After some code reading I found a nmi_pending, nmi_masked and nmi_addr in
>
> [cut]
>
> Still chewing on that ...

Christoph
On 06/06/07 10:28, Christoph Egger wrote:
> On Monday 04 June 2007 18:16:56 Gavin Maltby wrote:
>> In terms of the form of the error event data, the simplest but also
>> the dumbest would be a binary structure passed from hypervisor
>> to dom0:
>
> struct mca_error_data_ver1 {
>     uint8_t version;        /* structure version */
>     uint64_t mc_status;
>     uint64_t mc_addr;
>     uint64_t mc_misc;
>     uint16_t mc_chip;
>     uint16_t mc_core;
>     uint16_t mc_bank;
>     uint16_t domid;
>     uint16_t vcpu_id;
>     ...
> };

Since there are multiple MCA detector banks, and more than
one may have logged a valid error, we need to think about
communicating all the bank error telemetry. This should
also allow for there being varying numbers of MCA banks in
different processor types. So something like

    struct {
        uint8_t version;
        uint8_t nbanks;
        uint16_t flags;
        uint16_t domid;
        uint16_t vcpu_id;       /* if meaningful? */
        uint8_t chipid;
        uint8_t coreid;
        uint64_t mcg_status;
        struct {
            uint64_t mc_status;
            uint64_t mc_addr;
            uint64_t mc_misc;
        } bank[1];
    };

The bank array is actually sized as per nbanks.

I've added mcg_status and flags. The latter I'd like to use
for indicators such as "this error data was artificially injected"
etc.

Gavin
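Since bank[1] is "actually sized as per nbanks", the usual C idiom for sizing
and allocating such a record would be roughly the following; the struct just
mirrors the proposal above and the helper names are made up for the example.

    #include <stdint.h>
    #include <stdlib.h>

    struct mc_bank {
        uint64_t mc_status;
        uint64_t mc_addr;
        uint64_t mc_misc;
    };

    struct mc_event {
        uint8_t  version;
        uint8_t  nbanks;
        uint16_t flags;
        uint16_t domid;
        uint16_t vcpu_id;
        uint8_t  chipid;
        uint8_t  coreid;
        uint64_t mcg_status;
        struct mc_bank bank[1];     /* really nbanks entries */
    };

    /* Bytes needed for an event covering nbanks banks (nbanks >= 1).
     * sizeof(struct mc_event) already includes one bank plus padding,
     * so this errs on the larger, safe side. */
    static size_t mc_event_size(unsigned int nbanks)
    {
        return sizeof(struct mc_event) +
               (nbanks - 1) * sizeof(struct mc_bank);
    }

    static struct mc_event *mc_event_alloc(unsigned int nbanks)
    {
        struct mc_event *ev = calloc(1, mc_event_size(nbanks));

        if (ev != NULL) {
            ev->version = 1;
            ev->nbanks = (uint8_t)nbanks;
        }
        return ev;
    }

The hypervisor would of course place such a record into shared or copied
memory rather than malloc it, but the sizing arithmetic is the same.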
On Thursday 14 June 2007 13:59:12 Gavin Maltby wrote:
> On 06/06/07 10:28, Christoph Egger wrote:
>> On Monday 04 June 2007 18:16:56 Gavin Maltby wrote:
>>> In terms of the form of the error event data, the simplest but also
>>> the dumbest would be a binary structure passed from hypervisor
>>> to dom0:
>>
>> struct mca_error_data_ver1 {
>>     uint8_t version;        /* structure version */
>>     uint64_t mc_status;
>>     uint64_t mc_addr;
>>     uint64_t mc_misc;
>>     uint16_t mc_chip;
>>     uint16_t mc_core;
>>     uint16_t mc_bank;
>>     uint16_t domid;
>>     uint16_t vcpu_id;
>>     ...
>> };
>
> Since there are multiple MCA detector banks, and more than
> one may have logged a valid error, we need to think about
> communicating all the bank error telemetry. This should
> also allow for there being varying numbers of MCA banks in
> different processor types. So something like
>
> struct {
>     uint8_t version;
>     uint8_t nbanks;
>     uint16_t flags;
>     uint16_t domid;
>     uint16_t vcpu_id;       /* if meaningful? */
>     uint8_t chipid;
>     uint8_t coreid;
>     uint64_t mcg_status;
>     struct {
>         mc_status;
>         mc_addr;
>         mc_misc;
>     } bank[1];
> };
>
> The bank array is actually sized as per nbanks.
>
> I've added mcg_status and flags. The latter I'd like to use
> for indicators such as "this error data was artificially injected"
> etc.

Here is my proposal for a truly extensible event structure:

    #define MCA_TYPE_COMMON     0
    #define MCA_TYPE_BANK       1
    #define MCA_TYPE_ALLBANKS   2
    ...

    #define MCA_COMMON \
        size_t size;            /* size of this struct in bytes */ \
        uint32_t type;          /* structure type */ \
        uint16_t domid; \
        uint8_t chipid; \
        uint8_t coreid; \
        uint64_t mcg_status     /* global status */

The base structure:

    struct mca_event {
        MCA_COMMON;
    };

The specific structs:

    struct mca_event_bank {
        MCA_COMMON;

        uint16_t vcpu_id;
        uint16_t mc_bank;
        uint64_t mc_status;
        uint64_t mc_addr;
        uint64_t mc_misc;
        uint32_t flags;
    };

    struct mca_event_allbanks {
        MCA_COMMON;

        uint16_t vcpu_id;
        uint8_t nbanks;
        uint32_t flags;
        struct {
            uint64_t mc_status;
            uint64_t mc_addr;
            uint64_t mc_misc;
        } bank[1];
    };

And you can have many more structs to support future features.

In the code you allocate the size of the struct you want to use:

    struct mca_event *mca = malloc(sizeof(struct mca_event_bank));
    mca->size = sizeof(struct mca_event_bank);
    mca->type = MCA_TYPE_BANK;

In this example you can cast from mca_event to mca_event_bank and back
whenever you like. The generic code only needs to know struct mca_event.

Christoph
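On the consuming side such a scheme presumably ends up as a dispatch on the
common type field before casting; a small sketch, with the handler names
invented purely for the example:

    /* Dispatch on the common header before casting to the specific
     * layout; handle_bank_event()/handle_allbanks_event() are
     * placeholders, not proposed interfaces. */
    static void mca_dispatch(struct mca_event *mca)
    {
        switch (mca->type) {
        case MCA_TYPE_BANK:
            handle_bank_event((struct mca_event_bank *)mca);
            break;
        case MCA_TYPE_ALLBANKS:
            handle_allbanks_event((struct mca_event_allbanks *)mca);
            break;
        default:
            /* Unknown record: a consumer can still step over it by
             * advancing 'size' bytes, which is what makes the scheme
             * forward-compatible. */
            break;
        }
    }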
> -----Original Message----- > From: xen-devel-bounces@lists.xensource.com > [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of > Christoph Egger > Sent: 21 June 2007 10:29 > To: xen-devel@lists.xensource.com > Cc: Gavin Maltby; Jan Beulich > Subject: Re: [Xen-devel] RFC: MCA/MCE concept > > On Thursday 14 June 2007 13:59:12 Gavin Maltby wrote: > > On 06/06/07 10:28, Christoph Egger wrote: > > > On Monday 04 June 2007 18:16:56 Gavin Maltby wrote: > > >> In terms of the form of the error event data, the > simplest but also > > >> the dumbest would be a binary structure passed from hypervisor > > >> to dom0: > > > > > > struct mca_error_data_ver1 { > > > uint8_t version; /* structure version */ > > > uint64_t mc_status; > > > uint64_t mc_addr; > > > uint64_t mc_misc; > > > uint16_t mc_chip; > > > uint16_t mc_core; > > > uint16_t mc_bank; > > > uint16_t domid; > > > uint16_t vcpu_id; > > > ... > > > }; > > > > Since there are multiple MCA detector banks, and more than > > one may have logged a valid error, we need to think about > > communicating all the bank error telemetry. This should > > also allow for there being varying numbers of MCA banks in > > different proccessor types. So something like > > > > struct { > > uint8_t version; > > uint8_t nbanks; > > uint16_t flags; > > uint16_t domid; > > uint16_t vcpud_id; /* if meaningful? */ > > uint8_t chipid; > > uint8_t coreid; > > uint64_t mcg_status; > > struct { > > mc_status; > > mc_addr; > > mc_misc; > > } bank[1]; > > }; > > > > The bank array is actually sized as per nbanks. > > > > I''ve added mcg_status and flags. The latter I''d like to use > > for indicators such as "this error data was artificially injected" > > etc. > > Here is my proposal for a real exensible event structure: > > #define MCA_TYPE_COMMON 0 > #define MCA_TYPE_BANK 1 > #define MCA_TYPE_ALLBANKS 2 > ... > > #define MCA_COMMON \ > size_t size; /* size of this struct in bytes */ > uint32_t type; /* structure type */ > uint16_t domid; > uint8_t chipid; > uint8_t coreid; > uint64_t mcg_status; /* global status */At this point, I''d love it if gcc supported unnamed structs! [Which it does only with -fms-extensions]. That would make this sort of thing so much neater.> > > The base structure: > > struct mca_event { > MCA_COMMON; > }; > > > The specific structs: > > struct mca_event_bank { > MCA_COMMON; > > uint16_t vcpu_id; > uint16_t mc_bank; > uint64_t mc_status; > uint64_t mc_addr; > uint64_t mc_misc; > uint32_t flags; > }; > > struct mca_event_allbanks { > MCA_COMMON; > > uint16_t vcpud_id; > uint8_t nbanks; > uint32_t flags; > struct { > uint64_t mc_status; > uint64_t mc_addr; > uint64_t mc_misc; > } bank[1]; > }; > > And you can have many more structs to support future features. > > In the code you allocate the size of the struct you want to use: > > struct mca_event *mca = malloc(sizeof(struct mca_event_bank)); > mca->size = sizeof(struct mca_event_bank); > mca->type = MCA_TYPE_BANK; > > in this example you can cast from mca_event to mca_event_bank > and back whenever you like. > The generic code only needs to know struct mca_event. > > Christoph > > > -- > AMD Saxony, Dresden, Germany > Operating System Research Center > > Legal Information: > AMD Saxony Limited Liability Company & Co. KG > Sitz (Geschäftsanschrift): > Wilschdorfer Landstr. 101, 01109 Dresden, Deutschland > Registergericht Dresden: HRA 4896 > vertretungsberechtigter Komplementär: > AMD Saxony LLC (Sitz Wilmington, Delaware, USA) > Geschäftsführer der AMD Saxony LLC: > Dr. Hans-R. 
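For what it's worth, the unnamed-struct variant alluded to above would look
roughly as follows; gcc only accepts the unnamed member as a Microsoft
extension via -fms-extensions, which is presumably why the MCA_COMMON macro
was proposed instead. The size field is shown as uint32_t here only to keep
the sketch self-contained.

    #include <stdint.h>

    struct mca_common {
        uint32_t size;          /* size of this struct in bytes */
        uint32_t type;          /* structure type */
        uint16_t domid;
        uint8_t  chipid;
        uint8_t  coreid;
        uint64_t mcg_status;    /* global status */
    };

    struct mca_event_bank {
        struct mca_common;      /* unnamed member, -fms-extensions only */
        uint16_t vcpu_id;
        uint16_t mc_bank;
        uint64_t mc_status;
        uint64_t mc_addr;
        uint64_t mc_misc;
        uint32_t flags;
    };

    /* With the extension enabled, ev->type and ev->mc_status are both
     * directly accessible on a struct mca_event_bank *ev; without it,
     * the macro (or an explicitly named common member) is the portable
     * choice. */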