Looking at cloning this logic to more properly support MCE, I see two issues
with this code:

- by using iret, the NMI is being acknowledged to the CPU, and since nothing
  was done to address its reason, I can't see why it shouldn't re-trigger
  right after that iret (unless it was sent as an IPI)
- by re-issuing it on vector 31, the resulting interrupt will have lower
  priority than any external interrupt, hence all pending interrupts will be
  serviced before getting to actually handle the NMI; ideally this should use
  the highest possible vector, but since priorities are grouped anyway, at
  least allocating the vector from the high-priority pool would seem necessary

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
On 15/5/07 15:46, "Jan Beulich" <jbeulich@novell.com> wrote:

> - by using iret, the NMI is being acknowledged to the CPU, and since nothing
>   was done to address its reason, I can't see why it shouldn't re-trigger
>   right after that iret (unless it was sent as an IPI)

Yes, it's good enough for watchdog and oprofile. Level-triggered external
NMIs will of course be a problem. We could possibly work around this by
masking LINT1 if we are CPU0 (and, of course, if the LAPIC is enabled) and
then unmasking only at the end of the real NMI handler. And of course x86/64
doesn't have this problem at all, and practically speaking is pretty much the
only hypervisor build that vendors seem to care about.

> - by re-issuing it on vector 31, the resulting interrupt will have lower
>   priority than any external interrupt, hence all pending interrupts will be
>   serviced before getting to actually handle the NMI; ideally this should
>   use the highest possible vector, but since priorities are grouped anyway,
>   at least allocating the vector from the high-priority pool would seem
>   necessary

Yes, this is true.

 -- Keir
>>> Keir Fraser <keir@xensource.com> 15.05.07 17:00 >>>
>On 15/5/07 15:46, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> - by using iret, the NMI is being acknowledged to the CPU, and since nothing
>>   was done to address its reason, I can't see why it shouldn't re-trigger
>>   right after that iret (unless it was sent as an IPI)
>
>Yes, it's good enough for watchdog and oprofile. Level-triggered external
>NMIs will of course be a problem. We could possibly work around this by
>masking LINT1 if we are CPU0 (and, of course, if the LAPIC is enabled) and
>then unmasking only at the end of the real NMI handler. And of course x86/64
>doesn't have this problem at all, and practically speaking is pretty much the
>only hypervisor build that vendors seem to care about.

What if we removed the deferral altogether, and made the NMI handler store
into the outermost frame (after all, the selector registers have fixed places
on that frame), marking that frame accordingly so that overwriting the values
saved this way can be avoided in the interrupted save sequence? The store
would be necessary only if %ds and %es are both neither __HYPERVISOR_DS nor
null (neatly avoiding special-casing the vm86-mode entry in the outer frame),
and would add an extra branch to __SAVE_ALL_PRE, plus splitting the selector
register stores into moving %ds and %es into general purpose registers,
testing the flag the NMI or MCE handlers may set, and storing the GPRs into
the frame if the flag was clear.

Jan
On 16/5/07 09:17, "Jan Beulich" <jbeulich@novell.com> wrote:

>> Yes, it's good enough for watchdog and oprofile. Level-triggered external
>> NMIs will of course be a problem. We could possibly work around this by
>> masking LINT1 if we are CPU0 (and, of course, if the LAPIC is enabled) and
>> then unmasking only at the end of the real NMI handler. And of course
>> x86/64 doesn't have this problem at all, and practically speaking is
>> pretty much the only hypervisor build that vendors seem to care about.
>
> What if we removed the deferral altogether, and made the NMI handler
> store into the outermost frame (after all, the selector registers have
> fixed places on that frame), marking that frame accordingly so that
> overwriting the values saved this way can be avoided in the interrupted
> save sequence (would be necessary only if both %ds and %es are neither
> __HYPERVISOR_DS nor null [neatly avoiding special-casing the vm86-mode
> entry in the outer frame], and would add an extra branch to __SAVE_ALL_PRE
> plus splitting the selector register stores into moving %ds and %es into
> general purpose registers, testing the flag NMI or MCE handlers may set,
> and storing the GPRs into the frame if the flag was clear).

It sounds a bit painful. Also it's the exit-to-guest path that is more of a
pain to deal with. In this case we may have restored a segment register by
the time we take the NMI. What do we do in this case about restoring the
segment register safely? Races in updating the GDT/LDT may mean that the
reload may still fault, even though it didn't just before; also we may need
to do work in Xen (e.g., shadow-mode stuff) in interrupts-enabled context to
fix up a #GP or #PF.

 -- Keir
>It sounds a bit painful. Also it's the exit-to-guest path that is more of a
>pain to deal with. In this case we may have restored a segment register by
>the time we take the NMI. What do we do in this case about restoring the
>segment register safely? Races in updating the GDT/LDT may mean that the
>reload may still fault, even though it didn't just before; also we may need
>to do work in Xen (e.g., shadow-mode stuff) in interrupts-enabled context to
>fix up a #GP or #PF.

Indeed. Nevertheless, for non-restartable MCEs, deferral is impossible, and
hence some mechanism would still be needed (unless we say the machine's
going down anyway in this case and we don't care about getting a proper
reason logged).

Jan
On 16/5/07 11:10, "Jan Beulich" <jbeulich@novell.com> wrote:

>> It sounds a bit painful. Also it's the exit-to-guest path that is more of
>> a pain to deal with. In this case we may have restored a segment register
>> by the time we take the NMI. What do we do in this case about restoring
>> the segment register safely? Races in updating the GDT/LDT may mean that
>> the reload may still fault, even though it didn't just before; also we may
>> need to do work in Xen (e.g., shadow-mode stuff) in interrupts-enabled
>> context to fix up a #GP or #PF.
>
> Indeed. Nevertheless, for non-restartable MCEs, deferral is impossible, and
> hence some mechanism would still be needed (unless we say the machine's
> going down anyway in this case and we don't care about getting a proper
> reason logged).

You mean, like with a #DF, that sometimes a #MC may have a bogus CS:EIP and
so you cannot IRET from it? I'm not sure how much we care about losing these
and turning a possibly-informative crash into an ugly and confusing crash.

Personally I've never seen a #MC or had one reported to me, restartable or
not. Maybe I'm lucky. :-)

 -- Keir
>>> Keir Fraser <keir@xensource.com> 16.05.07 14:32 >>>
>On 16/5/07 11:10, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>>> It sounds a bit painful. Also it's the exit-to-guest path that is more of
>>> a pain to deal with. In this case we may have restored a segment register
>>> by the time we take the NMI. What do we do in this case about restoring
>>> the segment register safely? Races in updating the GDT/LDT may mean that
>>> the reload may still fault, even though it didn't just before; also we
>>> may need to do work in Xen (e.g., shadow-mode stuff) in
>>> interrupts-enabled context to fix up a #GP or #PF.
>>
>> Indeed. Nevertheless, for non-restartable MCEs, deferral is impossible,
>> and hence some mechanism would still be needed (unless we say the
>> machine's going down anyway in this case and we don't care about getting
>> a proper reason logged).
>
>You mean, like with a #DF, that sometimes a #MC may have a bogus CS:EIP and
>so you cannot IRET from it? I'm not sure how much we care about losing these
>and turning a possibly-informative crash into an ugly and confusing crash.

Yes, that's what I mean. And I'm not so much concerned about turning a (very
rare) 'nice' crash into an 'ugly' one, but more about the fact that until
the system actually crashes it may continue to execute for a short while,
possibly making a data corruption situation worse.

>Personally I've never seen a #MC or had one reported to me, restartable or
>not. Maybe I'm lucky. :-)

I did see quite a few non-restartable ones (on native Linux), and it took me
some time to actually get the system into a state where I could see the
related messages before it rebooted. I don't have that system anymore,
though, otherwise I might have been able to use it for testing purposes
here.

Jan
>>> Keir Fraser <keir@xensource.com> 16.05.07 10:28 >>>
>On 16/5/07 09:17, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>>> Yes, it's good enough for watchdog and oprofile. Level-triggered external
>>> NMIs will of course be a problem. We could possibly work around this by
>>> masking LINT1 if we are CPU0 (and, of course, if the LAPIC is enabled)
>>> and then unmasking only at the end of the real NMI handler. And of course
>>> x86/64 doesn't have this problem at all, and practically speaking is
>>> pretty much the only hypervisor build that vendors seem to care about.
>>
>> What if we removed the deferral altogether, and made the NMI handler
>> store into the outermost frame (after all, the selector registers have
>> fixed places on that frame), marking that frame accordingly so that
>> overwriting the values saved this way can be avoided in the interrupted
>> save sequence (would be necessary only if both %ds and %es are neither
>> __HYPERVISOR_DS nor null [neatly avoiding special-casing the vm86-mode
>> entry in the outer frame], and would add an extra branch to __SAVE_ALL_PRE
>> plus splitting the selector register stores into moving %ds and %es into
>> general purpose registers, testing the flag NMI or MCE handlers may set,
>> and storing the GPRs into the frame if the flag was clear).
>
>It sounds a bit painful. Also it's the exit-to-guest path that is more of a
>pain to deal with. In this case we may have restored a segment register by
>the time we take the NMI. What do we do in this case about restoring the
>segment register safely? Races in updating the GDT/LDT may mean that the
>reload may still fault, even though it didn't just before; also we may need
>to do work in Xen (e.g., shadow-mode stuff) in interrupts-enabled context to
>fix up a #GP or #PF.

I think I found a pretty reasonable solution, which I'm attaching in its
current (3.1-based) form, together with the prior (untested) variant that
copied the NMI behavior. Even if it doesn't apply to -unstable, I'd be glad
if you could have a brief look to see whether you consider the approach too
intrusive (in which case there would be no point in trying to bring it
forward to -unstable).

Jan
On 21/5/07 15:01, "Jan Beulich" <jbeulich@novell.com> wrote:

> I think I found a pretty reasonable solution, which I'm attaching in its
> current (3.1-based) form, together with the prior (untested) variant that
> copied the NMI behavior. Even if it doesn't apply to -unstable, I'd be glad
> if you could have a brief look to see whether you consider the approach too
> intrusive (in which case there would be no point in trying to bring it
> forward to -unstable).

You'll have to explain how the changes to the x86_32 entry.S work. The rest
of the patch looks good.

 -- Keir
>>> Keir Fraser <keir@xensource.com> 21.05.07 16:17 >>>
>On 21/5/07 15:01, "Jan Beulich" <jbeulich@novell.com> wrote:
>
>> I think I found a pretty reasonable solution, which I'm attaching in its
>> current (3.1-based) form, together with the prior (untested) variant that
>> copied the NMI behavior. Even if it doesn't apply to -unstable, I'd be
>> glad if you could have a brief look to see whether you consider the
>> approach too intrusive (in which case there would be no point in trying
>> to bring it forward to -unstable).
>
>You'll have to explain how the changes to the x86_32 entry.S work. The rest
>of the patch looks good.

The idea is to always check values read from %ds and %es against
__HYPERVISOR_DS, and only store into the current frame (all normal handlers)
or the outermost one (NMI and MCE) if the value read is different. That way,
any NMI or MCE occurring during frame setup will store selectors not saved
so far on behalf of the interrupted handler, with that interrupted handler
either having managed to read the guest selector (in which case it can store
it regardless of whether the NMI/MCE kicked in between the read and the
store) or finding __HYPERVISOR_DS already in the register, in which case
it'll know not to store (as the nested handler would have done the store).

For the restore portion this makes use of the fact that there's exactly one
such code sequence, and by moving the selector restore part past all other
restores (including all stack pointer adjustments) the NMI/MCE handlers can
safely detect whether any selector would have been restored already (by
range-checking EIP) and move EIP back to the beginning of the selector
restore sequence without having to play with the stack pointer itself or any
other GPR.

Jan
Well, this sounds fine to me. If you port it I'll apply it. I would prefer
it as a separate patch from the rest of the MCA/MCE changes really, but if
that's a pain then don't worry about it.

 -- Keir

On 21/5/07 15:34, "Jan Beulich" <jbeulich@novell.com> wrote:

> The idea is to always check values read from %ds and %es against
> __HYPERVISOR_DS, and only store into the current frame (all normal
> handlers) or the outermost one (NMI and MCE) if the value read is
> different. That way, any NMI or MCE occurring during frame setup will
> store selectors not saved so far on behalf of the interrupted handler,
> with that interrupted handler either having managed to read the guest
> selector (in which case it can store it regardless of whether the NMI/MCE
> kicked in between the read and the store) or finding __HYPERVISOR_DS
> already in the register, in which case it'll know not to store (as the
> nested handler would have done the store).
>
> For the restore portion this makes use of the fact that there's exactly
> one such code sequence, and by moving the selector restore part past all
> other restores (including all stack pointer adjustments) the NMI/MCE
> handlers can safely detect whether any selector would have been restored
> already (by range-checking EIP) and move EIP back to the beginning of the
> selector restore sequence without having to play with the stack pointer
> itself or any other GPR.