Malcolm Crossley
2012-Nov-13 20:08 UTC
[PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
The self_nmi() code cause''s an NMI to be triggered by sending an APIC message to the local processor. Unfortunately there is a delay in the delivery of the APIC message and we may already have re-entered the HVM guest by the time the NMI is taken. This results in the VMEXIT code calling the self_nmi() function again. We have seen hundreds of iterations of this VMEXIT/VMENTER loop before the HVM guest resumes normal operation. Volume 3 Chapter 27 Section 1 of the Intel SDM states: An NMI causes subsequent NMIs to be blocked, but only after the VM exit completes. So we believe it is safe to directly invoke the INT call as NMI''s should already be blocked. The INT 2 call will perform an IRET which will unblock later calls to the NMI handler, according to Intel SDM Volume 3 Chapter 6.7.1. We must ensure that the IRET from the INT 2 IRET is the first IRET issued to prevent losing an NMI. Moving the INT 2 call to before the interrupts are enabled should ensure we don''t lose the NMI. Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Acked-by: Tim Deegan <tim@xen.org> diff -r 62885b3c34c8 -r 7d6fd0219dd7 xen/arch/x86/hvm/vmx/vmx.c --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2269,6 +2269,14 @@ void vmx_vmexit_handler(struct cpu_user_ vector = intr_info & INTR_INFO_VECTOR_MASK; if ( vector == TRAP_machine_check ) do_machine_check(regs); + else if ( vector == TRAP_nmi && + ( (intr_info & INTR_INFO_INTR_TYPE_MASK) =+ (X86_EVENTTYPE_NMI << 8) ) ) + /* Must be called before interrupts are enabled to ensure + * the NMI handler code is run before the first IRET. The + * IRET unblocks subsequent NMI''s (Intel SDM Vol 3, 6.7.1) + */ + asm volatile("int $2"); /* Real NMI, vector 2: normal processing */ break; case EXIT_REASON_MCE_DURING_VMENTRY: do_machine_check(regs); @@ -2442,7 +2450,6 @@ void vmx_vmexit_handler(struct cpu_user_ (X86_EVENTTYPE_NMI << 8) ) goto exit_and_crash; HVMTRACE_0D(NMI); - self_nmi(); /* Real NMI, vector 2: normal processing. */ break; case TRAP_machine_check: HVMTRACE_0D(MCE);
Jan Beulich
2012-Nov-14 10:06 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
>>> On 13.11.12 at 21:08, Malcolm Crossley <malcolm.crossley@citrix.com> wrote: > The self_nmi() code cause''s an NMI to be triggered by sending an APIC message > to the local processor. Unfortunately there is a delay in the delivery of > the > APIC message and we may already have re-entered the HVM guest by the time the > NMI is taken. This results in the VMEXIT code calling the self_nmi() > function > again. We have seen hundreds of iterations of this VMEXIT/VMENTER loop > before > the HVM guest resumes normal operation. > > Volume 3 Chapter 27 Section 1 of the Intel SDM states: > > An NMI causes subsequent NMIs to be blocked, but only after the VM exit > completes. > > So we believe it is safe to directly invoke the INT call as NMI''s should > already be blocked. > > The INT 2 call will perform an IRET which will unblock later calls to the > NMI > handler, according to Intel SDM Volume 3 Chapter 6.7.1. We must ensure that > the > IRET from the INT 2 IRET is the first IRET issued to prevent losing an NMI. > Moving the INT 2 call to before the interrupts are enabled should ensure we > don''t lose the NMI. > > Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> > Acked-by: Tim Deegan <tim@xen.org> > > diff -r 62885b3c34c8 -r 7d6fd0219dd7 xen/arch/x86/hvm/vmx/vmx.c > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -2269,6 +2269,14 @@ void vmx_vmexit_handler(struct cpu_user_ > vector = intr_info & INTR_INFO_VECTOR_MASK; > if ( vector == TRAP_machine_check ) > do_machine_check(regs); > + else if ( vector == TRAP_nmi && > + ( (intr_info & INTR_INFO_INTR_TYPE_MASK) => + (X86_EVENTTYPE_NMI << 8) ) ) > + /* Must be called before interrupts are enabled to ensure > + * the NMI handler code is run before the first IRET. The > + * IRET unblocks subsequent NMI''s (Intel SDM Vol 3, 6.7.1) > + */ > + asm volatile("int $2"); /* Real NMI, vector 2: normal processing */And I still don''t like this use of "int $2" here: An aspect we didn''t consider so far is that a nested MCE would break things again (yes, that also is the case for NMIs arriving when in VMX root context, but this wouldn''t be the case here if we called do_nmi() and took care of the missing IRET in a place also covering the PV path, e.g. in continue_idle_domain() or reset_stack_and_jump(); we should at least keep the window where this can happen as small as possible). Jan> break; > case EXIT_REASON_MCE_DURING_VMENTRY: > do_machine_check(regs); > @@ -2442,7 +2450,6 @@ void vmx_vmexit_handler(struct cpu_user_ > (X86_EVENTTYPE_NMI << 8) ) > goto exit_and_crash; > HVMTRACE_0D(NMI); > - self_nmi(); /* Real NMI, vector 2: normal processing. */ > break; > case TRAP_machine_check: > HVMTRACE_0D(MCE); > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Tim Deegan
2012-Nov-15 16:41 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
Hi, At 10:06 +0000 on 14 Nov (1352887560), Jan Beulich wrote:> > + asm volatile("int $2"); /* Real NMI, vector 2: normal processing */ > > And I still don''t like this use of "int $2" here: An aspect we didn''t > consider so far is that a nested MCE would break things againOK, I think I understand the problem[s], but I''m going to spell it out slowly so you can correct me. :) [ tl;dr I agree that do_nmi() is better, and we should do that in this patch, but maybe we need to solve the general problem too. ] On a PV guest, we have to use dedicated stacks for NMI and MCE in case either of those things happens just before SYSRET when we''re on the user stack (no other interrupt or exception can happen at that point). On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re running a HVM guest, so the stack issue doesn''t apply (but nested NMIs are still bad). On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM guests. We don''t really have to but it saves time in the context switch not to update the IDT. Using do_nmi() here means that the first NMI is handled on the normal stack instead. It''s also consistent with the way we call do_machine_check() for the MCE case. But it needs an explicit IRET after the call to do_nmi() to make sure that NMIs get re-enabled. These dedicated stacks make the general problem of re-entrant MCE/NMI worse. In the general case those handlers don''t expect to be called in a reentrant way, but blatting the stack turns a possible problem into a definite one. --- All of this would be moot except for the risk that we might take an MCE while in the NMI handler. The IRET from the MCE handler re-enables NMIs while we''re still in the NMI handler, and a second NMI arriving could break the NMI handler. In the PV case, it will also clobber the NMI handler''s stack. In the VMX case we would need to see something like (NMI (MCE) (NMI (MCE) (NMI))) for that to happen, but it could. The inverse case, taking an NMI while in the MCE handler, is not very interesting. There''s no masking of MCEs so that handler already has to deal with nested entry, and the IRET from the NMI handler has no effect. We could potentially solve the problem by having the MCE handler check whether it''s returning to the NMI stack, and do a normal return in that case. It''s a bit of extra code but only in the MCE handler, which is not performance-critical. If we do that, then the choice of ''int $2'' vs ''do_nmi(); fake_iret()'' is mostly one of taste. do_nmi() saves an IDT indirection but unbalances the call/return stack. I slightly prefer ''int $2'' just because it makes the PV and non-PV cases more similar. But first, we should take the current fix, with do_nmi() and iret() instead of ''int $2''. The nested-MCE issue can be handled separately. Does that make sense? Tim.
Andrew Cooper
2012-Nov-15 16:52 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 15/11/12 16:41, Tim Deegan wrote:> Hi, > > At 10:06 +0000 on 14 Nov (1352887560), Jan Beulich wrote: >>> + asm volatile("int $2"); /* Real NMI, vector 2: normal processing */ >> And I still don''t like this use of "int $2" here: An aspect we didn''t >> consider so far is that a nested MCE would break things again > OK, I think I understand the problem[s], but I''m going to spell it out > slowly so you can correct me. :) > > [ tl;dr I agree that do_nmi() is better, and we should do that in this > patch, but maybe we need to solve the general problem too. ] > > On a PV guest, we have to use dedicated stacks for NMI and MCE in case > either of those things happens just before SYSRET when we''re on the user > stack (no other interrupt or exception can happen at that point). > > On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re > running a HVM guest, so the stack issue doesn''t apply (but nested NMIs > are still bad). > > On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM > guests. We don''t really have to but it saves time in the context switch > not to update the IDT. Using do_nmi() here means that the first NMI is > handled on the normal stack instead. It''s also consistent with the way > we call do_machine_check() for the MCE case. But it needs an explicit > IRET after the call to do_nmi() to make sure that NMIs get re-enabled. > > These dedicated stacks make the general problem of re-entrant MCE/NMI > worse. In the general case those handlers don''t expect to be called in > a reentrant way, but blatting the stack turns a possible problem into a > definite one.I have made a fairly simple patch which deliberately invokes a re-entrant NMI. The result is that a PCPU spins around the NMI handler until the watchdog takes the host down. It is also possible to get a reentrant NMI if there is a pagefault (or handful of other possible faults) when trying to execute the iret of the NMI itself; NMIs can get re-enabled from the iret of the pagefault, and we take a new NMI before attempting to retry the iret from the original NMI.> > --- > > All of this would be moot except for the risk that we might take an MCE > while in the NMI handler. The IRET from the MCE handler re-enables NMIs > while we''re still in the NMI handler, and a second NMI arriving could > break the NMI handler. In the PV case, it will also clobber the NMI > handler''s stack. In the VMX case we would need to see something like > (NMI (MCE) (NMI (MCE) (NMI))) for that to happen, but it could.There is the MCIP bit in an MCE status MSR which acts as a latch for MCEs. If a new MCE is generated while this bit is set, then a triple fault occurs. An MCE handler is required to set this bit to 0 to indicate that it has dealt with the MCE. However, there is a race condition window between setting this bit to 0 and leaving the MCE stack during which another MCE can arrive and corrupt the stack.> > The inverse case, taking an NMI while in the MCE handler, is not very > interesting. There''s no masking of MCEs so that handler already has to > deal with nested entry, and the IRET from the NMI handler has no effect. > > We could potentially solve the problem by having the MCE handler check > whether it''s returning to the NMI stack, and do a normal return in that > case. It''s a bit of extra code but only in the MCE handler, which is > not performance-critical. > > If we do that, then the choice of ''int $2'' vs ''do_nmi(); fake_iret()'' > is mostly one of taste. do_nmi() saves an IDT indirection but > unbalances the call/return stack. I slightly prefer ''int $2'' just > because it makes the PV and non-PV cases more similar. > > But first, we should take the current fix, with do_nmi() and iret() > instead of ''int $2''. The nested-MCE issue can be handled separately. > > Does that make sense?I have been looking at appling a similar fix to Linuses fix (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f3c8b8c4b2a34776c3470142a7c8baafcda6eb0) to Xen, for both the NMI and MCE stacks. Work is currently in the preliminary stages at the moment. ~Andrew> > Tim. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Mats Petersson
2012-Nov-15 17:03 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 15/11/12 16:41, Tim Deegan wrote:> Hi, > > At 10:06 +0000 on 14 Nov (1352887560), Jan Beulich wrote: >>> + asm volatile("int $2"); /* Real NMI, vector 2: normal processing */ >> And I still don''t like this use of "int $2" here: An aspect we didn''t >> consider so far is that a nested MCE would break things again > OK, I think I understand the problem[s], but I''m going to spell it out > slowly so you can correct me. :) > > [ tl;dr I agree that do_nmi() is better, and we should do that in this > patch, but maybe we need to solve the general problem too. ] > > On a PV guest, we have to use dedicated stacks for NMI and MCE in case > either of those things happens just before SYSRET when we''re on the user > stack (no other interrupt or exception can happen at that point). > > On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re > running a HVM guest, so the stack issue doesn''t apply (but nested NMIs > are still bad). > > On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM > guests. We don''t really have to but it saves time in the context switch > not to update the IDT. Using do_nmi() here means that the first NMI is > handled on the normal stack instead. It''s also consistent with the way > we call do_machine_check() for the MCE case. But it needs an explicit > IRET after the call to do_nmi() to make sure that NMIs get re-enabled.Both AMD and Intel has an identical setup with regard to stacks and general "what happens when we taken one of these interrupts". As Andy Cooper is about to post, there are ways to solve the nesting of either of these interrupts. There are subtle differences between Intel and AMD in how the actual interrupt gets handled when the processor is currently running a HVM guest, which is why the patch in this thread is "Intel only", because the Intel system "eats" the NMI, and it needs to be "reissued". On AMD, the interrupts are held pending until the guest has exited into the Hypervisor, and then taken after the processor executes the STGI (Set Global Interrupt Enable) instruction - analogous to what happens when you have a CLI and STI for ordinary interrupts. The issues with regards to nesting of NMI and MCE is completely different from the "how do we issue an NMI from the HVM handling code when the guest got interrupted by NMI". Nested NMI and nested MCE is the same problem on both AMD and Intel processors, and need a completely separate solution [Andy is working on this]. -- Mats
Tim Deegan
2012-Nov-15 17:15 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
At 17:03 +0000 on 15 Nov (1352998993), Mats Petersson wrote:> >On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re > >running a HVM guest, so the stack issue doesn''t apply (but nested NMIs > >are still bad). > > > >On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM > >guests. We don''t really have to but it saves time in the context switch > >not to update the IDT. Using do_nmi() here means that the first NMI is > >handled on the normal stack instead. It''s also consistent with the way > >we call do_machine_check() for the MCE case. But it needs an explicit > >IRET after the call to do_nmi() to make sure that NMIs get re-enabled. > > Both AMD and Intel has an identical setup with regard to stacks and > general "what happens when we taken one of these interrupts".My reading of svm_ctxt_switch_{to,from} makes me disagree with this. AFAICT, on SVM we''re not using dedicated stacks at all.> The issues with regards to nesting of NMI and MCE is completely > different from the "how do we issue an NMI from the HVM handling code > when the guest got interrupted by NMI".Yes. As I said, we should take the fix to the VMX NMI handling now, and sort out the nesting separately. Tim.
Tim Deegan
2012-Nov-15 17:25 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
At 16:52 +0000 on 15 Nov (1352998340), Andrew Cooper wrote:> It is also possible to get a reentrant NMI if there is a pagefault (or > handful of other possible faults) when trying to execute the iret of > the NMI itself; NMIs can get re-enabled from the iret of the > pagefault, and we take a new NMI before attempting to retry the iret > from the original NMI.Yes, I hadn''t thought of that case.> > All of this would be moot except for the risk that we might take an MCE > > while in the NMI handler. The IRET from the MCE handler re-enables NMIs > > while we''re still in the NMI handler, and a second NMI arriving could > > break the NMI handler. In the PV case, it will also clobber the NMI > > handler''s stack. In the VMX case we would need to see something like > > (NMI (MCE) (NMI (MCE) (NMI))) for that to happen, but it could. > > There is the MCIP bit in an MCE status MSR which acts as a latch for > MCEs. If a new MCE is generated while this bit is set, then a triple > fault occurs. An MCE handler is required to set this bit to 0 to > indicate that it has dealt with the MCE. However, there is a race > condition window between setting this bit to 0 and leaving the MCE stack > during which another MCE can arrive and corrupt the stack.Sure. Nested MCEs aren''t very interesting to me. If you''re taking MCEs faster than you can handle them, the difference between one of your CPUs resetting and the stack getting blasted isn''t that much. :)> I have been looking at appling a similar fix to Linuses fix > (http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3f3c8b8c4b2a34776c3470142a7c8baafcda6eb0) > to Xen, for both the NMI and MCE stacks. > > Work is currently in the preliminary stages at the moment.Sounds great! Tim.
Mats Petersson
2012-Nov-15 17:33 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 15/11/12 17:15, Tim Deegan wrote:> At 17:03 +0000 on 15 Nov (1352998993), Mats Petersson wrote: >>> On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re >>> running a HVM guest, so the stack issue doesn''t apply (but nested NMIs >>> are still bad). >>> >>> On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM >>> guests. We don''t really have to but it saves time in the context switch >>> not to update the IDT. Using do_nmi() here means that the first NMI is >>> handled on the normal stack instead. It''s also consistent with the way >>> we call do_machine_check() for the MCE case. But it needs an explicit >>> IRET after the call to do_nmi() to make sure that NMIs get re-enabled. >> Both AMD and Intel has an identical setup with regard to stacks and >> general "what happens when we taken one of these interrupts". > My reading of svm_ctxt_switch_{to,from} makes me disagree with this. > AFAICT, on SVM we''re not using dedicated stacks at all.In SVM, the VMRUN returns to whatever stack you had before the VMRUN. This is not what I''m talking about, however. The stack used for the NMI and MCE comes from the interrupt descriptor entry for those respective vectors. I''m fairly sure (but I haven''t followed all the code-paths to verify this) that both AMD and Intel HVM code uses the same stack when a VMEXIT happens (as in, the the value in RSP, at least nominally, the same value each time you end up in the {svm,vmx}_exit_handler(), but the value may vary from one time to another, and probably isn''t the exact same on an Intel vs. AMD comparison). The Intel solution is slightly different as to "how you end up with the RSP value you want to be at", but that''s a beside the point. Either way, what stack we are on at VMEXIT shouldn''t matter to the handling of NMI or MCE, but we should avoid using the "current" stack to handle NMI, in case it''s somehow causing further problems to do so, and we need the NMI to "get out of a problem". Similarly for MCE - if the memory used by RSP is bad, we probably don''t want to reboot the machine by using it to store the return address for MCE (although I''m not sure we can completely avoid that)... So in conclusion, the do_mce_exception() call probably should be a __asm__ __volatile__("int $X"), where X is the relevant vector. [Although I admit that when we take an MCE exception from HVM, it hopefully isn''t using the Hypervisor stack that the VMEXIT happens to use... That would be rather bad in all manner of ways, and of course, the MCE stack may also be equally bad!] -- Mats> >> The issues with regards to nesting of NMI and MCE is completely >> different from the "how do we issue an NMI from the HVM handling code >> when the guest got interrupted by NMI". > Yes. As I said, we should take the fix to the VMX NMI handling now, and > sort out the nesting separately. > > Tim. > >
Tim Deegan
2012-Nov-15 17:44 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
At 17:33 +0000 on 15 Nov (1353000782), Mats Petersson wrote:> On 15/11/12 17:15, Tim Deegan wrote: > >At 17:03 +0000 on 15 Nov (1352998993), Mats Petersson wrote: > >>>On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re > >>>running a HVM guest, so the stack issue doesn''t apply (but nested NMIs > >>>are still bad). > >>> > >>>On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM > >>>guests. We don''t really have to but it saves time in the context switch > >>>not to update the IDT. Using do_nmi() here means that the first NMI is > >>>handled on the normal stack instead. It''s also consistent with the way > >>>we call do_machine_check() for the MCE case. But it needs an explicit > >>>IRET after the call to do_nmi() to make sure that NMIs get re-enabled. > >>Both AMD and Intel has an identical setup with regard to stacks and > >>general "what happens when we taken one of these interrupts". > >My reading of svm_ctxt_switch_{to,from} makes me disagree with this. > >AFAICT, on SVM we''re not using dedicated stacks at all. > In SVM, the VMRUN returns to whatever stack you had before the VMRUN. > This is not what I''m talking about, however. The stack used for the NMI > and MCE comes from the interrupt descriptor entry for those respective > vectors.This is the code I was referring to: /* * Cannot use ISTs for NMI/#MC/#DF while we are running with the guest TR. * But this doesn''t matter: the IST is only req''d to handle SYSCALL/SYSRET. */ idt_tables[cpu][TRAP_double_fault].a &= ~(7UL << 32); idt_tables[cpu][TRAP_nmi].a &= ~(7UL << 32); idt_tables[cpu][TRAP_machine_check].a &= ~(7UL << 32); Am I misreading it?> So in conclusion, the do_mce_exception() call probably should be a > __asm__ __volatile__("int $X"), where X is the relevant vector.This handles MCEs that were raised in guest context. If we''ve managed to get this far into the exit handler, the hypervisor stack is probably OK. :) I''d be happy to invoke the MCE handler though the IDT here, just for symmetry with the other cases, but I don''t think it makes much difference. Tim.
Mats Petersson
2012-Nov-15 18:23 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 15/11/12 17:44, Tim Deegan wrote:> At 17:33 +0000 on 15 Nov (1353000782), Mats Petersson wrote: >> On 15/11/12 17:15, Tim Deegan wrote: >>> At 17:03 +0000 on 15 Nov (1352998993), Mats Petersson wrote: >>>>> On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re >>>>> running a HVM guest, so the stack issue doesn''t apply (but nested NMIs >>>>> are still bad). >>>>> >>>>> On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM >>>>> guests. We don''t really have to but it saves time in the context switch >>>>> not to update the IDT. Using do_nmi() here means that the first NMI is >>>>> handled on the normal stack instead. It''s also consistent with the way >>>>> we call do_machine_check() for the MCE case. But it needs an explicit >>>>> IRET after the call to do_nmi() to make sure that NMIs get re-enabled. >>>> Both AMD and Intel has an identical setup with regard to stacks and >>>> general "what happens when we taken one of these interrupts". >>> My reading of svm_ctxt_switch_{to,from} makes me disagree with this. >>> AFAICT, on SVM we''re not using dedicated stacks at all. >> In SVM, the VMRUN returns to whatever stack you had before the VMRUN. >> This is not what I''m talking about, however. The stack used for the NMI >> and MCE comes from the interrupt descriptor entry for those respective >> vectors. > This is the code I was referring to: > > /* > * Cannot use ISTs for NMI/#MC/#DF while we are running with the guest TR. > * But this doesn''t matter: the IST is only req''d to handle SYSCALL/SYSRET. > */ > idt_tables[cpu][TRAP_double_fault].a &= ~(7UL << 32); > idt_tables[cpu][TRAP_nmi].a &= ~(7UL << 32); > idt_tables[cpu][TRAP_machine_check].a &= ~(7UL << 32); > > Am I misreading it?No, you are reading it perfectly right, I''m wrong... -- Mats> >> So in conclusion, the do_mce_exception() call probably should be a >> __asm__ __volatile__("int $X"), where X is the relevant vector. > This handles MCEs that were raised in guest context. If we''ve managed > to get this far into the exit handler, the hypervisor stack is probably > OK. :) > > I''d be happy to invoke the MCE handler though the IDT here, just for > symmetry with the other cases, but I don''t think it makes much > difference. > > Tim. > >
Jan Beulich
2012-Nov-16 08:07 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
>>> On 15.11.12 at 17:41, Tim Deegan <tim@xen.org> wrote: > At 10:06 +0000 on 14 Nov (1352887560), Jan Beulich wrote: >> > + asm volatile("int $2"); /* Real NMI, vector 2: normal > processing */ >> >> And I still don''t like this use of "int $2" here: An aspect we didn''t >> consider so far is that a nested MCE would break things again > > OK, I think I understand the problem[s], but I''m going to spell it out > slowly so you can correct me. :) > > [ tl;dr I agree that do_nmi() is better, and we should do that in this > patch, but maybe we need to solve the general problem too. ] > > On a PV guest, we have to use dedicated stacks for NMI and MCE in case > either of those things happens just before SYSRET when we''re on the user > stack (no other interrupt or exception can happen at that point).Yes.> On an AMD CPU we _don''t_ have dedicated stacks for NMI or MCE when we''re > running a HVM guest, so the stack issue doesn''t apply (but nested NMIs > are still bad).Yes, albeit that''s a potential (separate) problem too (because we don''t distinguish the event having its origin in guest or hypervisor context, i.e. an MCE due to a back stack page would be handled on that same stack page, i.e. shutdown).> On an Intel CPU, we _do_ use dedicated stacks for NMI and MCE in HVM > guests. We don''t really have to but it saves time in the context switch > not to update the IDT. Using do_nmi() here means that the first NMI is > handled on the normal stack instead. It''s also consistent with the way > we call do_machine_check() for the MCE case. But it needs an explicit > IRET after the call to do_nmi() to make sure that NMIs get re-enabled.Or have the subsequent VMRESUME take care of this. There is a sentence in "26.6.1 Interruptibility State" to that effect: "NMIs are not blocked in VMX nonroot operation (except for ordinary blocking for other reasons, such as by the MOV SS instruction, the wait-for-SIPI state, etc.)", so NMIs get unblocked implicitly with the VMRESUME. Hence the only problem is with ending the NMI context when not exiting back to VMX guest. I continue to not be in favor of special casing this in VMX code, considering that the problem is generic (i.e. similarly affects PV). I.e. either we handle the other case similarly (special code added also to the PV code path), or we deal with this in a single place, keeping the NMIs masked for an extended period of time.> These dedicated stacks make the general problem of re-entrant MCE/NMI > worse. In the general case those handlers don''t expect to be called in > a reentrant way, but blatting the stack turns a possible problem into a > definite one.Yes. I think almost everyone agrees that this is a design flaw.> --- > > All of this would be moot except for the risk that we might take an MCE > while in the NMI handler. The IRET from the MCE handler re-enables NMIs > while we''re still in the NMI handler, and a second NMI arriving could > break the NMI handler. In the PV case, it will also clobber the NMI > handler''s stack.No - the entry code switches away from the dedicated stacks when the origin was in guest context (see handle_ist_exception in xen/arch/x86/x86_64/entry.S).> In the VMX case we would need to see something like > (NMI (MCE) (NMI (MCE) (NMI))) for that to happen, but it could. > > The inverse case, taking an NMI while in the MCE handler, is not very > interesting. There''s no masking of MCEs so that handler already has to > deal with nested entry, and the IRET from the NMI handler has no effect.As already pointed out by Andrew, there is an in-progress bit for this. The thing is that we''d have to switch away from the dedicated stack before clearing that bit (which shouldn''t be too difficult; if the MCE was caused by the normal stack, we shouldn''t be getting to the point of trying to exit from the MCE handler anyway).> We could potentially solve the problem by having the MCE handler check > whether it''s returning to the NMI stack, and do a normal return in that > case. It''s a bit of extra code but only in the MCE handler, which is > not performance-critical.Yes, that could solve that nesting case (again not very difficult to implement).> If we do that, then the choice of ''int $2'' vs ''do_nmi(); fake_iret()'' > is mostly one of taste. do_nmi() saves an IDT indirection but > unbalances the call/return stack. I slightly prefer ''int $2'' just > because it makes the PV and non-PV cases more similar.Once the nesting is dealt with properly, and if we decide to do the NMI-disabled-window-exit early, then yes, I agree (apart from the taste aspect, which would still tell me not to use "int $xx" in hypervisor code). And, as I think Mats said, this might then be done for the MCE the same way for consistency.> But first, we should take the current fix, with do_nmi() and iret() > instead of ''int $2''. The nested-MCE issue can be handled separately.Leaving the PV case un-addressed...> Does that make sense?Of course. Jan
Jan Beulich
2012-Nov-16 08:17 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
>>> On 15.11.12 at 18:25, Tim Deegan <tim@xen.org> wrote: > At 16:52 +0000 on 15 Nov (1352998340), Andrew Cooper wrote: >> It is also possible to get a reentrant NMI if there is a pagefault (or >> handful of other possible faults) when trying to execute the iret of >> the NMI itself; NMIs can get re-enabled from the iret of the >> pagefault, and we take a new NMI before attempting to retry the iret >> from the original NMI. > > Yes, I hadn''t thought of that case.But what would make a fault happen on that IRET? Oh, yes, there is one case - the guest having its previous instruction end exactly at the canonical/non-canonical boundary. But for the sake of correctness, that''s a #GP then. I would suppose this would better be filtered (manually injecting a #GP into the guest) than allowed to actually cause a #GP. Jan
Mats Petersson
2012-Nov-16 09:59 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 16/11/12 08:17, Jan Beulich wrote:>>>> On 15.11.12 at 18:25, Tim Deegan <tim@xen.org> wrote: >> At 16:52 +0000 on 15 Nov (1352998340), Andrew Cooper wrote: >>> It is also possible to get a reentrant NMI if there is a pagefault (or >>> handful of other possible faults) when trying to execute the iret of >>> the NMI itself; NMIs can get re-enabled from the iret of the >>> pagefault, and we take a new NMI before attempting to retry the iret >>> from the original NMI. >> Yes, I hadn''t thought of that case. > But what would make a fault happen on that IRET? Oh, yes, > there is one case - the guest having its previous instruction end > exactly at the canonical/non-canonical boundary. But for the > sake of correctness, that''s a #GP then. I would suppose this > would better be filtered (manually injecting a #GP into the guest) > than allowed to actually cause a #GP.Or, if for some reason the address we return to is "not present". Now, in the current Xen, Xen itself doesn''t get paged out, but in a PV guest, I''m pretty certain the guest could decide to page out some code-page, which just happens to be the one we were about to return to? -- Mats> > Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel > >
Keir Fraser
2012-Nov-16 10:18 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 16/11/2012 09:59, "Mats Petersson" <mats.petersson@citrix.com> wrote:>>> Yes, I hadn''t thought of that case. >> But what would make a fault happen on that IRET? Oh, yes, >> there is one case - the guest having its previous instruction end >> exactly at the canonical/non-canonical boundary. But for the >> sake of correctness, that''s a #GP then. I would suppose this >> would better be filtered (manually injecting a #GP into the guest) >> than allowed to actually cause a #GP. > Or, if for some reason the address we return to is "not present". Now, > in the current Xen, Xen itself doesn''t get paged out, but in a PV guest, > I''m pretty certain the guest could decide to page out some code-page, > which just happens to be the one we were about to return to?That fault would occur in the context being returned to, with rIP == the IRET return target. -- Keir
Tim Deegan
2012-Nov-16 10:56 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
At 08:07 +0000 on 16 Nov (1353053247), Jan Beulich wrote:> I continue to not be in favor of special casing this in VMX code, > considering that the problem is generic (i.e. similarly affects PV). > I.e. either we handle the other case similarly (special code added > also to the PV code path), or we deal with this in a single place, > keeping the NMIs masked for an extended period of time.Affects PV because PV might use SYSRET to return from the NMI handler? Right.> > All of this would be moot except for the risk that we might take an MCE > > while in the NMI handler. The IRET from the MCE handler re-enables NMIs > > while we''re still in the NMI handler, and a second NMI arriving could > > break the NMI handler. In the PV case, it will also clobber the NMI > > handler''s stack. > > No - the entry code switches away from the dedicated stacks when > the origin was in guest context (see handle_ist_exception in > xen/arch/x86/x86_64/entry.S).I see, thanks. So it''s only if we take the NMI while in the hypervisor that we stay on the NMI stack and risk getting the stack clobbered.> > We could potentially solve the problem by having the MCE handler check > > whether it''s returning to the NMI stack, and do a normal return in that > > case. It''s a bit of extra code but only in the MCE handler, which is > > not performance-critical. > > Yes, that could solve that nesting case (again not very difficult > to implement).How about we just have the MCE handler return without IRET in _all_ cases where it''s returning to ring 0? I think that entirely solves the MCE-in-NMI problem, without all the extra mechanism meeded for the linux-style solution. (Unless we want to allow other traps in either the NMI or MCE handlers). [And it occurs to me that the linux-style solution is tricky because detecting the case where you''ve taken an NMI and not yet set the nmi-in-progress flag is hard in both SVM (in the NMI handler but on the normal stack) and VMX (in the _vmexit_ handler and on the normal stack).] So I guess now I''m suggesting: - MCE never returns to Xen with IRET; - NMI handling in handle_vmexit() moves to beside the MCE handling; - Explicit IRET-to-self at the end of do_nmi() to unmask NMIs; and - no int $2. :) How''s that? I feel sure I must have missed a case - itt sounds too easy. Tim.
Jan Beulich
2012-Nov-16 11:23 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
>>> On 16.11.12 at 11:56, Tim Deegan <tim@xen.org> wrote: > At 08:07 +0000 on 16 Nov (1353053247), Jan Beulich wrote: >> > We could potentially solve the problem by having the MCE handler check >> > whether it''s returning to the NMI stack, and do a normal return in that >> > case. It''s a bit of extra code but only in the MCE handler, which is >> > not performance-critical. >> >> Yes, that could solve that nesting case (again not very difficult >> to implement). > > How about we just have the MCE handler return without IRET in _all_ > cases where it''s returning to ring 0? I think that entirely solves the > MCE-in-NMI problem, without all the extra mechanism meeded for the > linux-style solution.Good suggestion.> (Unless we want to allow other traps in either > the NMI or MCE handlers).We should absolutely avoid that.> [And it occurs to me that the linux-style solution is tricky because > detecting the case where you''ve taken an NMI and not yet set the > nmi-in-progress flag is hard in both SVM (in the NMI handler but on the > normal stack) and VMX (in the _vmexit_ handler and on the normal > stack).]Agreed.> So I guess now I''m suggesting: > - MCE never returns to Xen with IRET;Yes. But that might need care with regard to EFI runtime services (or maybe not, as we''re, at least at present, not switching stacks there). Nevertheless, to be on the safe side, we could restrict avoiding the IRET to "Ring 0 and RIP in hypervisor space", as we know we won''t have interrupted the NMI handler if that''s not the case.> - NMI handling in handle_vmexit() moves to beside the MCE handling;Yes.> - Explicit IRET-to-self at the end of do_nmi() to unmask NMIs; andThis IRET must then switch to the normal stack, if currently on the NMI one, which might be a little tricky/fragile. But then again we''re on the NMI one only when we got there from hypervisor context, and in that specific case we return without handling softirqs anyway. So perhaps the stack switch isn''t needed, but the IRET-to-self must only be done when the origin wasn''t in hypervisor context.> - no int $2. :)Yippee. Jan
Andrew Cooper
2012-Nov-16 11:52 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 16/11/12 11:23, Jan Beulich wrote:>>>> On 16.11.12 at 11:56, Tim Deegan <tim@xen.org> wrote: >> At 08:07 +0000 on 16 Nov (1353053247), Jan Beulich wrote: >>>> We could potentially solve the problem by having the MCE handler check >>>> whether it''s returning to the NMI stack, and do a normal return in that >>>> case. It''s a bit of extra code but only in the MCE handler, which is >>>> not performance-critical. >>> Yes, that could solve that nesting case (again not very difficult >>> to implement). >> How about we just have the MCE handler return without IRET in _all_ >> cases where it''s returning to ring 0? I think that entirely solves the >> MCE-in-NMI problem, without all the extra mechanism meeded for the >> linux-style solution. > Good suggestion. > >> (Unless we want to allow other traps in either >> the NMI or MCE handlers). > We should absolutely avoid that. > >> [And it occurs to me that the linux-style solution is tricky because >> detecting the case where you''ve taken an NMI and not yet set the >> nmi-in-progress flag is hard in both SVM (in the NMI handler but on the >> normal stack) and VMX (in the _vmexit_ handler and on the normal >> stack).] > Agreed.But we never need to detect this case. If we take an NMI and ensure there is no possibility for a trap before setting the nmi-in-progress flag (which is not very hard, with it being a handful of instructions in the handler), then we guarantee that NMIs are still blocked, and thus cant be reentrant. Also, for what it is worth, we do have traps on the NMI path in the form of BUG()s, WARN()s and panic gubbins, although the host is in a fairly dire state if we actually ever hit any of these. ~Andrew> >> So I guess now I''m suggesting: >> - MCE never returns to Xen with IRET; > Yes. But that might need care with regard to EFI runtime services > (or maybe not, as we''re, at least at present, not switching stacks > there). Nevertheless, to be on the safe side, we could restrict > avoiding the IRET to "Ring 0 and RIP in hypervisor space", as we > know we won''t have interrupted the NMI handler if that''s not the > case. > >> - NMI handling in handle_vmexit() moves to beside the MCE handling; > Yes. > >> - Explicit IRET-to-self at the end of do_nmi() to unmask NMIs; and > This IRET must then switch to the normal stack, if currently on > the NMI one, which might be a little tricky/fragile. > > But then again we''re on the NMI one only when we got there > from hypervisor context, and in that specific case we return > without handling softirqs anyway. So perhaps the stack switch > isn''t needed, but the IRET-to-self must only be done when the > origin wasn''t in hypervisor context. > >> - no int $2. :) > Yippee. > > Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Tim Deegan
2012-Nov-16 13:53 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
At 11:52 +0000 on 16 Nov (1353066779), Andrew Cooper wrote:> > On 16/11/12 11:23, Jan Beulich wrote: > >>>> On 16.11.12 at 11:56, Tim Deegan <tim@xen.org> wrote: > >> At 08:07 +0000 on 16 Nov (1353053247), Jan Beulich wrote: > >>>> We could potentially solve the problem by having the MCE handler check > >>>> whether it''s returning to the NMI stack, and do a normal return in that > >>>> case. It''s a bit of extra code but only in the MCE handler, which is > >>>> not performance-critical. > >>> Yes, that could solve that nesting case (again not very difficult > >>> to implement). > >> How about we just have the MCE handler return without IRET in _all_ > >> cases where it''s returning to ring 0? I think that entirely solves the > >> MCE-in-NMI problem, without all the extra mechanism meeded for the > >> linux-style solution. > > Good suggestion. > > > >> (Unless we want to allow other traps in either > >> the NMI or MCE handlers). > > We should absolutely avoid that. > > > >> [And it occurs to me that the linux-style solution is tricky because > >> detecting the case where you''ve taken an NMI and not yet set the > >> nmi-in-progress flag is hard in both SVM (in the NMI handler but on the > >> normal stack) and VMX (in the _vmexit_ handler and on the normal > >> stack).] > > Agreed. > > But we never need to detect this case. If we take an NMI and ensure > there is no possibility for a trap before setting the nmi-in-progress > flagThe problem is that there is no way to do that -- the trap we''re worried about is MCE, which can happen at any time. That''s why linux has the backstop check for the case where the flag''s not set but the return address is on the NMI stack.> (which is not very hard, with it being a handful of instructions in > the handler),It''s quite a bit more than that in the VMX case. I guess we need to audit that code for possible faults.> then we guarantee that NMIs are still blocked, and thus > cant be reentrant. > > Also, for what it is worth, we do have traps on the NMI path in the form > of BUG()s, WARN()s and panic gubbins, although the host is in a fairly > dire state if we actually ever hit any of these.Ergh. If there are any WARN()s we should get rid of them. BUG()s are fine. :) Tim.
Andrew Cooper
2012-Nov-16 14:11 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 16/11/12 13:53, Tim Deegan wrote:> At 11:52 +0000 on 16 Nov (1353066779), Andrew Cooper wrote: >> On 16/11/12 11:23, Jan Beulich wrote: >>>>>> On 16.11.12 at 11:56, Tim Deegan <tim@xen.org> wrote: >>>> At 08:07 +0000 on 16 Nov (1353053247), Jan Beulich wrote: >>>>>> We could potentially solve the problem by having the MCE handler check >>>>>> whether it''s returning to the NMI stack, and do a normal return in that >>>>>> case. It''s a bit of extra code but only in the MCE handler, which is >>>>>> not performance-critical. >>>>> Yes, that could solve that nesting case (again not very difficult >>>>> to implement). >>>> How about we just have the MCE handler return without IRET in _all_ >>>> cases where it''s returning to ring 0? I think that entirely solves the >>>> MCE-in-NMI problem, without all the extra mechanism meeded for the >>>> linux-style solution. >>> Good suggestion. >>> >>>> (Unless we want to allow other traps in either >>>> the NMI or MCE handlers). >>> We should absolutely avoid that. >>> >>>> [And it occurs to me that the linux-style solution is tricky because >>>> detecting the case where you''ve taken an NMI and not yet set the >>>> nmi-in-progress flag is hard in both SVM (in the NMI handler but on the >>>> normal stack) and VMX (in the _vmexit_ handler and on the normal >>>> stack).] >>> Agreed. >> But we never need to detect this case. If we take an NMI and ensure >> there is no possibility for a trap before setting the nmi-in-progress >> flag > The problem is that there is no way to do that -- the trap we''re worried > about is MCE, which can happen at any time. That''s why linux has the > backstop check for the case where the flag''s not set but the return > address is on the NMI stack.D''oh - your quite correct. I overlooked that possibility.> >> (which is not very hard, with it being a handful of instructions in >> the handler), > It''s quite a bit more than that in the VMX case. I guess we need to > audit that code for possible faults.But if we fix the underlying NMI/MCE reentrant problem, then faults on the vmexit patch cease to be an issue, do they not? If and when MCEs/NMIs/interrupts occur, they will be dealt with in the same manor as any other interruption to hypervisor code. ~Andrew>> then we guarantee that NMIs are still blocked, and thus >> cant be reentrant. >> >> Also, for what it is worth, we do have traps on the NMI path in the form >> of BUG()s, WARN()s and panic gubbins, although the host is in a fairly >> dire state if we actually ever hit any of these. > Ergh. If there are any WARN()s we should get rid of them. BUG()s are > fine. :) > > Tim.-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Jan Beulich
2012-Nov-22 08:58 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
>>> On 13.11.12 at 21:08, Malcolm Crossley <malcolm.crossley@citrix.com> wrote: > The self_nmi() code cause''s an NMI to be triggered by sending an APIC message > to the local processor. Unfortunately there is a delay in the delivery of > the > APIC message and we may already have re-entered the HVM guest by the time the > NMI is taken. This results in the VMEXIT code calling the self_nmi() > function > again. We have seen hundreds of iterations of this VMEXIT/VMENTER loop > before > the HVM guest resumes normal operation. > > Volume 3 Chapter 27 Section 1 of the Intel SDM states: > > An NMI causes subsequent NMIs to be blocked, but only after the VM exit > completes. > > So we believe it is safe to directly invoke the INT call as NMI''s should > already be blocked. > > The INT 2 call will perform an IRET which will unblock later calls to the > NMI > handler, according to Intel SDM Volume 3 Chapter 6.7.1. We must ensure that > the > IRET from the INT 2 IRET is the first IRET issued to prevent losing an NMI. > Moving the INT 2 call to before the interrupts are enabled should ensure we > don''t lose the NMI. > > Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> > Acked-by: Tim Deegan <tim@xen.org>As (I think) we agreed to not use "int $2", and as I don''t recall having seen a v3 of this patch - is that something that can be expected any time soon? Ideally, I would want to incorporate the changes here (and hopefully also the PV issue described during the discussion) in the pending 4.2.1 (and, if applicable, 4.1.4) release(s). Jan> diff -r 62885b3c34c8 -r 7d6fd0219dd7 xen/arch/x86/hvm/vmx/vmx.c > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -2269,6 +2269,14 @@ void vmx_vmexit_handler(struct cpu_user_ > vector = intr_info & INTR_INFO_VECTOR_MASK; > if ( vector == TRAP_machine_check ) > do_machine_check(regs); > + else if ( vector == TRAP_nmi && > + ( (intr_info & INTR_INFO_INTR_TYPE_MASK) => + (X86_EVENTTYPE_NMI << 8) ) ) > + /* Must be called before interrupts are enabled to ensure > + * the NMI handler code is run before the first IRET. The > + * IRET unblocks subsequent NMI''s (Intel SDM Vol 3, 6.7.1) > + */ > + asm volatile("int $2"); /* Real NMI, vector 2: normal > processing */ > break; > case EXIT_REASON_MCE_DURING_VMENTRY: > do_machine_check(regs); > @@ -2442,7 +2450,6 @@ void vmx_vmexit_handler(struct cpu_user_ > (X86_EVENTTYPE_NMI << 8) ) > goto exit_and_crash; > HVMTRACE_0D(NMI); > - self_nmi(); /* Real NMI, vector 2: normal processing. */ > break; > case TRAP_machine_check: > HVMTRACE_0D(MCE); > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Andrew Cooper
2012-Nov-22 10:52 UTC
Re: [PATCH V2] xen: vmx: Use an INT 2 call to process real NMI''s instead of self_nmi() in VMEXIT handler
On 22/11/12 08:58, Jan Beulich wrote:>>>> On 13.11.12 at 21:08, Malcolm Crossley <malcolm.crossley@citrix.com> wrote: >> The self_nmi() code cause''s an NMI to be triggered by sending an APIC message >> to the local processor. Unfortunately there is a delay in the delivery of >> the >> APIC message and we may already have re-entered the HVM guest by the time the >> NMI is taken. This results in the VMEXIT code calling the self_nmi() >> function >> again. We have seen hundreds of iterations of this VMEXIT/VMENTER loop >> before >> the HVM guest resumes normal operation. >> >> Volume 3 Chapter 27 Section 1 of the Intel SDM states: >> >> An NMI causes subsequent NMIs to be blocked, but only after the VM exit >> completes. >> >> So we believe it is safe to directly invoke the INT call as NMI''s should >> already be blocked. >> >> The INT 2 call will perform an IRET which will unblock later calls to the >> NMI >> handler, according to Intel SDM Volume 3 Chapter 6.7.1. We must ensure that >> the >> IRET from the INT 2 IRET is the first IRET issued to prevent losing an NMI. >> Moving the INT 2 call to before the interrupts are enabled should ensure we >> don''t lose the NMI. >> >> Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> >> Acked-by: Tim Deegan <tim@xen.org> > As (I think) we agreed to not use "int $2", and as I don''t recall > having seen a v3 of this patch - is that something that can be > expected any time soon? Ideally, I would want to incorporate > the changes here (and hopefully also the PV issue described > during the discussion) in the pending 4.2.1 (and, if applicable, > 4.1.4) release(s). > > JanMalcolm is out of the office this week. I will see about respinning a v3 later today. ~Andrew> >> diff -r 62885b3c34c8 -r 7d6fd0219dd7 xen/arch/x86/hvm/vmx/vmx.c >> --- a/xen/arch/x86/hvm/vmx/vmx.c >> +++ b/xen/arch/x86/hvm/vmx/vmx.c >> @@ -2269,6 +2269,14 @@ void vmx_vmexit_handler(struct cpu_user_ >> vector = intr_info & INTR_INFO_VECTOR_MASK; >> if ( vector == TRAP_machine_check ) >> do_machine_check(regs); >> + else if ( vector == TRAP_nmi && >> + ( (intr_info & INTR_INFO_INTR_TYPE_MASK) =>> + (X86_EVENTTYPE_NMI << 8) ) ) >> + /* Must be called before interrupts are enabled to ensure >> + * the NMI handler code is run before the first IRET. The >> + * IRET unblocks subsequent NMI''s (Intel SDM Vol 3, 6.7.1) >> + */ >> + asm volatile("int $2"); /* Real NMI, vector 2: normal >> processing */ >> break; >> case EXIT_REASON_MCE_DURING_VMENTRY: >> do_machine_check(regs); >> @@ -2442,7 +2450,6 @@ void vmx_vmexit_handler(struct cpu_user_ >> (X86_EVENTTYPE_NMI << 8) ) >> goto exit_and_crash; >> HVMTRACE_0D(NMI); >> - self_nmi(); /* Real NMI, vector 2: normal processing. */ >> break; >> case TRAP_machine_check: >> HVMTRACE_0D(MCE); >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com