I have an Intel e1000e NIC which I passed through to an HVM domain under Xen 4.2. All the corresponding hardware protections are enabled on my system (DMA + Interrupt remapping); however, once in a while I get a SERR NMI in dom0 ("NMI - PCI sys error (SERR)" in xl dmesg). I am wondering about its exact cause. My reasoning is as follows:

[+] Under Intel VT-x, interrupts are virtualized, so all the interrupts coming from my HVM domain should be handled by the corresponding handler in dom0/(hypervisor?), which decides whether to ignore the interrupt or inject it back into the guest to be handled by the corresponding vcpu. Thus, every time an interrupt is generated inside my HVM guest, a VM exit occurs; the hypervisor handles it and forwards the "event" to the appropriate handler (in my case the NMI handler in dom0 with ring1 privileges).

[+] At the same time, PCI SERR interrupts refer to hardware errors that are generated by my passthrough NIC directly, so I expect that these interrupts are physical (e.g., MSIs) and should go directly either to the BSP or to one of the APs. However, Interrupt remapping is in place, which should check the origin of such interrupts and remap them by using the BDF id of the device. Thus, the real interrupt is generated by the Interrupt Remapping hardware unit, which is still a physical one. Am I right?

So I have the feeling that this is actually the normal behaviour of Xen, but it is a bit weird to me that a PT device, which can also be controlled by a guest, could invoke an interrupt handler outside the guest with higher privileges.

Can anyone clarify this for me if I have misunderstood something?

Thank you!

-gabor
>>> On 08.02.13 at 15:51, Gábor PÉK <pek@crysys.hu> wrote:
> [+] At the same time, PCI SERR interrupts refer to hardware errors that
> is generated by my passthrough NIC directly, so I expect that these
> interrupts are physical (e.g., MSIs) so they should go directly either
> to the BSP or one of the APs. However, Interrupt remapping is in place
> which should check the origin of such interrupts and should remap the
> interrupts by using the BDF id of the device. Thus, the real interrupt
> is generated by the Interrupt Remapping hardware unit which is still a
> physical one. Am I right?

No, NMIs don't go through the remapping hardware, they get delivered directly to the CPU. Which makes sense, because they point out a problem in the system as a whole, regardless of whether a device having caused them is assigned to a guest.

Note that because of the possibility of multiple devices raising such an NMI, I think it is also not possible for Xen to actually know which device(s) caused the NMI, and hence it has no way to associate it with a particular guest, even if it wanted to.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
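As a debugging aid for the question of where a SERR comes from: in the conventional PCI configuration header, the Command register (offset 0x04) has a SERR# Enable bit (bit 8) and the Status register (offset 0x06) has a Signaled System Error bit (bit 14). The following is a minimal sketch that only decodes these bits from a value already read out of a device's config space (for example from a hex dump). The helper names are hypothetical and not taken from Xen or Linux, and whether these bits are enough to attribute an NMI to one device is left open, in line with Jan's remark above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in the conventional PCI configuration header. */
#define PCI_COMMAND_SERR_ENABLE   (1u << 8)   /* Command register, offset 0x04 */
#define PCI_STATUS_SIG_SYS_ERROR  (1u << 14)  /* Status register, offset 0x06 */

/* Hypothetical helper: split the 32-bit dword at config offset 0x04 into
 * the Command register (low 16 bits) and the Status register (high 16 bits). */
static void split_command_status(uint32_t dword, uint16_t *command,
                                 uint16_t *status)
{
    *command = (uint16_t)(dword & 0xffffu);
    *status  = (uint16_t)(dword >> 16);
}

/* Hypothetical helper: true if the function is permitted to assert SERR#. */
static bool serr_enabled(uint16_t command)
{
    return (command & PCI_COMMAND_SERR_ENABLE) != 0;
}

/* Hypothetical helper: true if the function reports that it signaled a
 * system error (the bit stays set until software clears it). */
static bool serr_signaled(uint16_t status)
{
    return (status & PCI_STATUS_SIG_SYS_ERROR) != 0;
}
```

Applied to every device after an NMI, a check like this could at most narrow down the candidates; it says nothing about why the device raised the error.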
On 2013.02.08. 17:55, Jan Beulich wrote:
>>>> On 08.02.13 at 15:51, Gábor PÉK <pek@crysys.hu> wrote:
>> [+] At the same time, PCI SERR interrupts refer to hardware errors that
>> is generated by my passthrough NIC directly, so I expect that these
>> interrupts are physical (e.g., MSIs) so they should go directly either
>> to the BSP or one of the APs. However, Interrupt remapping is in place
>> which should check the origin of such interrupts and should remap the
>> interrupts by using the BDF id of the device. Thus, the real interrupt
>> is generated by the Interrupt Remapping hardware unit which is still a
>> physical one. Am I right?
>
> No, NMIs don't go through the remapping hardware, they get
> delivered directly to the CPU. Which makes sense, because they
> point out a problem in the system as a whole, regardless of
> whether a device having caused them is assigned to a guest

I ran into this NMI issue while looking at the security of Xen, and it raises security concerns for me. If an NMI does not go through the Interrupt Remapping engine (which makes sense due to its non-maskable nature), then a "malicious" NMI could give rise to either a DoS attack or code execution problems with ring1 privileges. In the former case the reason could be the uncleared EOI register for the specific CPU after NMI generation, while in the latter case the code injection might be difficult, but I think the concern is still valid.

Furthermore, an attacker can easily generate such NMIs via MSIs from untrusted HVM domains by means of a PT device in xAPIC mode. x2APIC mode (together with Interrupt Remapping) could mitigate such malicious DMA writes, since LAPIC registers are then accessed via MSRs and the Remappable MSI format is enforced. However, if an attacker can create NMI conditions in x2APIC mode as well, then the Remappable Format does not make sense at all (as the NMI is not handled by the remapping engine). So my feeling is that there is no real hardware/software solution for this issue...

> Note that because of the possibility of multiple devices raising
> such an NMI, I think it is also not possible for Xen to actually know
> which device(s) caused the NMI, and hence it has no way to
> associate it with a particular guest, even if it wanted to.

Can this explain why my NMI does not appear in /proc/interrupts in dom0 while the handler is executed with ring1 privileges?

Thank you!

-gabor
>>> On 11.02.13 at 11:39, Gábor PÉK <pek@crysys.hu> wrote:
> On 2013.02.08. 17:55, Jan Beulich wrote:
>>>>> On 08.02.13 at 15:51, Gábor PÉK <pek@crysys.hu> wrote:
>>> [+] At the same time, PCI SERR interrupts refer to hardware errors that
>>> is generated by my passthrough NIC directly, so I expect that these
>>> interrupts are physical (e.g., MSIs) so they should go directly either
>>> to the BSP or one of the APs. However, Interrupt remapping is in place
>>> which should check the origin of such interrupts and should remap the
>>> interrupts by using the BDF id of the device. Thus, the real interrupt
>>> is generated by the Interrupt Remapping hardware unit which is still a
>>> physical one. Am I right?
>>
>> No, NMIs don't go through the remapping hardware, they get
>> delivered directly to the CPU. Which makes sense, because they
>> point out a problem in the system as a whole, regardless of
>> whether a device having caused them is assigned to a guest
>
> I faced this NMI issue while looking at the security of Xen, however, it
> raises security concerns in my mind. If an NMI does not go through the
> Interrupt Remapping engine (which makes sense due to its non-maskable
> nature), then a "malicious" NMI could give rise to either a DoS attack
> or code execution problems with ring1 privileges. In the former case the
> reason could be the uncleared EOI register for the specific CPU after
> NMI generation, while in the latter case the code injection might be
> difficult, but the concern is still valid I think.

I'm afraid you're mixing things up here. An NMI doesn't require an EOI (as it's not a vectored interrupt, and as such doesn't go through the normal LAPIC processing at all; CPUs have a dedicated [virtual] input for this).

> Furthermore, an attacker can generate such NMIs via MSIs from untrusted
> HVM domains by means of a PT device in xAPIC mode easily.

How so?

> x2APIC mode
> (in together with Interrupt Remapping) could give mitigation against
> such malicious DMA writes by accessing LAPIC registers via MSRs and
> enforcing the Remappable MSI format. However, if an attacker can create
> NMI conditions in x2apic mode as well, then the Remappable Format does
> not make sense at all (as the NMI is not handled by the remapping
> engine). So what I feel that there is no real hardware/software solution
> for this issue...

There shouldn't be ways for software to cause NMIs, other than by manipulating the LAPIC directly (which only the hypervisor can do) or by writing malformed MSI messages (and unprivileged guests don't themselves control what address/data pair gets programmed into the respective device fields). SERR, afaik, should be raised by the device itself only for certain error conditions, and if such error conditions can be forced on some specific device by its driver, then passing through such a device is inherently insecure (with nothing the hypervisor can do about it).

>> Note that because of the possibility of multiple devices raising
>> such an NMI, I think it is also not possible for Xen to actually know
>> which device(s) caused the NMI, and hence it has no way to
>> associate it with a particular guest, even if it wanted to.
>
> Can this explain why my NMI does not appear in the /proc/interrupts in
> dom0 while the handler is executed with ring1 privileges?

At least one NMI instance should show up in the statistics.

Jan
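To make the "malformed MSI message" case above concrete, here is a minimal sketch of how an MSI address/data pair can be classified on x86. It assumes the compatibility-format layout from the Intel SDM (address in the 0xFEExxxxx window, delivery mode in data bits 10:8, with 100b meaning NMI) and the VT-d convention that address bit 4 selects the remappable format. The function names are hypothetical and do not come from Xen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delivery mode encodings in bits 10:8 of a compatibility-format MSI
 * data word. */
enum msi_delivery_mode {
    MSI_DM_FIXED       = 0,
    MSI_DM_LOWEST_PRIO = 1,
    MSI_DM_SMI         = 2,
    MSI_DM_NMI         = 4,
    MSI_DM_INIT        = 5,
    MSI_DM_EXTINT      = 7
};

/* Bits 7:0 of the data word carry the interrupt vector. */
static unsigned msi_vector(uint32_t data)
{
    return data & 0xffu;
}

/* Bits 10:8 of the data word carry the delivery mode. */
static unsigned msi_delivery_mode(uint32_t data)
{
    return (data >> 8) & 0x7u;
}

/* MSI writes target the 0xFEExxxxx interrupt address window. */
static bool msi_targets_interrupt_window(uint32_t addr)
{
    return (addr & 0xfff00000u) == 0xfee00000u;
}

/* With VT-d, address bit 4 set means remappable format: the message then
 * carries an index into the remapping table, not a delivery mode. */
static bool msi_is_remappable_format(uint32_t addr)
{
    return ((addr >> 4) & 1u) != 0;
}

/* Hypothetical check: a compatibility-format message requesting NMI
 * delivery, i.e. the kind of message discussed in this thread. */
static bool msi_is_compat_format_nmi(uint32_t addr, uint32_t data)
{
    return msi_targets_interrupt_window(addr) &&
           !msi_is_remappable_format(addr) &&
           msi_delivery_mode(data) == MSI_DM_NMI;
}
```

For example, the pair (0xfee00000, 0x0400) is classified as a compatibility-format NMI request, while (0xfee00010, 0x0400) is in remappable format, where the data word is not interpreted as a delivery mode at all.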