Xen.org security team
2011-Aug-12 13:27 UTC
[Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

            Xen Security Advisory CVE-2011-3131 / XSA-5

       Xen DoS using IOMMU faults from PCI-passthrough guest

ISSUE DESCRIPTION
=================

A VM that controls a PCI[E] device directly can cause it to issue DMA
requests to invalid addresses. Although these requests are denied by
the IOMMU, the hypervisor needs to handle the interrupt and clear the
error from the IOMMU, and this can be used to live-lock a CPU and
potentially hang the host.

Because this issue has already been discussed on public mailing lists,
there is no embargo on this advisory or the patches.

VULNERABLE SYSTEMS
==================

Any system where an untrusted VM is given direct control of a PCI[E]
device is vulnerable.

IMPACT
======

A malicious guest administrator of a VM that has direct control of a
PCI[E] device can cause a performance degradation, and possibly hang
the host.

RESOLUTION
==========

This issue is resolved in changeset 23762:537ed3b74b3f of
xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iQEcBAEBAgAGBQJORSmkAAoJEIP+FMlX6CvZYDcIAKsgu6vDOG5Lz8/DLl48N/zg
KqPzbhW1XMm1b67un5r/bsWnuS9/z/jD8PEzybqLbS8RHwKE9XoXrJqx0Xz/Z+32
oJslxQjIzESlCf20QoNlOuPp6WgbsWGWKac+UO2r2CVtyx38L9P13OyRgzRzcoOn
eFAGB0iccr0gtWXsP2eK9MHhkGNk0yS1qJoI1XPp6DefREypUTDZOVzmgOOUuR+N
1OOUsGhdNt5mKjD/9hP7qDt6gs7EbvRrD8AHI72x4Sv9toy3i8qPO7o2PJH+X9r6
KObhbxkqgSwRaLjM+CIzFlmXXwD9GHSnzPWUO6LqAQPO6QdkUCpFSXwFRdy1H/0=
=qeJB
-----END PGP SIGNATURE-----

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2011-Aug-12 13:53 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
>>> On 12.08.11 at 15:27, Xen.org security team <security@xen.org> wrote:
> IMPACT
> ======
>
> A malicious guest administrator of a VM that has direct control of a
> PCI[E] device can cause a performance degradation, and possibly hang the
> host.
>
> RESOLUTION
> ==========
>
> This issue is resolved in changeset 23762:537ed3b74b3f of
> xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.

Do you really think this helps much? Direct control of the device means
it could also (perhaps on a second vCPU) constantly re-enable the bus
mastering bit. Preventing that would need cooperation with pciback, or
filtering of subsequent config space writes directly in the hypervisor
(the latter could become difficult when mmcfg is being used by Dom0 even
for base accesses).

Jan
Tim Deegan
2011-Aug-12 14:09 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
> > This issue is resolved in changeset 23762:537ed3b74b3f of
> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>
> Do you really think this helps much? Direct control of the device means
> it could also (perhaps on a second vCPU) constantly re-enable the bus
> mastering bit.

That path goes through qemu/pciback, so at least lets Xen schedule the
dom0 tools. The particular failure that this patch fixes was locking up
cpu0 so hard that it couldn't even service softirqs, and the NMI
watchdog rebooted the machine.

But you're right that this doesn't really fix the non-fatal performance
degradation. There are probably a lot more ways that a malicious VM with
a PCI device could degrade performance, even just by aggressively DMAing
to consume bus cycles. I think the best Xen can do is give dom0 a chance
to shut down misbehaving guests.

Cheers,

Tim.

-- 
Tim Deegan <tim@xen.org>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Jan Beulich
2011-Aug-12 14:48 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
>>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
> At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>>
>> Do you really think this helps much? Direct control of the device means
>> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> mastering bit.
>
> That path goes through qemu/pciback, so at least lets Xen schedule the
> dom0 tools.

Are you sure? If (as said) the guest uses a second vCPU for doing the
config space accesses, I can't see how this would save the pCPU the
fault storm is occurring on.

> The particular failure that this patch fixes was locking up
> cpu0 so hard that it couldn't even service softirqs, and the NMI
> watchdog rebooted the machine.

Hmm, that would point at a flaw in the interrupt exit path, on which
softirqs shouldn't be ignored.

Jan
Tim Deegan
2011-Aug-15 09:26 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
> >>
> >> Do you really think this helps much? Direct control of the device means
> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
> >> mastering bit.
> >
> > That path goes through qemu/pciback, so at least lets Xen schedule the
> > dom0 tools.
>
> Are you sure? If (as said) the guest uses a second vCPU for doing the
> config space accesses, I can't see how this would save the pCPU the
> fault storm is occurring on.

Hmmm. Yes, I see what you mean. What was your concern about
memory-mapped config registers? That pciback would need to be involved
somehow?

> > The particular failure that this patch fixes was locking up
> > cpu0 so hard that it couldn't even service softirqs, and the NMI
> > watchdog rebooted the machine.
>
> Hmm, that would point at a flaw in the interrupt exit path, on which
> softirqs shouldn't be ignored.

Are you suggesting that we should handle softirqs before re-enabling
interrupts? That sounds perilous.

Tim.

-- 
Tim Deegan <tim@xen.org>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Jan Beulich
2011-Aug-15 10:02 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
>>> On 15.08.11 at 11:26, Tim Deegan <tim@xen.org> wrote:
> At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
>> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >>
>> >> Do you really think this helps much? Direct control of the device means
>> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> mastering bit.
>> >
>> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> > dom0 tools.
>>
>> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> config space accesses, I can't see how this would save the pCPU the
>> fault storm is occurring on.
>
> Hmmm. Yes, I see what you mean. What was your concern about
> memory-mapped config registers? That pciback would need to be involved
> somehow?

Yes, unless we want to get into the business of intercepting Dom0's
writes to mmcfg space.

>> > The particular failure that this patch fixes was locking up
>> > cpu0 so hard that it couldn't even service softirqs, and the NMI
>> > watchdog rebooted the machine.
>>
>> Hmm, that would point at a flaw in the interrupt exit path, on which
>> softirqs shouldn't be ignored.
>
> Are you suggesting that we should handle softirqs before re-enabling
> interrupts? That sounds perilous.

Ah, okay, I was assuming execution would get back into the guest at
least, but you're saying the interrupts hit right after the sti. Indeed,
in that case there's not much else we can do.

Or maybe we could: how about moving the whole fault handling into a
softirq, and making the low-level handler just raise that one? Provided
this isn't a performance-critical operation (and it really can't be,
given that we now basically knock the offending device in the face when
one happens), having to iterate through all IOMMUs shouldn't be that
bad.

Jan
Jan Beulich
2011-Aug-16 07:03 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
>>> On 15.08.11 at 11:26, Tim Deegan <tim@xen.org> wrote:
> At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
>> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >>
>> >> Do you really think this helps much? Direct control of the device means
>> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> mastering bit.
>> >
>> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> > dom0 tools.
>>
>> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> config space accesses, I can't see how this would save the pCPU the
>> fault storm is occurring on.
>
> Hmmm. Yes, I see what you mean.

Actually, a second vCPU may not even be needed: since the "fault" really
is an external interrupt, if that one gets handled on a pCPU other than
the one the guest's vCPU is running on, it could execute such a loop
even in that case.

As to yesterday's softirq-based handling thoughts - perhaps the clearing
of the bus master bit on the device should still be done in the actual
IRQ handler, while the processing of the fault records could be moved
out to a softirq.

Jan
Tim Deegan
2011-Aug-16 15:06 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
At 08:03 +0100 on 16 Aug (1313481813), Jan Beulich wrote:
> >>> On 15.08.11 at 11:26, Tim Deegan <tim@xen.org> wrote:
> > At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
> >> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
> >> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
> >> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
> >> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
> >> >>
> >> >> Do you really think this helps much? Direct control of the device means
> >> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
> >> >> mastering bit.
> >> >
> >> > That path goes through qemu/pciback, so at least lets Xen schedule the
> >> > dom0 tools.
> >>
> >> Are you sure? If (as said) the guest uses a second vCPU for doing the
> >> config space accesses, I can't see how this would save the pCPU the
> >> fault storm is occurring on.
> >
> > Hmmm. Yes, I see what you mean.
>
> Actually, a second vCPU may not even be needed: since the "fault"
> really is an external interrupt, if that one gets handled on a pCPU
> other than the one the guest's vCPU is running on, it could execute
> such a loop even in that case.
>
> As to yesterday's softirq-based handling thoughts - perhaps the clearing
> of the bus master bit on the device should still be done in the actual
> IRQ handler, while the processing of the fault records could be moved
> out to a softirq.

Hmmm. I like the idea of using a softirq, but in fact by the time we've
figured out which BDF to silence we've pretty much done handling the
fault.

Reading the VT-d docs it looks like we can just ack the IOMMU fault
interrupt and it won't send any more until we clear the log, so we can
leave the whole business to a softirq. Delaying that might cause the log
to overflow, but that's not necessarily the end of the world. Looks like
we can do the same on AMD by disabling interrupt generation in the main
handler and re-enabling it in the softirq.

Is there any situation where we really care terribly about the I/O fault
logs overflowing?

Tim.

-- 
Tim Deegan <tim@xen.org>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
Jan Beulich
2011-Aug-16 15:59 UTC
Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
>>> On 16.08.11 at 17:06, Tim Deegan <tim@xen.org> wrote:
> At 08:03 +0100 on 16 Aug (1313481813), Jan Beulich wrote:
>> >>> On 15.08.11 at 11:26, Tim Deegan <tim@xen.org> wrote:
>> > At 15:48 +0100 on 12 Aug (1313164084), Jan Beulich wrote:
>> >> >>> On 12.08.11 at 16:09, Tim Deegan <tim@xen.org> wrote:
>> >> > At 14:53 +0100 on 12 Aug (1313160824), Jan Beulich wrote:
>> >> >> > This issue is resolved in changeset 23762:537ed3b74b3f of
>> >> >> > xen-unstable.hg, and 23112:84e3706df07a of xen-4.1-testing.hg.
>> >> >>
>> >> >> Do you really think this helps much? Direct control of the device means
>> >> >> it could also (perhaps on a second vCPU) constantly re-enable the bus
>> >> >> mastering bit.
>> >> >
>> >> > That path goes through qemu/pciback, so at least lets Xen schedule the
>> >> > dom0 tools.
>> >>
>> >> Are you sure? If (as said) the guest uses a second vCPU for doing the
>> >> config space accesses, I can't see how this would save the pCPU the
>> >> fault storm is occurring on.
>> >
>> > Hmmm. Yes, I see what you mean.
>>
>> Actually, a second vCPU may not even be needed: since the "fault"
>> really is an external interrupt, if that one gets handled on a pCPU
>> other than the one the guest's vCPU is running on, it could execute
>> such a loop even in that case.
>>
>> As to yesterday's softirq-based handling thoughts - perhaps the clearing
>> of the bus master bit on the device should still be done in the actual
>> IRQ handler, while the processing of the fault records could be moved
>> out to a softirq.
>
> Hmmm. I like the idea of using a softirq, but in fact by the time we've
> figured out which BDF to silence we've pretty much done handling the
> fault.

Ugly, but yes, indeed.

> Reading the VT-d docs it looks like we can just ack the IOMMU fault
> interrupt and it won't send any more until we clear the log, so we can
> leave the whole business to a softirq. Delaying that might cause the
> log to overflow, but that's not necessarily the end of the world.
> Looks like we can do the same on AMD by disabling interrupt generation
> in the main handler and re-enabling it in the softirq.
>
> Is there any situation where we really care terribly about the I/O fault
> logs overflowing?

As long as older entries don't get overwritten, I don't think that's
going to be problematic - all the more so since we basically shut off
the offending device(s).

Jan
Kay, Allen M
2011-Sep-21 00:07 UTC
RE: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
Catching up on an old thread ...

If I understand correctly, the proposal is to check for VT-d faults in
the do_softirq() handler. If so, we probably don't even need to enable
the VT-d MSI interrupt at all if iommu_debug is not set, basically
handling VT-d faults with a polling method. This sounds fine to me as
long as we still turn on the VT-d MSI interrupt for the iommu_debug
case.

Allen

-----Original Message-----
From: xen-devel-bounces@lists.xensource.com
[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
Sent: Tuesday, August 16, 2011 8:59 AM
To: Tim Deegan
Cc: xen-devel@lists.xensource.com; Xen.org security team
Subject: Re: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock

>>> On 16.08.11 at 17:06, Tim Deegan <tim@xen.org> wrote:
> At 08:03 +0100 on 16 Aug (1313481813), Jan Beulich wrote:
>> Actually, a second vCPU may not even be needed: since the "fault"
>> really is an external interrupt, if that one gets handled on a pCPU
>> other than the one the guest's vCPU is running on, it could execute
>> such a loop even in that case.
>>
>> As to yesterday's softirq-based handling thoughts - perhaps the clearing
>> of the bus master bit on the device should still be done in the actual
>> IRQ handler, while the processing of the fault records could be moved
>> out to a softirq.
>
> Hmmm. I like the idea of using a softirq, but in fact by the time we've
> figured out which BDF to silence we've pretty much done handling the
> fault.

Ugly, but yes, indeed.

> Reading the VT-d docs it looks like we can just ack the IOMMU fault
> interrupt and it won't send any more until we clear the log, so we can
> leave the whole business to a softirq. Delaying that might cause the
> log to overflow, but that's not necessarily the end of the world.
> Looks like we can do the same on AMD by disabling interrupt generation
> in the main handler and re-enabling it in the softirq.
>
> Is there any situation where we really care terribly about the I/O fault
> logs overflowing?

As long as older entries don't get overwritten, I don't think that's
going to be problematic - all the more so since we basically shut off
the offending device(s).

Jan
Jan Beulich
2011-Sep-21 06:47 UTC
RE: [Xen-devel] Xen Advisory 5 (CVE-2011-3131) IOMMU fault livelock
>>> On 21.09.11 at 02:07, "Kay, Allen M" <allen.m.kay@intel.com> wrote:
> Catching up on an old thread ...
>
> If I understand correctly, the proposal is to check for VT-d faults in
> the do_softirq() handler. If so, we probably don't even need to enable
> the VT-d MSI interrupt at all if iommu_debug is not set, basically
> handling VT-d faults with a polling method.

No, I don't think switching to polling mode was implied here. My
thinking was rather along the lines of a dedicated softirq that the
interrupt handler would raise, for the bulk of the handling to take
place there.

Jan