thr3ads.net - Xen devel - [PATCH 0 of 3] Deal with IOMMU faults in softirq context. [Dec 2011]

Dario Faggioli

2011-Dec-19 18:34 UTC

[PATCH 0 of 3] Deal with IOMMU faults in softirq context.

Hello everyone,

As already discussed here [1], dealing with IOMMU faults in interrupt
context may cause nasty things to happen, up to being used as a form of
DoS attack, e.g., by generating a "storm" of IOMMU faults that will
livelock a pCPU.

To avoid this, IOMMU fault handling is being moved from interrupt to
softirq context. Basically, the interrupt handler of the IRQ originated
by an IOMMU (page) fault will raise a softirq-tasklet, which will then
deal with the actual fault records by clearing the logs and re-enabling
interrupts from the offending IOMMU(s). A single tasklet is used
even if there is more than one IOMMU in the system, as the event
should be rare enough.
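
As a rough illustration of the deferral described above (this is not the
code from the series), the pattern can be modelled in plain C. Every name
here (sim_iommu, sim_irq_handler, sim_do_softirq, NR_IOMMUS, LOG_SIZE) is
hypothetical, and the "tasklet" is reduced to a single pending flag. The
sketch only shows that the interrupt path does constant work while one
deferred routine drains the logs of all IOMMUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IOMMUS 2          /* hypothetical: number of simulated IOMMUs */
#define LOG_SIZE  4          /* hypothetical: fault-log capacity per IOMMU */

struct sim_iommu {
    int pending;             /* fault records waiting in the log */
    int overflowed;          /* faults dropped because the log was full */
};

static struct sim_iommu iommus[NR_IOMMUS];
static bool tasklet_pending; /* stands in for the scheduled-tasklet state */
static int handled;          /* total fault records processed so far */

/* Hardware side: record a fault, dropping it if the log is already full.
 * Existing entries are never overwritten. */
static void sim_raise_fault(size_t idx)
{
    assert(idx < NR_IOMMUS);
    if ( iommus[idx].pending < LOG_SIZE )
        iommus[idx].pending++;
    else
        iommus[idx].overflowed++;
}

/* Interrupt context: do no real work, just mark the tasklet runnable. */
static void sim_irq_handler(void)
{
    tasklet_pending = true;
}

/* Softirq context: drain the logs of all IOMMUs, whichever one fired. */
static void sim_fault_tasklet(void)
{
    for ( size_t i = 0; i < NR_IOMMUS; i++ )
    {
        handled += iommus[i].pending;
        iommus[i].pending = 0;
    }
}

/* Called from the softirq loop; returns true if the tasklet ran. */
static bool sim_do_softirq(void)
{
    if ( !tasklet_pending )
        return false;
    tasklet_pending = false;
    sim_fault_tasklet();
    return true;
}
```

In this model, a burst of faults costs the interrupt path one flag write
per IRQ, and all the log processing happens in a single later pass of
sim_do_softirq().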

The series introduces the described mechanism for both Intel VT-d and
AMD-Vi, and has been tested on both platforms with a hacked DomU bnx2
network driver which was generating I/O page faults upon request.

Thanks and Regards,
Dario

[1]
http://old-list-archives.xen.org/archives/html/xen-devel/2011-08/msg00638.html

--
0 iommu-fault-tasklet_vtd.patch
1 iommu-fault-tasklet_amd.patch
--
 xen/drivers/passthrough/amd/iommu_init.c |  45 ++++++++++++++++++++++++++++++++++++++++++---
 xen/drivers/passthrough/vtd/iommu.c      |  35 ++++++++++++++++++++++++++++++++---
 2 files changed, 74 insertions(+), 6 deletions(-)

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-------------------------------------------------------------------
Dario Faggioli, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Dario Faggioli

2011-Dec-19 18:51 UTC

[PATCH 1 of 2] Move IOMMU faults handling into softirq for VT-d.

Dealing with interrupts from the VT-d IOMMU is deferred to a
softirq-tasklet, raised by the actual IRQ handler. Since a new interrupt
is not generated, even if further faults occur, until we have cleared all
the pending ones, there's no need to disable IRQs, as the hardware does
it on its own. Notice that this may cause the log to overflow, but none of
the existing entries will be overwritten.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r a4bffc85bb71 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c	Mon Dec 19 09:37:52 2011 +0100
+++ b/xen/drivers/passthrough/vtd/iommu.c	Mon Dec 19 16:46:14 2011 +0000
@@ -53,6 +53,8 @@ bool_t __read_mostly untrusted_msi;
 
 int nr_iommus;
 
+static struct tasklet vtd_fault_tasklet;
+
 static void setup_dom0_device(struct pci_dev *);
 static void setup_dom0_rmrr(struct domain *d);
 
@@ -918,10 +920,8 @@ static void iommu_fault_status(u32 fault
 }
 
 #define PRIMARY_FAULT_REG_LEN (16)
-static void iommu_page_fault(int irq, void *dev_id,
-                             struct cpu_user_regs *regs)
+static void __do_iommu_page_fault(struct iommu *iommu)
 {
-    struct iommu *iommu = dev_id;
     int reg, fault_index;
     u32 fault_status;
     unsigned long flags;
@@ -996,6 +996,33 @@ clear_overflow:
     }
 }
 
+static void do_iommu_page_fault(unsigned long data)
+{
+    struct acpi_drhd_unit *drhd;
+
+    if ( list_empty(&acpi_drhd_units) )
+    {
+       INTEL_IOMMU_DEBUG("no device found, something must be very wrong!\n");
+       return;
+    }
+
+    /* No matter from whom the interrupt came from, check all the
+     * IOMMUs present in the system. This allows for having just one
+     * tasklet (instead of one per each IOMMU) and should be more than
+     * fine, considering how rare the event of a fault should be. */
+    for_each_drhd_unit ( drhd )
+        __do_iommu_page_fault(drhd->iommu);
+}
+
+static void iommu_page_fault(int irq, void *dev_id,
+                             struct cpu_user_regs *regs)
+{
+    /* Just flag the tasklet as runnable. This is fine, according to VT-d
+     * specs since a new interrupt won't be generated until we clear all
+     * the faults that caused this one to happen. */
+    tasklet_schedule(&vtd_fault_tasklet);
+}
+
 static void dma_msi_unmask(struct irq_desc *desc)
 {
     struct iommu *iommu = desc->action->dev_id;
@@ -2144,6 +2171,8 @@ int __init intel_vtd_setup(void)
         iommu->irq = ret;
     }
 
+    softirq_tasklet_init(&vtd_fault_tasklet, do_iommu_page_fault, 0);
+
     if ( !iommu_qinval && iommu_intremap )
     {
         iommu_intremap = 0;



Dario Faggioli

2011-Dec-19 18:53 UTC

[PATCH 2 of 2] Move IOMMU faults handling into softirq for AMD-Vi.

Dealing with interrupts from the AMD-Vi IOMMU is deferred to a softirq-tasklet,
raised by the actual IRQ handler. To avoid more interrupts being generated
(because of further faults), they must be masked in the IOMMU within the
low level IRQ handler and re-enabled in the tasklet body. Notice that
this may cause the log to overflow, but none of the existing entries will
be overwritten.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

diff -r 12cc8fc9a908 xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c	Mon Dec 19 16:46:14 2011 +0000
+++ b/xen/drivers/passthrough/amd/iommu_init.c	Mon Dec 19 16:46:39 2011 +0000
@@ -32,6 +32,8 @@
 
 static int __initdata nr_amd_iommus;
 
+static struct tasklet amd_iommu_fault_tasklet;
+
 unsigned short ivrs_bdf_entries;
 static struct radix_tree_root ivrs_maps;
 struct list_head amd_iommu_head;
@@ -522,12 +524,10 @@ static void parse_event_log_entry(struct
     }
 }
 
-static void amd_iommu_page_fault(int irq, void *dev_id,
-                             struct cpu_user_regs *regs)
+static void __do_amd_iommu_page_fault(struct amd_iommu *iommu)
 {
     u32 entry;
     unsigned long flags;
-    struct amd_iommu *iommu = dev_id;
 
     spin_lock_irqsave(&iommu->lock, flags);
     amd_iommu_read_event_log(iommu);
@@ -546,6 +546,43 @@ static void amd_iommu_page_fault(int irq
     spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
+static void do_amd_iommu_page_fault(unsigned long data)
+{
+    struct amd_iommu *iommu;
+
+    if ( list_empty(&amd_iommu_head) )
+    {
+       AMD_IOMMU_DEBUG("no device found, something must be very wrong!\n");
+       return;
+    }
+
+    /* No matter from whom the interrupt came from, check all the
+     * IOMMUs present in the system. This allows for having just one
+     * tasklet (instead of one per each IOMMU) and should be more than
+     * fine, considering how rare the event of a fault should be. */
+for_each_amd_iommu ( iommu )
+        __do_amd_iommu_page_fault(iommu);
+}
+
+static void amd_iommu_page_fault(int irq, void *dev_id,
+                             struct cpu_user_regs *regs)
+{
+    u32 entry;
+    unsigned long flags;
+    struct amd_iommu *iommu = dev_id;
+
+    /* silence interrupts. The tasklet will enable them back */
+    spin_lock_irqsave(&iommu->lock, flags);
+    entry = readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
+    iommu_clear_bit(&entry, IOMMU_STATUS_EVENT_LOG_INT_SHIFT);
+    writel(entry, iommu->mmio_base+IOMMU_STATUS_MMIO_OFFSET);
+    spin_unlock_irqrestore(&iommu->lock, flags);
+
+    /* Flag the tasklet as runnable so that it can execute, clear
+     * the log and re-enable interrupts. */
+    tasklet_schedule(&amd_iommu_fault_tasklet);
+}
+
 static int __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
     int irq, ret;
@@ -884,6 +921,8 @@ int __init amd_iommu_init(void)
         if ( amd_iommu_init_one(iommu) != 0 )
             goto error_out;
 
+    softirq_tasklet_init(&amd_iommu_fault_tasklet, do_amd_iommu_page_fault, 0);
+
     return 0;
 
 error_out:


Jan Beulich

2011-Dec-20 09:36 UTC

Re: [PATCH 0 of 3] Deal with IOMMU faults in softirq context.

>>> On 19.12.11 at 19:34, Dario Faggioli <raistlin@linux.it> wrote:
> As already discussed here [1], dealing with IOMMU faults in interrupt
> context may cause nasty things to happen, up to being used as a form of
> DoS attack, e.g., by generating a "storm" of IOMMU faults that will
> livelock a pCPU.
> 
> To avoid this, IOMMU faults handling is being moved from interrupt to
> softirq context. Basically, the interrupt handler of the IRQ originated
> by an IOMMU (page) fault will raise a softirq-tasklet which will then
> deal with the actual fault records by clearing the logs and re-enabling
> interrupts from the offending IOMMU(s). A single tasklet is being used
> even if there are more than just one IOMMU in the system, as the event
> should be rare enough.
> 
> The series introduces the described mechanism for both Intel VT-d and
> AMD-Vi, and has been tested on both platforms with an hacked DomU bnx2
> network driver which was generating I/O page faults upon request.

These look good to me (apart from a minor indentation issue in the
2nd patch), but we'd surely like to have an ack from the respective
maintainers. Also, despite the subject here, I suppose the series
consists of just two patches?

Thanks for doing this!

Jan

Jan Beulich

2011-Dec-20 10:45 UTC

Re: [PATCH 0 of 3] Deal with IOMMU faults in softirq context.

>>> On 20.12.11 at 11:04, Dario Faggioli <raistlin@linux.it> wrote:
> On Tue, 2011-12-20 at 09:36 +0000, Jan Beulich wrote:
>> These look good to me (apart from a minor indentation issue in the
>> 2nd patch),
>>
> Oh, you mean those 3 spaces instead of 4 within
> do_amd_iommu_page_fault()? I've no idea of how that could have happened
> and will fix that, thanks.

It looked like no leading space at all in my mail viewer.

> If that's fine I'll wait a bit more to see if other reviews pop up and
> then resubmit the series.
> 
>> but we'd surely like to have an ack from the respective
>> maintainers.
>>
> Ok... Did I Cc them correctly? :-)

Yes (except that Allen isn't really maintaining VT-d code anymore, but
there was also no successor nominated by Intel so far; perhaps he can
find time to take a look nevertheless).

Jan

Wei Wang2

2011-Dec-20 12:11 UTC

Re: [PATCH 2 of 2] Move IOMMU faults handling into softirq for AMD-Vi.

On Monday 19 December 2011 19:53:32 Dario Faggioli wrote:
> Dealing with interrupts from AMD-Vi IOMMU is deferred to a softirq-tasklet,
> raised by the actual IRQ handler. To avoid more interrupts being generated
> (because of further faults), they must be masked in the IOMMU within the
> low level IRQ handler and enabled back in the tasklet body. Notice that
> this may cause the log to overflow, but none of the existing entry will
> be overwritten.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>
> diff -r 12cc8fc9a908 xen/drivers/passthrough/amd/iommu_init.c
> --- a/xen/drivers/passthrough/amd/iommu_init.c	Mon Dec 19 16:46:14 2011 +0000
> +++ b/xen/drivers/passthrough/amd/iommu_init.c	Mon Dec 19 16:46:39 2011 +0000
> @@ -32,6 +32,8 @@
>
>  static int __initdata nr_amd_iommus;
>
> +static struct tasklet amd_iommu_fault_tasklet;
> +
>  unsigned short ivrs_bdf_entries;
>  static struct radix_tree_root ivrs_maps;
>  struct list_head amd_iommu_head;
> @@ -522,12 +524,10 @@ static void parse_event_log_entry(struct
>      }
>  }
>
> -static void amd_iommu_page_fault(int irq, void *dev_id,
> -                             struct cpu_user_regs *regs)
> +static void __do_amd_iommu_page_fault(struct amd_iommu *iommu)
>  {
>      u32 entry;
>      unsigned long flags;
> -    struct amd_iommu *iommu = dev_id;
>
>      spin_lock_irqsave(&iommu->lock, flags);
>      amd_iommu_read_event_log(iommu);
> @@ -546,6 +546,43 @@ static void amd_iommu_page_fault(int irq
>      spin_unlock_irqrestore(&iommu->lock, flags);
>  }
>
> +static void do_amd_iommu_page_fault(unsigned long data)
> +{
> +    struct amd_iommu *iommu;
> +
> +    if ( list_empty(&amd_iommu_head) )

Here you could use iommu_found(). The rest of this patch looks good to me.
Thanks,
Wei

> +    {
> +       AMD_IOMMU_DEBUG("no device found, something must be very wrong!\n");
> +       return;
> +    }
> +
> +    /* No matter from whom the interrupt came from, check all the
> +     * IOMMUs present in the system. This allows for having just one
> +     * tasklet (instead of one per each IOMMU) and should be more than
> +     * fine, considering how rare the event of a fault should be. */
> +for_each_amd_iommu ( iommu )
> +        __do_amd_iommu_page_fault(iommu);
> +}
> +
> +static void amd_iommu_page_fault(int irq, void *dev_id,
> +                             struct cpu_user_regs *regs)
> +{
> +    u32 entry;
> +    unsigned long flags;
> +    struct amd_iommu *iommu = dev_id;
> +
> +    /* silence interrupts. The tasklet will enable them back */
> +    spin_lock_irqsave(&iommu->lock, flags);
> +    entry = readl(iommu->mmio_base + IOMMU_STATUS_MMIO_OFFSET);
> +    iommu_clear_bit(&entry, IOMMU_STATUS_EVENT_LOG_INT_SHIFT);
> +    writel(entry, iommu->mmio_base+IOMMU_STATUS_MMIO_OFFSET);
> +    spin_unlock_irqrestore(&iommu->lock, flags);
> +
> +    /* Flag the tasklet as runnable so that it can execute, clear
> +     * the log and re-enable interrupts. */
> +    tasklet_schedule(&amd_iommu_fault_tasklet);
> +}
> +
>  static int __init set_iommu_interrupt_handler(struct amd_iommu *iommu)
>  {
>      int irq, ret;
> @@ -884,6 +921,8 @@ int __init amd_iommu_init(void)
>          if ( amd_iommu_init_one(iommu) != 0 )
>              goto error_out;
>
> +    softirq_tasklet_init(&amd_iommu_fault_tasklet, do_amd_iommu_page_fault, 0);
> +
>      return 0;
>
>  error_out:
