George Dunlap
2011-Jul-26 16:33 UTC
[Xen-devel] [PATCH 0 of 3] Avoid sharing vectors within a device when using an AMD IOMMU
The interrupt remapping tables on AMD IOMMUs index by vector only.
This means that if two MSIs go through the table that are destined for
different cpus, but share the same vector, they will be redirected to
the same place.  (E.g., one interrupt on p5 vector 67, another
interrupt on p7 vector 67; both will be redirected to the same place.)

Introducing per-device interrupt mappings reduces the problem, but
does not solve it completely if the same device can have multiple IRQs
assigned to it, because you can get the same issue -- two different
IRQs from the same device can be assigned the same vector on different
cpus.  This causes one of the IRQs to be activated when either
interrupt is triggered, and the other IRQ to never receive any
interrupts.

This series consists of three patches:

1: Introduce infrastructure to allow irqs to share vector maps.  Any
IRQs sharing the same vector map will never have vector collisions.

2: Introduce an option to have per-device vector maps for MSI IRQs.

3: Automatically enable per-device vector maps when running on an AMD
system with the IOMMU enabled, unless otherwise specified.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
George Dunlap
2011-Jul-26 16:33 UTC
[Xen-devel] [PATCH 1 of 3] xen: Infrastructure to allow irqs to share vector maps
Laying the groundwork for per-device vector maps.  This generic code
allows any irq to point to a vector map; all irqs sharing the same
vector map will avoid sharing vectors.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r e8d1c8f074ba -r 4c8f3cae5007 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c	Mon Jul 25 16:43:26 2011 +0100
+++ b/xen/arch/x86/io_apic.c	Tue Jul 26 17:21:23 2011 +0100
@@ -449,6 +449,11 @@ fastcall void smp_irq_move_cleanup_inter
                 irq, vector, smp_processor_id());
         __get_cpu_var(vector_irq)[vector] = -1;
+        if ( cfg->used_vectors )
+        {
+            ASSERT(test_bit(vector, cfg->used_vectors));
+            clear_bit(vector, cfg->used_vectors);
+        }
         cfg->move_cleanup_count--;
 unlock:
         spin_unlock(&desc->lock);
diff -r e8d1c8f074ba -r 4c8f3cae5007 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Mon Jul 25 16:43:26 2011 +0100
+++ b/xen/arch/x86/irq.c	Tue Jul 26 17:21:23 2011 +0100
@@ -108,6 +108,8 @@ static int __init __bind_irq_vector(int
         per_cpu(vector_irq, cpu)[vector] = irq;
     cfg->vector = vector;
     cfg->cpu_mask = online_mask;
+    if ( cfg->used_vectors )
+        set_bit(vector, cfg->used_vectors);
     irq_status[irq] = IRQ_USED;
     if (IO_APIC_IRQ(irq))
         irq_vector[irq] = vector;
@@ -172,6 +174,7 @@ static void dynamic_irq_cleanup(unsigned
     desc->depth = 1;
     desc->msi_desc = NULL;
     desc->handler = &no_irq_type;
+    desc->chip_data->used_vectors=NULL;
     cpus_setall(desc->affinity);
     spin_unlock_irqrestore(&desc->lock, flags);
@@ -200,6 +203,9 @@ static void __clear_irq_vector(int irq)
     for_each_cpu_mask(cpu, tmp_mask)
         per_cpu(vector_irq, cpu)[vector] = -1;
+    if ( cfg->used_vectors )
+        clear_bit(vector, cfg->used_vectors);
+
     cfg->vector = IRQ_VECTOR_UNASSIGNED;
     cpus_clear(cfg->cpu_mask);
     init_one_irq_status(irq);
@@ -277,6 +283,7 @@ static void __init init_one_irq_cfg(stru
     cfg->vector = IRQ_VECTOR_UNASSIGNED;
     cpus_clear(cfg->cpu_mask);
     cpus_clear(cfg->old_cpu_mask);
+    cfg->used_vectors = NULL;
 }
 int __init init_irq_data(void)
@@ -402,6 +409,10 @@ next:
         if (test_bit(vector, used_vectors))
             goto next;
+        if (cfg->used_vectors
+            && test_bit(vector, cfg->used_vectors) )
+            goto next;
+
         for_each_cpu_mask(new_cpu, tmp_mask)
             if (per_cpu(vector_irq, new_cpu)[vector] != -1)
                 goto next;
@@ -417,6 +428,11 @@ next:
             per_cpu(vector_irq, new_cpu)[vector] = irq;
         cfg->vector = vector;
         cpus_copy(cfg->cpu_mask, tmp_mask);
+        if ( cfg->used_vectors )
+        {
+            ASSERT(!test_bit(vector, cfg->used_vectors));
+            set_bit(vector, cfg->used_vectors);
+        }
         irq_status[irq] = IRQ_USED;
         if (IO_APIC_IRQ(irq))
diff -r e8d1c8f074ba -r 4c8f3cae5007 xen/include/asm-x86/irq.h
--- a/xen/include/asm-x86/irq.h	Mon Jul 25 16:43:26 2011 +0100
+++ b/xen/include/asm-x86/irq.h	Tue Jul 26 17:21:23 2011 +0100
@@ -24,11 +24,16 @@
 #define irq_to_desc(irq)    (&irq_desc[irq])
 #define irq_cfg(irq)        (&irq_cfg[irq])
+typedef struct {
+    DECLARE_BITMAP(_bits,NR_VECTORS);
+} vmask_t;
+
 struct irq_cfg {
         int vector;
         cpumask_t cpu_mask;
         cpumask_t old_cpu_mask;
         unsigned move_cleanup_count;
+        vmask_t *used_vectors;
         u8 move_in_progress : 1;
 };
George Dunlap
2011-Jul-26 16:33 UTC
[Xen-devel] [PATCH 2 of 3] xen: Option to allow per-device vector maps for MSI IRQs
Add a vector-map to pci_dev, and add an option to point MSI-related
IRQs to the vector-map of the device.  This prevents irqs from the
same device from being assigned the same vector on different pcpus.
This is required for systems using an AMD IOMMU, since the intremap
tables on AMD only look at vector, and not destination ID.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r 4c8f3cae5007 -r 362a881a79a4 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Tue Jul 26 17:21:23 2011 +0100
+++ b/xen/arch/x86/irq.c	Tue Jul 26 17:21:23 2011 +0100
@@ -32,6 +32,9 @@
 unsigned int __read_mostly nr_irqs_gsi;
 unsigned int __read_mostly nr_irqs;
 integer_param("nr_irqs", nr_irqs);
+bool_t __read_mostly opt_irq_perdev_vector_map = 0;
+boolean_param("irq-perdev-vector-map", opt_irq_perdev_vector_map);
+
 u8 __read_mostly *irq_vector;
 struct irq_desc __read_mostly *irq_desc = NULL;
@@ -1654,6 +1657,9 @@ int map_domain_pirq(
                 dprintk(XENLOG_G_ERR, "dom%d: irq %d in use\n",
                         d->domain_id, irq);
             desc->handler = &pci_msi_type;
+            if ( opt_irq_perdev_vector_map
+                 && !desc->chip_data->used_vectors )
+                desc->chip_data->used_vectors = &pdev->info.used_vectors;
             set_domain_irq_pirq(d, irq, info);
             setup_msi_irq(pdev, msi_desc, irq);
             spin_unlock_irqrestore(&desc->lock, flags);
diff -r 4c8f3cae5007 -r 362a881a79a4 xen/include/xen/pci.h
--- a/xen/include/xen/pci.h	Tue Jul 26 17:21:23 2011 +0100
+++ b/xen/include/xen/pci.h	Tue Jul 26 17:21:23 2011 +0100
@@ -11,6 +11,7 @@
 #include <xen/types.h>
 #include <xen/list.h>
 #include <xen/spinlock.h>
+#include <xen/irq.h>
 /*
  * The PCI interface treats multi-function devices as independent
@@ -38,6 +39,7 @@ struct pci_dev_info {
         u8 bus;
         u8 devfn;
     } physfn;
+    vmask_t used_vectors;
 };
 struct pci_dev {
George Dunlap
2011-Jul-26 16:33 UTC
[Xen-devel] [PATCH 3 of 3] xen: AMD IOMMU: Automatically enable per-device vector maps
Automatically enable per-device vector maps when using IOMMU, unless
disabled specifically by an IOMMU parameter.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r 362a881a79a4 -r fffeeea7afbb xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Tue Jul 26 17:21:23 2011 +0100
+++ b/xen/arch/x86/irq.c	Tue Jul 26 17:21:23 2011 +0100
@@ -32,6 +32,7 @@
 unsigned int __read_mostly nr_irqs_gsi;
 unsigned int __read_mostly nr_irqs;
 integer_param("nr_irqs", nr_irqs);
+/* This default may be changed by the AMD IOMMU code */
 bool_t __read_mostly opt_irq_perdev_vector_map = 0;
 boolean_param("irq-perdev-vector-map", opt_irq_perdev_vector_map);
diff -r 362a881a79a4 -r fffeeea7afbb xen/drivers/passthrough/amd/pci_amd_iommu.c
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c	Tue Jul 26 17:21:23 2011 +0100
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c	Tue Jul 26 17:21:23 2011 +0100
@@ -25,6 +25,9 @@
 #include <asm/amd-iommu.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
+extern bool_t __read_mostly opt_irq_perdev_vector_map;
+extern bool_t __read_mostly iommu_amd_perdev_vector_map;
+
 struct amd_iommu *find_iommu_for_device(int bdf)
 {
     BUG_ON ( bdf >= ivrs_bdf_entries );
@@ -148,6 +151,18 @@ int __init amd_iov_detect(void)
         return -ENODEV;
     }
+    /* Enable use of per-device vector map unless otherwise
+     * specified */
+    if ( iommu_amd_perdev_vector_map )
+    {
+        printk("AMD-Vi: Enabling per-device vector maps\n");
+        opt_irq_perdev_vector_map=1;
+    }
+    else
+    {
+        printk("AMD-Vi: WARNING - not enabling per-device vector maps\n");
+    }
+
     return scan_pci_devices();
 }
diff -r 362a881a79a4 -r fffeeea7afbb xen/drivers/passthrough/iommu.c
--- a/xen/drivers/passthrough/iommu.c	Tue Jul 26 17:21:23 2011 +0100
+++ b/xen/drivers/passthrough/iommu.c	Tue Jul 26 17:21:23 2011 +0100
@@ -49,6 +49,7 @@
 bool_t __read_mostly iommu_qinval = 1;
 bool_t __read_mostly iommu_intremap = 1;
 bool_t __read_mostly iommu_hap_pt_share;
 bool_t __read_mostly iommu_debug;
+bool_t __read_mostly iommu_amd_perdev_vector_map = 1;
 static void __init parse_iommu_param(char *s)
 {
@@ -81,6 +82,8 @@ static void __init parse_iommu_param(cha
             iommu_dom0_strict = 1;
         else if ( !strcmp(s, "sharept") )
             iommu_hap_pt_share = 1;
+        else if ( !strcmp(s, "no-perdev-vector-map") )
+            iommu_amd_perdev_vector_map = 0;
         s = ss + 1;
     } while ( ss );
Keir Fraser
2011-Jul-26 17:01 UTC
Re: [Xen-devel] [PATCH 0 of 3] Avoid sharing vectors within a device when using an AMD IOMMU
On 26/07/2011 17:33, "George Dunlap" <george.dunlap@eu.citrix.com> wrote:

> The interrupt remapping tables on AMD IOMMUs index by vector only.
> This means that if two MSIs go through the table that are destined for
> different cpus, but they share the same vector, they will be
> redirected to the same place. (E.g., one interrupt on p5 vector 67,
> another interrupt on p7 vector 67; both will be redirected to the same
> place.)
>
> Introducing per-device interrupt mappings reduces the problem, but
> does not solve it completely if the same device can have multiple IRQs
> assigned to it, because you can get the same issue -- two different
> IRQs from the same device can be assigned the same vector on different
> cpus. This causes one of the IRQs to be activated when either interrupt
> is triggered, and the other IRQ to never receive any interrupts.

The patches look fine, but I don't see a reason to add yet more
command-line options.  AMD systems using irq remapping will need to
avoid vector sharing; other systems do not need to do so.  We detect
that automatically and DTRT, and there's no reason for a user to
override that.  It's just extending the set of arcane IOMMU command
line settings that no one will understand the implications of
specifying.

 -- Keir

> This series consists of three patches:
>
> 1: Introduce infrastructure to allow irqs to share vector maps. Any
> IRQs sharing the same vector map will never have vector collisions.
>
> 2: Introduce option to have per-device vector maps for MSI IRQs.
>
> 3: Automatically enable per-device vector maps when running on an AMD
> system with the IOMMU enabled, unless otherwise specified.
George Dunlap
2011-Jul-26 17:17 UTC
Re: [Xen-devel] [PATCH 0 of 3] Avoid sharing vectors within a device when using an AMD IOMMU
On Tue, 2011-07-26 at 18:01 +0100, Keir Fraser wrote:
> On 26/07/2011 17:33, "George Dunlap" <george.dunlap@eu.citrix.com> wrote:
>
> > The interrupt remapping tables on AMD IOMMUs index by vector only.
> > This means that if two MSIs go through the table that are destined for
> > different cpus, but they share the same vector, they will be
> > redirected to the same place. (E.g., one interrupt on p5 vector 67,
> > another interrupt on p7 vector 67; both will be redirected to the same
> > place.)
> >
> > Introducing per-device interrupt mappings reduces the problem, but
> > does not solve it completely if the same device can have multiple IRQs
> > assigned to it, because you can get the same issue -- two different
> > IRQs from the same device can be assigned the same vector on different
> > cpus. This causes one of the IRQs to be activated when either interrupt
> > is triggered, and the other IRQ to never receive any interrupts.
>
> The patches look fine, but I don't see a reason to add yet more command-line
> options. AMD systems using irq remapping will need to avoid vector sharing;
> other systems do not need to do so. We detect that automatically and DTRT
> and there's no reason for a user to override that. It's just extending the
> set of arcane IOMMU command line settings that no one will understand the
> implications of specifying.

I was mainly thinking of the support issue.  We don't do extensive
testing of AMD passthrough here at Citrix (or we probably would have
tripped over this problem sooner).  If there are subtle, unexpected
side-effects of this patch, being able to tell people, "Specify
iommu=no-perdev-vector-map" is a lot easier than telling them to
install a custom build with the per-device stuff off.  (And it's
something that power users can share with other users without needing
to be a developer.)

I'm not sure how common it is to have devices register more than one
interrupt.  If it's just specialist multi-queue cards, then it may be
the case that having per-device intremap tables will suffice in many
cases.  If nearly all devices have multiple interrupts, then you're
right, the option to disable it on AMD systems with the IOMMU enabled
is pretty pointless.

 -George