Stefano Stabellini
2010-Aug-30 11:20 UTC
[Xen-devel] [PATCH 0/7] PV on HVM: receive interrupts as xen events
Hi all,
this patch series introduces some performance improvements for xen PV on
HVM guests: interacting with the emulated APIC is slow because it causes
traps into the hypervisor, while receiving xen events using the vector
callback mechanism allows us to skip all that. For this reason we remap
interrupts and MSIs into xen pirqs so that from that point on we can
receive them as xen events instead.

This series is based on Konrad's pcifront series (not upstream yet):

http://lkml.org/lkml/2010/8/4/374

and requires a patch to xen and a patch to qemu-xen (just sent to
xen-devel).

The list of patches with diffstat follows:

Jeremy Fitzhardinge (2):
      xen: add xen hvm acpi_register_gsi variant
      acpi: use indirect call to register gsi in different modes

Stefano Stabellini (5):
      xen: map MSIs into pirqs
      xen: support GSI -> pirq remapping in PV on HVM guests
      xen: implement xen_hvm_register_pirq
      xen: get the maximum number of pirqs from xen
      xen: support pirq != irq

 arch/x86/include/asm/acpi.h      |    3 +
 arch/x86/include/asm/xen/pci.h   |   10 +++
 arch/x86/kernel/acpi/boot.c      |   60 ++++++++++++++------
 arch/x86/pci/xen.c               |  114 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/xen-pcifront.c       |    2 +-
 drivers/xen/events.c             |  106 +++++++++++++++++++++++++++++++----
 include/xen/events.h             |    3 +
 include/xen/interface/features.h |    3 +
 include/xen/interface/physdev.h  |   36 ++++++++++++
 9 files changed, 308 insertions(+), 29 deletions(-)

A git tree with this series and Konrad's pcifront series on top of Linux
2.6.36-rc1 is available here:

git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.36-rc1-pvhvm-pirq-v3

Cheers,

Stefano Stabellini

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 1/7] xen: support pirq != irq
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

PHYSDEVOP_map_pirq might return a pirq different from what we asked if
we are running as an HVM guest, so we need to be able to support pirqs
that are different from linux irqs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/events.c |   49 +++++++++++++++++++++++++++++++++++++++++--------
 include/xen/events.h |    1 +
 2 files changed, 42 insertions(+), 8 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 395fd19..a5c30db 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -90,6 +90,7 @@ struct irq_info
 		unsigned short virq;
 		enum ipi_vector ipi;
 		struct {
+			unsigned short pirq;
 			unsigned short gsi;
 			unsigned char vector;
 			unsigned char flags;
@@ -100,6 +101,7 @@ struct irq_info
 #define PIRQ_SHAREABLE	(1 << 1)

 static struct irq_info *irq_info;
+static int *pirq_to_irq;

 static int *evtchn_to_irq;
 struct cpu_evtchn_s {
@@ -146,11 +148,12 @@ static struct irq_info mk_virq_info(unsigned short evtchn, unsigned short virq)
 			.cpu = 0, .u.virq = virq };
 }

-static struct irq_info mk_pirq_info(unsigned short evtchn,
+static struct irq_info mk_pirq_info(unsigned short evtchn, unsigned short pirq,
 				    unsigned short gsi, unsigned short vector)
 {
 	return (struct irq_info) { .type = IRQT_PIRQ, .evtchn = evtchn,
-			.cpu = 0, .u.pirq = { .gsi = gsi, .vector = vector } };
+			.cpu = 0,
+			.u.pirq = { .pirq = pirq, .gsi = gsi, .vector = vector } };
 }

 /*
@@ -192,6 +195,16 @@ static unsigned virq_from_irq(unsigned irq)
 	return info->u.virq;
 }

+static unsigned pirq_from_irq(unsigned irq)
+{
+	struct irq_info *info = info_for_irq(irq);
+
+	BUG_ON(info == NULL);
+	BUG_ON(info->type != IRQT_PIRQ);
+
+	return info->u.pirq.pirq;
+}
+
 static unsigned gsi_from_irq(unsigned irq)
 {
 	struct irq_info *info = info_for_irq(irq);
@@ -364,6 +377,16 @@ static int get_nr_hw_irqs(void)
 	return ret;
 }

+static int find_unbound_pirq(void)
+{
+	int i;
+	for (i = 0; i < nr_irqs; i++) {
+		if (pirq_to_irq[i] < 0)
+			return i;
+	}
+	return -1;
+}
+
 static int find_unbound_irq(void)
 {
 	int irq;
@@ -410,7 +433,7 @@ static bool identity_mapped_irq(unsigned irq)

 static void pirq_unmask_notify(int irq)
 {
-	struct physdev_eoi eoi = { .irq = irq };
+	struct physdev_eoi eoi = { .irq = pirq_from_irq(irq) };

 	if (unlikely(pirq_needs_eoi(irq))) {
 		int rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, &eoi);
@@ -425,7 +448,7 @@ static void pirq_query_unmask(int irq)

 	BUG_ON(info->type != IRQT_PIRQ);

-	irq_status.irq = irq;
+	irq_status.irq = pirq_from_irq(irq);
 	if (HYPERVISOR_physdev_op(PHYSDEVOP_irq_status_query, &irq_status))
 		irq_status.flags = 0;

@@ -453,7 +476,7 @@ static unsigned int startup_pirq(unsigned int irq)
 	if (VALID_EVTCHN(evtchn))
 		goto out;

-	bind_pirq.pirq = irq;
+	bind_pirq.pirq = pirq_from_irq(irq);
 	/* NB. We are happy to share unless we are probing. */
 	bind_pirq.flags = info->u.pirq.flags & PIRQ_SHAREABLE ?
 					BIND_PIRQ__WILL_SHARE : 0;
@@ -556,12 +579,17 @@ static int find_irq_by_gsi(unsigned gsi)
 	return -1;
 }

+int xen_allocate_pirq(unsigned gsi, int shareable, char *name)
+{
+	return xen_map_pirq_gsi(gsi, gsi, shareable, name);
+}
+
 /*
  * Allocate a physical irq, along with a vector. We don't assign an
  * event channel until the irq actually started up. Return an
  * existing irq if we've already got one for the gsi.
  */
-int xen_allocate_pirq(unsigned gsi, int shareable, char *name)
+int xen_map_pirq_gsi(unsigned pirq, unsigned gsi, int shareable, char *name)
 {
 	int irq;
 	struct physdev_irq irq_op;
@@ -570,7 +598,7 @@ int xen_allocate_pirq(unsigned gsi, int shareable, char *name)

 	irq = find_irq_by_gsi(gsi);
 	if (irq != -1) {
-		printk(KERN_INFO "xen_allocate_pirq: returning irq %d for gsi %u\n",
+		printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n",
 		       irq, gsi);
 		goto out;	/* XXX need refcount? */
 	}
@@ -600,8 +628,9 @@ int xen_allocate_pirq(unsigned gsi, int shareable, char *name)
 		goto out;
 	}

-	irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector);
+	irq_info[irq] = mk_pirq_info(0, pirq, gsi, irq_op.vector);
 	irq_info[irq].u.pirq.flags |= shareable ? PIRQ_SHAREABLE : 0;
+	pirq_to_irq[pirq] = irq;

 out:
 	spin_unlock(&irq_mapping_update_lock);
@@ -1311,6 +1340,10 @@ void __init xen_init_IRQ(void)
 				    GFP_KERNEL);
 	irq_info = kcalloc(nr_irqs, sizeof(*irq_info), GFP_KERNEL);

+	pirq_to_irq = kcalloc(nr_irqs, sizeof(*pirq_to_irq), GFP_KERNEL);
+	for (i = 0; i < nr_irqs; i++)
+		pirq_to_irq[i] = -1;
+
 	evtchn_to_irq = kcalloc(NR_EVENT_CHANNELS, sizeof(*evtchn_to_irq),
 				GFP_KERNEL);
 	for (i = 0; i < NR_EVENT_CHANNELS; i++)
diff --git a/include/xen/events.h b/include/xen/events.h
index 2f70de4..b276c38 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -71,6 +71,7 @@ void xen_hvm_evtchn_do_upcall(void);
    GSIs are identity mapped; others are dynamically allocated as
    usual. */
 int xen_allocate_pirq(unsigned gsi, int shareable, char *name);
+int xen_map_pirq_gsi(unsigned pirq, unsigned gsi, int shareable, char *name);

 /* De-allocates the above mentioned physical interrupt. */
 int xen_destroy_irq(int irq);
-- 
1.5.6.5
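The pirq-to-irq bookkeeping introduced by this patch can be modelled outside the kernel. The following is a minimal user-space sketch, not the kernel code: `struct pirq_map` and the `pirq_map_*` helpers are hypothetical names invented for illustration, and `assert` stands in for `BUG_ON`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model of the pirq_to_irq[] table. */
struct pirq_map {
	int *pirq_to_irq;	/* index: pirq, value: Linux irq, or -1 */
	int nr_pirqs;
};

/* Allocate the table and mark every pirq as unbound (-1). */
static int pirq_map_init(struct pirq_map *m, int nr_pirqs)
{
	int i;

	m->pirq_to_irq = malloc(nr_pirqs * sizeof(*m->pirq_to_irq));
	if (!m->pirq_to_irq)
		return -1;
	m->nr_pirqs = nr_pirqs;
	for (i = 0; i < nr_pirqs; i++)
		m->pirq_to_irq[i] = -1;
	return 0;
}

/* Record that the given pirq is delivered as the given Linux irq. */
static void pirq_map_bind(struct pirq_map *m, int pirq, int irq)
{
	assert(pirq >= 0 && pirq < m->nr_pirqs);	/* stands in for BUG_ON */
	m->pirq_to_irq[pirq] = irq;
}

/* Return the lowest unbound pirq, scanning upwards; -1 if none is free. */
static int pirq_map_find_unbound(const struct pirq_map *m)
{
	int i;

	for (i = 0; i < m->nr_pirqs; i++)
		if (m->pirq_to_irq[i] < 0)
			return i;
	return -1;
}
```

Because the table is indexed by pirq and stores the irq, the two numbers are free to differ, which is the point of the patch.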
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 2/7] xen: get the maximum number of pirqs from xen
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Use PHYSDEVOP_get_nr_pirqs to get the maximum number of pirqs from xen.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 drivers/xen/events.c            |   31 +++++++++++++++++++++++++++----
 include/xen/interface/physdev.h |    6 ++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index a5c30db..aca7230 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -102,6 +102,7 @@ struct irq_info

 static struct irq_info *irq_info;
 static int *pirq_to_irq;
+static int nr_pirqs;

 static int *evtchn_to_irq;
 struct cpu_evtchn_s {
@@ -377,10 +378,12 @@ static int get_nr_hw_irqs(void)
 	return ret;
 }

+/* callers of this function should make sure that PHYSDEVOP_get_nr_pirqs
+ * succeeded otherwise nr_pirqs won't hold the right value */
 static int find_unbound_pirq(void)
 {
 	int i;
-	for (i = 0; i < nr_irqs; i++) {
+	for (i = nr_pirqs-1; i >= 0; i--) {
 		if (pirq_to_irq[i] < 0)
 			return i;
 	}
@@ -596,6 +599,14 @@ int xen_map_pirq_gsi(unsigned pirq, unsigned gsi, int shareable, char *name)

 	spin_lock(&irq_mapping_update_lock);

+	if ((pirq > nr_pirqs) || (gsi > nr_irqs)) {
+		printk(KERN_WARNING "%s: %s %s is incorrect!\n",
+			__func__,
+			pirq > nr_pirqs ? "nr_pirqs" : "",
+			gsi > nr_irqs ? "nr_irqs" : "");
+		goto out;
+	}
+
 	irq = find_irq_by_gsi(gsi);
 	if (irq != -1) {
 		printk(KERN_INFO "xen_map_pirq_gsi: returning irq %d for gsi %u\n",
@@ -1334,14 +1345,26 @@ void xen_callback_vector(void) {}

 void __init xen_init_IRQ(void)
 {
-	int i;
+	int i, rc;
+	struct physdev_nr_pirqs op_nr_pirqs;

 	cpu_evtchn_mask_p = kcalloc(nr_cpu_ids, sizeof(struct cpu_evtchn_s),
 				    GFP_KERNEL);
 	irq_info = kcalloc(nr_irqs, sizeof(*irq_info), GFP_KERNEL);

-	pirq_to_irq = kcalloc(nr_irqs, sizeof(*pirq_to_irq), GFP_KERNEL);
-	for (i = 0; i < nr_irqs; i++)
+	rc = HYPERVISOR_physdev_op(PHYSDEVOP_get_nr_pirqs, &op_nr_pirqs);
+	if (rc < 0) {
+		nr_pirqs = nr_irqs;
+		if (rc != -ENOSYS)
+			printk(KERN_WARNING "PHYSDEVOP_get_nr_pirqs returned rc=%d\n", rc);
+	} else {
+		if (xen_pv_domain() && !xen_initial_domain())
+			nr_pirqs = max(op_nr_pirqs.nr_pirqs, nr_irqs);
+		else
+			nr_pirqs = op_nr_pirqs.nr_pirqs;
+	}
+	pirq_to_irq = kcalloc(nr_pirqs, sizeof(*pirq_to_irq), GFP_KERNEL);
+	for (i = 0; i < nr_pirqs; i++)
 		pirq_to_irq[i] = -1;

 	evtchn_to_irq = kcalloc(NR_EVENT_CHANNELS, sizeof(*evtchn_to_irq),
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index cd69391..fbb5883 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -121,6 +121,12 @@ struct physdev_op {
 	} u;
 };

+#define PHYSDEVOP_get_nr_pirqs    22
+struct physdev_nr_pirqs {
+	/* OUT */
+	uint32_t nr_pirqs;
+};
+
 /*
  * Notify that some PIRQ-bound event channels have been unmasked.
  * ** This command is obsolete since interface version 0x00030202 and is **
-- 
1.5.6.5
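The way this patch sizes the pirq table can be read as a small decision function. The sketch below is only an illustration in user-space C: `choose_nr_pirqs` and its parameters are hypothetical names, with `rc` standing for the hypercall return code and `pv_domu` for "PV guest that is not the initial domain".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the nr_pirqs selection in xen_init_IRQ():
 *   rc       - return code of the get_nr_pirqs query (< 0 on failure)
 *   reported - number of pirqs the hypervisor reported
 *   nr_irqs  - number of Linux irqs
 *   pv_domu  - true for a PV guest that is not the initial domain
 */
static int choose_nr_pirqs(int rc, int reported, int nr_irqs, bool pv_domu)
{
	assert(nr_irqs > 0);

	if (rc < 0)
		return nr_irqs;		/* query failed: fall back to nr_irqs */
	if (pv_domu)			/* never go below nr_irqs in this case */
		return reported > nr_irqs ? reported : nr_irqs;
	return reported;
}
```

The fallback keeps the old behaviour (one table slot per Linux irq) on hypervisors that do not implement the new operation.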
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 3/7] xen: implement xen_hvm_register_pirq
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

xen_hvm_register_pirq allows the kernel to map a GSI into a Xen pirq and
receive the interrupt as an event channel from that point on.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/pci.h  |    5 +++++
 arch/x86/pci/xen.c              |   36 ++++++++++++++++++++++++++++++++++++
 drivers/xen/events.c            |    4 +++-
 include/xen/interface/physdev.h |   30 ++++++++++++++++++++++++++++++
 4 files changed, 74 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index 449c82f..b4d908b 100644
--- a/arch/x86/include/asm/xen/pci.h
+++ b/arch/x86/include/asm/xen/pci.h
@@ -3,10 +3,15 @@

 #if defined(CONFIG_PCI_XEN)
 extern int __init pci_xen_init(void);
+int xen_hvm_register_pirq(u32 gsi, int triggering);
 #define pci_xen 1
 #else
 #define pci_xen 0
 #define pci_xen_init (0)
+static inline int xen_hvm_register_pirq(u32 gsi, int triggering)
+{
+	return -1;
+}
 #endif

 #if defined(CONFIG_PCI_MSI)
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index b19c873..d6fd58f 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -17,6 +17,42 @@
 #include <xen/events.h>
 #include <asm/xen/pci.h>

+int xen_hvm_register_pirq(u32 gsi, int triggering)
+{
+	int rc, irq;
+	struct physdev_map_pirq map_irq;
+	int shareable = 0;
+	char *name;
+
+	if (!xen_hvm_domain())
+		return -1;
+
+	map_irq.domid = DOMID_SELF;
+	map_irq.type = MAP_PIRQ_TYPE_GSI;
+	map_irq.index = gsi;
+	map_irq.pirq = -1;
+
+	rc = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
+	if (rc) {
+		printk(KERN_WARNING "xen map irq failed %d\n", rc);
+		return -1;
+	}
+
+	if (triggering == ACPI_EDGE_SENSITIVE) {
+		shareable = 0;
+		name = "ioapic-edge";
+	} else {
+		shareable = 1;
+		name = "ioapic-level";
+	}
+
+	irq = xen_map_pirq_gsi(map_irq.pirq, gsi, shareable, name);
+
+	printk(KERN_DEBUG "xen: --> irq=%d, pirq=%d\n", irq, map_irq.pirq);
+
+	return irq;
+}
+
 #if defined(CONFIG_PCI_MSI)
 #include <linux/msi.h>

diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index aca7230..302dad1 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -36,6 +36,7 @@
 #include <asm/idle.h>
 #include <asm/io_apic.h>
 #include <asm/sync_bitops.h>
+#include <asm/xen/pci.h>
 #include <asm/xen/hypercall.h>
 #include <asm/xen/hypervisor.h>

@@ -75,7 +76,8 @@ enum xen_irq_type {
  * event channel - irq->event channel mapping
  * cpu - cpu this event channel is bound to
  * index - type-specific information:
- *    PIRQ - vector, with MSB being "needs EIO"
+ *    PIRQ - vector, with MSB being "needs EIO", or physical IRQ of the HVM
+ *           guest, or GSI (real passthrough IRQ) of the device.
  *    VIRQ - virq number
  *    IPI - IPI vector
  *    EVTCHN -
diff --git a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h
index fbb5883..69a72b9 100644
--- a/include/xen/interface/physdev.h
+++ b/include/xen/interface/physdev.h
@@ -106,6 +106,36 @@ struct physdev_irq {
 	uint32_t vector;
 };

+#define MAP_PIRQ_TYPE_MSI		0x0
+#define MAP_PIRQ_TYPE_GSI		0x1
+#define MAP_PIRQ_TYPE_UNKNOWN		0x2
+
+#define PHYSDEVOP_map_pirq		13
+struct physdev_map_pirq {
+	domid_t domid;
+	/* IN */
+	int type;
+	/* IN */
+	int index;
+	/* IN or OUT */
+	int pirq;
+	/* IN */
+	int bus;
+	/* IN */
+	int devfn;
+	/* IN */
+	int entry_nr;
+	/* IN */
+	uint64_t table_base;
+};
+
+#define PHYSDEVOP_unmap_pirq		14
+struct physdev_unmap_pirq {
+	domid_t domid;
+	/* IN */
+	int pirq;
+};
+
 /*
  * Argument to physdev_op_compat() hypercall. Superceded by new physdev_op()
  * hypercall since 0x00030202.
-- 
1.5.6.5
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 4/7] acpi: use indirect call to register gsi in different modes
From: Jeremy Fitzhardinge <jeremy@goop.org>

Rather than using a tree of conditionals, use function pointer
for acpi_register_gsi.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kernel/acpi/boot.c |   59 ++++++++++++++++++++++++++++++------------
 1 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index c05872a..031f0c2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -513,35 +513,61 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
 	return 0;
 }

-/*
- * success: return IRQ number (>=0)
- * failure: return < 0
- */
-int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity)
+static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
+				 int trigger, int polarity)
 {
-	unsigned int irq;
-	unsigned int plat_gsi = gsi;
-
 #ifdef CONFIG_PCI
 	/*
 	 * Make sure all (legacy) PCI IRQs are set as level-triggered.
 	 */
-	if (acpi_irq_model == ACPI_IRQ_MODEL_PIC) {
-		if (trigger == ACPI_LEVEL_SENSITIVE)
-			eisa_set_level_irq(gsi);
-	}
+	if (trigger == ACPI_LEVEL_SENSITIVE)
+		eisa_set_level_irq(gsi);
 #endif

+	return gsi;
+}
+
+static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+				    int trigger, int polarity)
+{
 #ifdef CONFIG_X86_IO_APIC
-	if (acpi_irq_model == ACPI_IRQ_MODEL_IOAPIC) {
-		plat_gsi = mp_register_gsi(dev, gsi, trigger, polarity);
-	}
+	gsi = mp_register_gsi(dev, gsi, trigger, polarity);
 #endif
+
+	return gsi;
+}
+
+static int (*__acpi_register_gsi)(struct device *dev, u32 gsi, int trigger, int polarity) = acpi_register_gsi_pic;
+
+/*
+ * success: return IRQ number (>=0)
+ * failure: return < 0
+ */
+int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity)
+{
+	unsigned int irq;
+	unsigned int plat_gsi = gsi;
+
+	plat_gsi = (*__acpi_register_gsi)(dev, gsi, trigger, polarity);
 	irq = gsi_to_irq(plat_gsi);

 	return irq;
 }

+void __init acpi_set_irq_model_pic(void)
+{
+	acpi_irq_model = ACPI_IRQ_MODEL_PIC;
+	__acpi_register_gsi = acpi_register_gsi_pic;
+	acpi_ioapic = 0;
+}
+
+void __init acpi_set_irq_model_ioapic(void)
+{
+	acpi_irq_model = ACPI_IRQ_MODEL_IOAPIC;
+	__acpi_register_gsi = acpi_register_gsi_ioapic;
+	acpi_ioapic = 1;
+}
+
 /*
  * ACPI based hotplug support for CPU
  */
@@ -1259,8 +1285,7 @@ static void __init acpi_process_madt(void)
 			 */
 			error = acpi_parse_madt_ioapic_entries();
 			if (!error) {
-				acpi_irq_model = ACPI_IRQ_MODEL_IOAPIC;
-				acpi_ioapic = 1;
+				acpi_set_irq_model_ioapic();

 				smp_found_config = 1;
 			}
-- 
1.5.6.5
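The indirection this patch introduces is the ordinary function-pointer hook pattern. Here is a stripped-down user-space sketch of it; all names (`register_gsi_fn`, `register_gsi_hook`, the two backends) are hypothetical, and the `+ 100` remapping in the second backend is made up purely so the two backends give visibly different results.

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every backend (simplified from the kernel one). */
typedef int (*register_gsi_fn)(unsigned int gsi, int trigger);

/* Backend 1: the GSI is used unchanged. */
static int register_gsi_pic(unsigned int gsi, int trigger)
{
	(void)trigger;
	return (int)gsi;
}

/* Backend 2: made-up remapping, for illustration only. */
static int register_gsi_ioapic(unsigned int gsi, int trigger)
{
	(void)trigger;
	return (int)gsi + 100;
}

/* The hook defaults to the first backend, as in the patch. */
static register_gsi_fn register_gsi_hook = register_gsi_pic;

/* Switching the interrupt model only swaps the pointer. */
static void set_irq_model_ioapic(void)
{
	register_gsi_hook = register_gsi_ioapic;
}

/* Callers go through the hook and never test the model themselves. */
static int register_gsi(unsigned int gsi, int trigger)
{
	assert(register_gsi_hook != NULL);
	return register_gsi_hook(gsi, trigger);
}
```

A further backend (such as the Xen HVM variant added later in the series) can then be installed by assigning the hook, without touching the caller.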
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 5/7] xen: add xen hvm acpi_register_gsi variant
From: Jeremy Fitzhardinge <jeremy@goop.org>

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/acpi.h |    3 +++
 arch/x86/kernel/acpi/boot.c |    3 ++-
 arch/x86/pci/xen.c          |    7 +++++++
 3 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index 92091de..55d106b 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -93,6 +93,9 @@ extern u8 acpi_sci_flags;
 extern int acpi_sci_override_gsi;
 void acpi_pic_sci_set_trigger(unsigned int, u16);

+extern int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
+				  int trigger, int polarity);
+
 static inline void disable_acpi(void)
 {
 	acpi_disabled = 1;
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 031f0c2..71232b9 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -537,7 +537,8 @@ static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
 	return gsi;
 }

-static int (*__acpi_register_gsi)(struct device *dev, u32 gsi, int trigger, int polarity) = acpi_register_gsi_pic;
+int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
+			   int trigger, int polarity) = acpi_register_gsi_pic;

 /*
  * success: return IRQ number (>=0)
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index d6fd58f..0ed1cae 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -152,6 +152,13 @@ static int xen_pcifront_enable_irq(struct pci_dev *dev)
 	return 0;
 }

+static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
+				     int trigger, int polarity)
+{
+	return xen_hvm_register_pirq(gsi, trigger);
+}
+
+
 int __init pci_xen_init(void)
 {
 	if (!xen_pv_domain() || xen_initial_domain())
-- 
1.5.6.5
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 6/7] xen: support GSI -> pirq remapping in PV on HVM guests
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Disable pcifront when running on HVM: it is meant to be used with pv
guests that don't have PCI bus.

Use acpi_register_gsi_xen_hvm to remap GSIs into pirqs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/include/asm/xen/pci.h   |    5 +++++
 arch/x86/pci/xen.c               |   14 ++++++++++++++
 drivers/pci/xen-pcifront.c       |    2 +-
 drivers/xen/events.c             |    6 +++++-
 include/xen/interface/features.h |    3 +++
 5 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index b4d908b..be3dc21 100644
--- a/arch/x86/include/asm/xen/pci.h
+++ b/arch/x86/include/asm/xen/pci.h
@@ -4,10 +4,15 @@
 #if defined(CONFIG_PCI_XEN)
 extern int __init pci_xen_init(void);
 int xen_hvm_register_pirq(u32 gsi, int triggering);
+extern int __init pci_xen_hvm_init(void);
 #define pci_xen 1
 #else
 #define pci_xen 0
 #define pci_xen_init (0)
+static inline int pci_xen_hvm_init(void)
+{
+	return -1;
+}
 static inline int xen_hvm_register_pirq(u32 gsi, int triggering)
 {
 	return -1;
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 0ed1cae..bfbe185 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -14,6 +14,7 @@

 #include <asm/xen/hypervisor.h>

+#include <xen/features.h>
 #include <xen/events.h>
 #include <asm/xen/pci.h>

@@ -188,3 +189,16 @@ int __init pci_xen_init(void)
 #endif
 	return 0;
 }
+
+int __init pci_xen_hvm_init(void)
+{
+	if (!xen_feature(XENFEAT_hvm_pirqs))
+		return 0;
+
+	/*
+	 * We don't want to change the actual ACPI delivery model,
+	 * just how GSIs get registered.
+	 */
+	__acpi_register_gsi = acpi_register_gsi_xen_hvm;
+	return 0;
+}
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index a48a733..0a5d673 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -1136,7 +1136,7 @@ static struct xenbus_driver xenbus_pcifront_driver = {

 static int __init pcifront_init(void)
 {
-	if (!xen_domain())
+	if (!xen_domain() || xen_hvm_domain())
 		return -ENODEV;

 	pci_frontend_registrar(1 /* enable */);
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index 302dad1..ab4e393 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -618,7 +618,8 @@ int xen_map_pirq_gsi(unsigned pirq, unsigned gsi, int shareable, char *name)

 	/* If we are a PV guest, we don't have GSIs (no ACPI passed). Therefore
 	 * we are using the !xen_initial_domain() to drop in the function.*/
-	if (identity_mapped_irq(gsi) || !xen_initial_domain()) {
+	if (identity_mapped_irq(gsi) || (!xen_initial_domain() &&
+				xen_pv_domain())) {
 		irq = gsi;
 		irq_to_desc_alloc_node(irq, 0);
 		dynamic_irq_init(irq);
@@ -1383,6 +1384,9 @@ void __init xen_init_IRQ(void)
 	if (xen_hvm_domain()) {
 		xen_callback_vector();
 		native_init_IRQ();
+		/* pci_xen_hvm_init must be called after native_init_IRQ so that
+		 * __acpi_register_gsi can point at the right function */
+		pci_xen_hvm_init();
 	} else {
 		irq_ctx_init(smp_processor_id());
 	}
diff --git a/include/xen/interface/features.h b/include/xen/interface/features.h
index 70d2563..b6ca39a 100644
--- a/include/xen/interface/features.h
+++ b/include/xen/interface/features.h
@@ -47,6 +47,9 @@
 /* x86: pvclock algorithm is safe to use on HVM */
 #define XENFEAT_hvm_safe_pvclock           9

+/* x86: pirq can be used by HVM guests */
+#define XENFEAT_hvm_pirqs           10
+
 #define XENFEAT_NR_SUBMAPS 1

 #endif /* __XEN_PUBLIC_FEATURES_H__ */
-- 
1.5.6.5
stefano.stabellini@eu.citrix.com
2010-Aug-30 11:21 UTC
[Xen-devel] [PATCH 7/7] xen: map MSIs into pirqs
From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

Map MSIs into pirqs, writing 0 in the MSI vector data field and the pirq
number in the MSI destination id field.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/pci/xen.c   |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/xen/events.c |   22 +++++++++++++++++++
 include/xen/events.h |    2 +
 3 files changed, 81 insertions(+), 0 deletions(-)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index bfbe185..6fbc81a 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -56,10 +56,62 @@ int xen_hvm_register_pirq(u32 gsi, int triggering)

 #if defined(CONFIG_PCI_MSI)
 #include <linux/msi.h>
+#include <asm/msidef.h>

 struct xen_pci_frontend_ops *xen_pci_frontend;
 EXPORT_SYMBOL_GPL(xen_pci_frontend);

+static void xen_msi_compose_msg(struct pci_dev *pdev, unsigned int pirq,
+		struct msi_msg *msg)
+{
+	/* We set vector == 0 to tell the hypervisor we don't care about it,
+	 * but we want a pirq setup instead.
+	 * We use the dest_id field to pass the pirq that we want. */
+	msg->address_hi = MSI_ADDR_BASE_HI | MSI_ADDR_EXT_DEST_ID(pirq);
+	msg->address_lo =
+		MSI_ADDR_BASE_LO |
+		MSI_ADDR_DEST_MODE_PHYSICAL |
+		MSI_ADDR_REDIRECTION_CPU |
+		MSI_ADDR_DEST_ID(pirq);
+
+	msg->data =
+		MSI_DATA_TRIGGER_EDGE |
+		MSI_DATA_LEVEL_ASSERT |
+		/* delivery mode reserved */
+		(3 << 8) |
+		MSI_DATA_VECTOR(0);
+}
+
+static int xen_hvm_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
+{
+	int irq, pirq, ret = 0;
+	struct msi_desc *msidesc;
+	struct msi_msg msg;
+
+	list_for_each_entry(msidesc, &dev->msi_list, list) {
+		xen_allocate_pirq_msi((type == PCI_CAP_ID_MSIX) ?
+				"msi-x" : "msi", &irq, &pirq);
+		if (irq < 0 || pirq < 0)
+			goto error;
+		printk(KERN_DEBUG "xen: msi --> irq=%d, pirq=%d\n", irq, pirq);
+		xen_msi_compose_msg(dev, pirq, &msg);
+		ret = set_irq_msi(irq, msidesc);
+		if (ret < 0)
+			goto error_while;
+		write_msi_msg(irq, &msg);
+	}
+	return 0;
+
+error_while:
+	unbind_from_irqhandler(irq, NULL);
+error:
+	if (ret == -ENODEV)
+		dev_err(&dev->dev, "Xen PCI frontend has not registered" \
+				" MSI/MSI-X support!\n");
+
+	return ret;
+}
+
 /*
  * For MSI interrupts we have to use drivers/xen/event.s functions to
  * allocate an irq_desc and setup the right */
@@ -200,5 +252,10 @@ int __init pci_xen_hvm_init(void)
 	 * just how GSIs get registered.
 	 */
 	__acpi_register_gsi = acpi_register_gsi_xen_hvm;
+
+#ifdef CONFIG_PCI_MSI
+	x86_msi.setup_msi_irqs = xen_hvm_setup_msi_irqs;
+	x86_msi.teardown_msi_irq = xen_teardown_msi_irq;
+#endif
 	return 0;
 }
diff --git a/drivers/xen/events.c b/drivers/xen/events.c
index ab4e393..9fdf3c7 100644
--- a/drivers/xen/events.c
+++ b/drivers/xen/events.c
@@ -652,6 +652,28 @@ out:
 	return irq;
 }

+void xen_allocate_pirq_msi(char *name, int *irq, int *pirq)
+{
+	spin_lock(&irq_mapping_update_lock);
+
+	*irq = find_unbound_irq();
+	if (*irq == -1)
+		goto out;
+
+	*pirq = find_unbound_pirq();
+	if (*pirq == -1)
+		goto out;
+
+	set_irq_chip_and_handler_name(*irq, &xen_pirq_chip,
+				      handle_level_irq, name);
+
+	irq_info[*irq] = mk_pirq_info(0, *pirq, 0, 0);
+	pirq_to_irq[*pirq] = *irq;
+
+out:
+	spin_unlock(&irq_mapping_update_lock);
+}
+
 int xen_destroy_irq(int irq)
 {
 	struct irq_desc *desc;
diff --git a/include/xen/events.h b/include/xen/events.h
index b276c38..5e67be7 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -72,6 +72,8 @@ void xen_hvm_evtchn_do_upcall(void);
    usual. */
 int xen_allocate_pirq(unsigned gsi, int shareable, char *name);
 int xen_map_pirq_gsi(unsigned pirq, unsigned gsi, int shareable, char *name);
+/* Allocate an irq and a pirq to be used with MSIs. */
+void xen_allocate_pirq_msi(char *name, int *irq, int *pirq);

 /* De-allocates the above mentioned physical interrupt. */
 int xen_destroy_irq(int irq);
-- 
1.5.6.5
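To make the encoding in xen_msi_compose_msg concrete, here is a simplified user-space sketch of how a pirq can be split across the two destination-id fields of an MSI message. The `SK_*` constants and `sk_*` names are local to the sketch; the bit positions are modelled on the x86 MSI layout and should be treated as assumptions, the trigger and level bits of the data word are left out, and the decode helper is purely hypothetical, since the thread does not show the hypervisor side.

```c
#include <assert.h>
#include <stdint.h>

/* Constants local to this sketch, modelled on the x86 MSI layout. */
#define SK_MSI_ADDR_BASE_LO	0xfee00000u
#define SK_DEST_ID_SHIFT	12
#define SK_DEST_ID_MASK		0x000ff000u	/* bits 12..19 of address_lo */
#define SK_EXT_DEST_ID_MASK	0xffffff00u	/* bits 8..31 of address_hi */

struct sk_msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/* Put the pirq in the dest-id fields and leave the vector at 0. */
static struct sk_msi_msg sk_compose(uint32_t pirq)
{
	struct sk_msi_msg msg;

	msg.address_hi = pirq & SK_EXT_DEST_ID_MASK;
	msg.address_lo = SK_MSI_ADDR_BASE_LO |
			 ((pirq << SK_DEST_ID_SHIFT) & SK_DEST_ID_MASK);
	msg.data = (3u << 8) | 0u;	/* delivery mode 3, vector 0 */
	return msg;
}

/* Hypothetical inverse: rebuild the pirq from the two fields. */
static uint32_t sk_decode_pirq(const struct sk_msi_msg *msg)
{
	assert(msg);
	return (msg->address_hi & SK_EXT_DEST_ID_MASK) |
	       ((msg->address_lo & SK_DEST_ID_MASK) >> SK_DEST_ID_SHIFT);
}
```

The low eight bits of the pirq travel in address_lo and the rest in address_hi, so pirq numbers above 255 survive the round trip in this model.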
Jeremy Fitzhardinge
2010-Sep-02 19:04 UTC
[Xen-devel] Re: [PATCH 0/7] PV on HVM: receive interrupts as xen events
On 08/30/2010 04:20 AM, Stefano Stabellini wrote:
> Hi all,
> this patch series introduces some performance improvements for xen PV on
> HVM guests: interacting with the emulated APIC is slow because it causes
> traps in the hypervisor while receiving xen events using the vector callback
> mechanism allow us to skip all that. For this reason we remap interrupts
> and MSIs into xen pirqs so that from that point on we can receive them
> as xen events instead.
> This series is based on Konrad's pcifront series (not upstream yet):
>
> http://lkml.org/lkml/2010/8/4/374
>
> and requires a patch to xen and a patch to qemu-xen (just sent to
> xen-devel).

My only concern with this series is the pirq remapping stuff. Why do
pirq and irq need to be non-identical? Is it because pirq is a global
namespace, and dom0 has already assigned it?

Why do guests need to know about max pirq? Would it be better to make
Xen use a more dynamic structure for pirqs so that any arbitrary value
can be used?

    J

>
> The list of patches with diffstat follows:
>
> Jeremy Fitzhardinge (2):
>       xen: add xen hvm acpi_register_gsi variant
>       acpi: use indirect call to register gsi in different modes
>
> Stefano Stabellini (5):
>       xen: xen: map MSIs into pirqs
>       xen: support GSI -> pirq remapping in PV on HVM guests
>       xen: implement xen_hvm_register_pirq
>       xen: get the maximum number of pirqs from xen
>       xen: support pirq != irq
>
>
>  arch/x86/include/asm/acpi.h      |    3 +
>  arch/x86/include/asm/xen/pci.h   |   10 +++
>  arch/x86/kernel/acpi/boot.c      |   60 ++++++++++++++------
>  arch/x86/pci/xen.c               |  114 ++++++++++++++++++++++++++++++++++++++
>  drivers/pci/xen-pcifront.c       |    2 +-
>  drivers/xen/events.c             |  106 +++++++++++++++++++++++++++++++----
>  include/xen/events.h             |    3 +
>  include/xen/interface/features.h |    3 +
>  include/xen/interface/physdev.h  |   36 ++++++++++++
>  9 files changed, 308 insertions(+), 29 deletions(-)
>
>
> A git tree with this series and Konrad's pcifront series on top of Linux
> 2.6.36-rc1 is available here:
>
> git://xenbits.xen.org/people/sstabellini/linux-pvhvm.git 2.6.36-rc1-pvhvm-pirq-v3
>
> Cheers,
>
> Stefano Stabellini
>
Stefano Stabellini
2010-Sep-03 13:51 UTC
[Xen-devel] Re: [PATCH 0/7] PV on HVM: receive interrupts as xen events
On Thu, 2 Sep 2010, Jeremy Fitzhardinge wrote:
> On 08/30/2010 04:20 AM, Stefano Stabellini wrote:
> > Hi all,
> > this patch series introduces some performance improvements for xen PV on
> > HVM guests: interacting with the emulated APIC is slow because it causes
> > traps in the hypervisor while receiving xen events using the vector callback
> > mechanism allow us to skip all that. For this reason we remap interrupts
> > and MSIs into xen pirqs so that from that point on we can receive them
> > as xen events instead.
> > This series is based on Konrad's pcifront series (not upstream yet):
> >
> > http://lkml.org/lkml/2010/8/4/374
> >
> > and requires a patch to xen and a patch to qemu-xen (just sent to
> > xen-devel).
>
> My only concern with this series is the pirq remapping stuff. Why do
> pirq and irq need to be non-identical? Is it because pirq is a global
> namespace, and dom0 has already assigned it?
>
> Why do guests need to know about max pirq? Would it be better to make
> Xen use a more dynamic structure for pirqs so that any arbitrary value
> can be used?
>

No, pirq is a per-domain namespace, but pirq and irq are conceptually
different: pirqs are used by xen as a reference for interrupts of
devices assigned to the guest, while linux uses irqs for its internal
purposes. The pirq namespace is chosen by xen while the linux irq
namespace is chosen by linux.

Linux is allowed to choose the pirq number it wants when mapping an
interrupt; this is why linux needs to know the max pirq, so that it can
safely choose a pirq that is in the allowed range.

The difference between pirqs and linux irqs increases when we talk about
PV on HVM guests: in this case qemu also maps interrupts in the guests,
getting pirqs in return, so the linux kernel has to be able to cope with
already assigned pirq numbers.

The current PHYSDEVOP_map_pirq interface is already flexible enough for
that because it provides the possibility for the caller to let xen
choose the pirq, something that linux never does in the pure PV case,
but it is still possible.

Obviously if you let xen choose the pirq number you are safe from
conflicts, but you must be able to cope with pirq numbers different from
linux irq numbers.
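The two allocation modes described in this reply (the caller names a pirq, or passes -1 and lets the other side pick) can be illustrated with a toy allocator. This is a hypothetical user-space model, not the hypercall: `sk_map_pirq`, `SK_NR_PIRQS` and the choice to return -1 for an already-used request are all assumptions of the sketch.

```c
#include <assert.h>

#define SK_NR_PIRQS 8			/* made-up size of the pirq namespace */

static int sk_in_use[SK_NR_PIRQS];	/* 1 if the pirq is already mapped */

/*
 * requested >= 0: the caller chooses the pirq; fails (-1) if it is taken.
 * requested <  0: the allocator chooses a free pirq, scanning from the top.
 * Returns the pirq that was mapped, or -1 if nothing could be mapped.
 */
static int sk_map_pirq(int requested)
{
	int i;

	assert(requested < SK_NR_PIRQS);

	if (requested >= 0) {
		if (sk_in_use[requested])
			return -1;
		sk_in_use[requested] = 1;
		return requested;
	}

	for (i = SK_NR_PIRQS - 1; i >= 0; i--) {
		if (!sk_in_use[i]) {
			sk_in_use[i] = 1;
			return i;
		}
	}
	return -1;
}
```

A caller that names its own pirq has to know the size of the namespace and can collide with an earlier mapping, whereas a caller that passes -1 avoids collisions but gets back a number unrelated to its own irq numbering.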
Stefano Stabellini
2010-Oct-06 17:00 UTC
[Xen-devel] Re: [PATCH 4/7] acpi: use indirect call to register gsi in different modes
Peter,
I sent this patch a while back as part of the "PV on HVM: receive
interrupts as xen events" series (which is based upon Konrad's
pcifront series; he sent another version to the list yesterday).

Do you think that this is a reasonable approach?

If you want to take a look at the whole series you can find it here:

http://lkml.org/lkml/2010/8/30/170

everything else is Xen specific stuff.

Thanks in advance,

Stefano

On Mon, 30 Aug 2010, stefano.stabellini@eu.citrix.com wrote:
> From: Jeremy Fitzhardinge <jeremy@goop.org>
>
> Rather than using a tree of conditionals, use function pointer
> for acpi_register_gsi.
>
> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/kernel/acpi/boot.c |   59 ++++++++++++++++++++++++++++++------------
>  1 files changed, 42 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index c05872a..031f0c2 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -513,35 +513,61 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
>  	return 0;
>  }
>
> -/*
> - * success: return IRQ number (>=0)
> - * failure: return < 0
> - */
> -int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity)
> +static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
> +				 int trigger, int polarity)
>  {
> -	unsigned int irq;
> -	unsigned int plat_gsi = gsi;
> -
>  #ifdef CONFIG_PCI
>  	/*
>  	 * Make sure all (legacy) PCI IRQs are set as level-triggered.
>  	 */
> -	if (acpi_irq_model == ACPI_IRQ_MODEL_PIC) {
> -		if (trigger == ACPI_LEVEL_SENSITIVE)
> -			eisa_set_level_irq(gsi);
> -	}
> +	if (trigger == ACPI_LEVEL_SENSITIVE)
> +		eisa_set_level_irq(gsi);
>  #endif
>
> +	return gsi;
> +}
> +
> +static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
> +				    int trigger, int polarity)
> +{
>  #ifdef CONFIG_X86_IO_APIC
> -	if (acpi_irq_model == ACPI_IRQ_MODEL_IOAPIC) {
> -		plat_gsi = mp_register_gsi(dev, gsi, trigger, polarity);
> -	}
> +	gsi = mp_register_gsi(dev, gsi, trigger, polarity);
>  #endif
> +
> +	return gsi;
> +}
> +
> +static int (*__acpi_register_gsi)(struct device *dev, u32 gsi, int trigger, int polarity) = acpi_register_gsi_pic;
> +
> +/*
> + * success: return IRQ number (>=0)
> + * failure: return < 0
> + */
> +int acpi_register_gsi(struct device *dev, u32 gsi, int trigger, int polarity)
> +{
> +	unsigned int irq;
> +	unsigned int plat_gsi = gsi;
> +
> +	plat_gsi = (*__acpi_register_gsi)(dev, gsi, trigger, polarity);
>  	irq = gsi_to_irq(plat_gsi);
>
>  	return irq;
>  }
>
> +void __init acpi_set_irq_model_pic(void)
> +{
> +	acpi_irq_model = ACPI_IRQ_MODEL_PIC;
> +	__acpi_register_gsi = acpi_register_gsi_pic;
> +	acpi_ioapic = 0;
> +}
> +
> +void __init acpi_set_irq_model_ioapic(void)
> +{
> +	acpi_irq_model = ACPI_IRQ_MODEL_IOAPIC;
> +	__acpi_register_gsi = acpi_register_gsi_ioapic;
> +	acpi_ioapic = 1;
> +}
> +
>  /*
>   * ACPI based hotplug support for CPU
>   */
> @@ -1259,8 +1285,7 @@ static void __init acpi_process_madt(void)
>  		 */
>  		error = acpi_parse_madt_ioapic_entries();
>  		if (!error) {
> -			acpi_irq_model = ACPI_IRQ_MODEL_IOAPIC;
> -			acpi_ioapic = 1;
> +			acpi_set_irq_model_ioapic();
>
>  			smp_found_config = 1;
>  		}
> --
> 1.5.6.5
>
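The pattern the patch uses, replacing a chain of mode checks with a function pointer that is switched when the interrupt model is chosen, can be shown in isolation. The following is only a stand-alone sketch under stated assumptions, not the kernel code: the names irq_model, register_gsi_pic, register_gsi_ioapic, set_irq_model_pic, set_irq_model_ioapic and register_gsi are hypothetical, and the "+ 16" remap is an arbitrary stand-in for whatever the IOAPIC path really does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interrupt models, mirroring the PIC/IOAPIC split in the patch. */
enum irq_model { MODEL_PIC, MODEL_IOAPIC };

/* PIC variant: in this toy model the gsi is returned unchanged. */
static int register_gsi_pic(unsigned int gsi)
{
    return (int)gsi;
}

/* IOAPIC variant: the +16 offset is an arbitrary stand-in for a remap. */
static int register_gsi_ioapic(unsigned int gsi)
{
    return (int)gsi + 16;
}

static enum irq_model irq_model = MODEL_PIC;

/* The active variant; defaults to PIC, like __acpi_register_gsi in the patch. */
static int (*register_gsi_handler)(unsigned int gsi) = register_gsi_pic;

/* Model and handler are always updated together by a single setter. */
static void set_irq_model_pic(void)
{
    irq_model = MODEL_PIC;
    register_gsi_handler = register_gsi_pic;
}

static void set_irq_model_ioapic(void)
{
    irq_model = MODEL_IOAPIC;
    register_gsi_handler = register_gsi_ioapic;
}

/* Callers never test the model; they just go through the pointer. */
static int register_gsi(unsigned int gsi)
{
    assert(register_gsi_handler != NULL);
    return register_gsi_handler(gsi);
}
```

With this layout a further variant only has to provide its own function and install it in the pointer, which is presumably what makes room for the "xen hvm acpi_register_gsi variant" listed in the cover letter; the entry point itself does not need another conditional.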
H. Peter Anvin
2010-Oct-06 17:31 UTC
[Xen-devel] Re: [PATCH 4/7] acpi: use indirect call to register gsi in different modes
On 10/06/2010 10:00 AM, Stefano Stabellini wrote:
> Peter,
> I sent this patch a while back as part of the "PV on HVM: receive
> interrupts as xen events" series (which is based upon Konrad's
> pcifront series; he sent another version to the list yesterday).
>
> Do you think that this is a reasonable approach?
>
> If you want to take a look at the whole series you can find it here:
>
> http://lkml.org/lkml/2010/8/30/170
>
> everything else is Xen specific stuff.
>
> Thanks in advance,
>

On the surface it seems reasonable.

Thomas Gleixner has been working on greatly revamping a bunch of the x86
interrupt code, and this might interact with his stuff, so it would be
good if he could comment on it.

[tglx: what they're apparently working on is to redirect APIC interrupts
to Xen event channels so they don't have to interact with the simulated
memory-mapped APIC. It's "paravirtualization light".]

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
Thomas Gleixner
2010-Oct-06 18:48 UTC
[Xen-devel] Re: [PATCH 4/7] acpi: use indirect call to register gsi in different modes
On Wed, 6 Oct 2010, H. Peter Anvin wrote:
> On 10/06/2010 10:00 AM, Stefano Stabellini wrote:
> > Peter,
> > I sent this patch a while back as part of the "PV on HVM: receive
> > interrupts as xen events" series (which is based upon Konrad's
> > pcifront series; he sent another version to the list yesterday).
> >
> > Do you think that this is a reasonable approach?
> >
> > If you want to take a look at the whole series you can find it here:
> >
> > http://lkml.org/lkml/2010/8/30/170
> >
> > everything else is Xen specific stuff.
> >
> > Thanks in advance,
> >
>
> On the surface it seems reasonable.
>
> Thomas Gleixner has been working on greatly revamping a bunch of the x86
> interrupt code, and this might interact with his stuff, so it would be
> good if he could comment on it.
>
> [tglx: what they're apparently working on is to redirect APIC interrupts
> to Xen event channels so they don't have to interact with the simulated
> memory-mapped APIC. It's "paravirtualization light".]

I'm not touching acpi code. So that should be ok.

Thanks,

tglx
Stefano Stabellini
2010-Oct-07 10:41 UTC
[Xen-devel] Re: [PATCH 4/7] acpi: use indirect call to register gsi in different modes
On Wed, 6 Oct 2010, Thomas Gleixner wrote:
> On Wed, 6 Oct 2010, H. Peter Anvin wrote:
>
> > On 10/06/2010 10:00 AM, Stefano Stabellini wrote:
> > > Peter,
> > > I sent this patch a while back as part of the "PV on HVM: receive
> > > interrupts as xen events" series (which is based upon Konrad's
> > > pcifront series; he sent another version to the list yesterday).
> > >
> > > Do you think that this is a reasonable approach?
> > >
> > > If you want to take a look at the whole series you can find it here:
> > >
> > > http://lkml.org/lkml/2010/8/30/170
> > >
> > > everything else is Xen specific stuff.
> > >
> > > Thanks in advance,
> > >
> >
> > On the surface it seems reasonable.
> >
> > Thomas Gleixner has been working on greatly revamping a bunch of the x86
> > interrupt code, and this might interact with his stuff, so it would be
> > good if he could comment on it.
> >
> > [tglx: what they're apparently working on is to redirect APIC interrupts
> > to Xen event channels so they don't have to interact with the simulated
> > memory-mapped APIC. It's "paravirtualization light".]
>
> I'm not touching acpi code. So that should be ok.

In that case, would it be OK for me to add an acked-by from H. Peter
Anvin to this patch?