Jeremy, attached is basic MSI support to PV_dom0. Please have a look on it. Thanks Yunhong Jiang The pci_wrapper.patch add some hook to pci_bus_type, so that Xen will be notified when a PCI device is added to system. Makefile | 2 - pci_wrap.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 91 insertions(+), 1 deletion(-) The enable_msi.patch add the MSI support to dom0, basically it just allocate irq from Xen irq name space. arch/x86/include/asm/xen/pci.h | 17 +++++++ arch/x86/kernel/apic/io_apic.c | 22 +++++++-- arch/x86/xen/apic.c | 1 drivers/xen/events.c | 91 +++++++++++++++++++++++++++++++++++++++- include/xen/interface/physdev.h | 59 +++++++++++++++++++++++++ 5 files changed, 182 insertions(+), 8 deletions(-) The reenable_msi.patch just remove original function that disable MSI in xen environment. arch/x86/xen/apic.c | 2 -- drivers/pci/pci.h | 2 ++ include/linux/pci.h | 6 ------ 3 files changed, 2 insertions(+), 8 deletions(-) Notice: a) Currently the PCI save/restore is broken. The reason is because pci_restore_msi_state() in "drivers/pci/msi.c" will try to restore the MSI config depends on device''s msi msg information (i.e. content of msi_desc->msg). However, that information is wrong in Xen environment because Xen HV owns MSI. I''m still trying to find a method to achieve the save/restore without touch the common msi.c code, any suggestion is welcome. b) I notice pci frontend is in a branch. But to support pci frontend without touch common PCI function is difficult, I''m still considering it. c) I''m not sure if xen''s irq space should be same as pirq. Basically I think irq is dom0 internal structure, while PIRQ is interface between Xen HV/dom0. But seems current implementation think these two items are same. I didn''t try to change it, but that may need improvement. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-May-12 20:05 UTC
Re: [Xen-devel] [PATCH]Add MSI support to PV_dom0
Jiang, Yunhong wrote:> Jeremy, attached is basic MSI support to PV_dom0. Please have a look on it. > > Thanks > Yunhong Jiang > > The pci_wrapper.patch add some hook to pci_bus_type, so that Xen will be notified when a PCI device is added to system. > Makefile | 2 - > pci_wrap.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 91 insertions(+), 1 deletion(-) > > The enable_msi.patch add the MSI support to dom0, basically it just allocate irq from Xen irq name space. > > arch/x86/include/asm/xen/pci.h | 17 +++++++ > arch/x86/kernel/apic/io_apic.c | 22 +++++++-- > arch/x86/xen/apic.c | 1 > drivers/xen/events.c | 91 +++++++++++++++++++++++++++++++++++++++- > include/xen/interface/physdev.h | 59 +++++++++++++++++++++++++ > 5 files changed, 182 insertions(+), 8 deletions(-) > > The reenable_msi.patch just remove original function that disable MSI in xen environment. > > arch/x86/xen/apic.c | 2 -- > drivers/pci/pci.h | 2 ++ > include/linux/pci.h | 6 ------ > 3 files changed, 2 insertions(+), 8 deletions(-) > > > Notice: > a) Currently the PCI save/restore is broken. The reason is because pci_restore_msi_state() in "drivers/pci/msi.c" will try to restore the MSI config depends on device''s msi msg information (i.e. content of msi_desc->msg). However, that information is wrong in Xen environment because Xen HV owns MSI. I''m still trying to find a method to achieve the save/restore without touch the common msi.c code, any suggestion is welcome. >You mean because it tries to directly write to pci config space, or that it writes the wrong thing there? What will this affect? Dom0 S3 suspend/resume? More?> b) I notice pci frontend is in a branch. But to support pci frontend without touch common PCI function is difficult, I''m still considering it. >OK.> c) I''m not sure if xen''s irq space should be same as pirq. Basically I think irq is dom0 internal structure, while PIRQ is interface between Xen HV/dom0. But seems current implementation think these two items are same. I didn''t try to change it, but that may need improvement.What do you mean by "Xen''s irq space"? Does it have an irq space that''s distinct from pirqs? Or do you mean "Linux''s irq space"? At the moment, a Linux irq == the gsi, which is a convention I kept for Xen interrupts, though there''s no very strong tie (identity_mapped_irq() determines where we bother with the identity mapping; only the legacy ISA interrupts are essential). Obviously this has no meaning with MSI interrupts. Or have I misunderstood you? (It would be easier to review the patches if they were inline, one per mail)> commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang, Yunhong > <yunhong.jiang@intel.com> Date: Fri May 8 00:50:34 2009 +0800 Add MSI > support to Xen Dom0 It add some hook to arch specific MSI function, so > that it will a) allocate irq from Xen''s irq space b) allocate vector > through Xen''s map_irq hypercall. Currently Xen''s irq space function > has no chip_data function, I assume this should be ok since we Xen''s > IRQ is different chip with native IOAPIC. Signed-off-by: Jiang, > Yunhong <yunhong.jiang@intel.com> diff --git > a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h > index 0563fc6..a5d9027 100644 --- a/arch/x86/include/asm/xen/pci.h +++ > b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef > CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int > polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned int > want, + struct msi_desc *msidesc, + int type); +int > xen_destroy_irq(int irq); #else static inline int xen_register_gsi(u32 > gsi, int triggering, int polarity) { return -1; } + +int > xen_create_msi_irq(struct pci_dev *dev,static inline> + unsigned int want, + struct msi_desc *msidesc, + int type) +{ + > return -1; +} +int xen_destroy_irq(int irq)static inline> +{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git > a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c > index b562550..d1e3408 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ > b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include > <asm/xen/hypervisor.h> #include <asm/apic.h> +#include <asm/xen/pci.h> > #define __apicdebuginit(type) static type __init @@ -3472,6 +3473,10 > @@ static int setup_msi_irq(struct pci_dev *dev, struct msi_desc > *msidesc, int irq) return ret; set_irq_msi(irq, msidesc); + + if > (xen_domain()) + return 0;I think it would be cleaner to have a xen_setup_msi_irq() which just does the ret = msi_compose_msg(dev, irq, &msg); if (ret < 0) return ret; set_irq_msi(irq, msidesc); parts then returns, and put the if (xen) logic at the callsite (either with an explicit test, or add an ops vec of some kind.> + write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9 > +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int > type) irq_want = nr_irqs_gsi; sub_handle = 0; > list_for_each_entry(msidesc, &dev->msi_list, list) { - irq = > create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq = > xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { + > dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else { > + irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + }Hoist this out into a create_msi_irq() which does the native vs Xen thing appropriately.> irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@ > -3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if > (xen_destroy_irq(irq)) + destroy_irq(irq);Hoist this out too. (destroy_msi_irq()?)> return ret; }Do we (or will we ever) support the interrupt remapping stuff in the dom0 kernel, or is that something that might happen within Xen? (I don''t know anything about it.) If not, then it looks like there isn''t very much common code between the native and Xen versions of arch_setup_msi_irqs(). I think it might be overall cleaner just to explicitly have two versions, with arch_setup_msi_irqs() being a wrapper to choose which one to use. Even if we do support interrupt remapping, it wouldn''t result in much duplicated code at all, as far as I can see.> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); + > if (xen_destroy_irq(irq)) + destroy_irq(irq);Yes, this little pattern should be put into a single place.> } #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git > a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d > 100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7 > +1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include > <linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h> > #include <asm/acpi.h> diff --git a/drivers/xen/events.c > b/drivers/xen/events.c index af2aad4..65e4c7a 100644 --- > a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@ > #include <linux/string.h> #include <linux/bootmem.h> #include > <linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h> > +#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h> > @@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name) > if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) { > dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + + > irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: + > spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int > xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; + > int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq; > + unmap_irq.domid = DOMID_SELF; + if ((rc = > HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { + > printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } - > irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq] = > mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: - > spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int > xen_create_msi_irq(struct pci_dev *dev, + unsigned int want,Do we need this param? Looks unused.> + struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct > physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; + > int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; + > + memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; + > map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus > = dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type == > PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev, > PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base) (base + 0x04)Isn''t this defined somewhere else? If not, is there a better place to define it?> + pci_read_config_dword(dev, msix_table_offset_reg(pos), > &table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK); + > + map_irq.table_base = pci_resource_start(dev, bir); + > map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + + > spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); + + > if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc = > HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { + > printk(KERN_WARNING "xen map irq failed %x\n", rc); + > dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + + > irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); + > set_irq_chip_and_handler_name(irq, &xen_pirq_chip, + handle_level_irq, > + (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); + +out: + > spin_unlock(&irq_mapping_update_lock); return irq; } diff --git > a/include/xen/interface/physdev.h b/include/xen/interface/physdev.h > index cd69391..47ab7f7 100644 --- a/include/xen/interface/physdev.h > +++ b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct > physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define > MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define > PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; + /* > IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int > pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int > entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define > PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t domid; > + /* IN */ + int pirq; +}; + +#define PHYSDEVOP_manage_pci_add 15 > +#define PHYSDEVOP_manage_pci_remove 16 +struct physdev_manage_pci { + > /* IN */ + uint8_t bus; + uint8_t devfn; +}; + +#define > PHYSDEVOP_restore_msi 19 +struct physdev_restore_msi { + /* IN */ + > uint8_t bus; + uint8_t devfn; +}; + +#define > PHYSDEVOP_manage_pci_add_ext 20 +struct physdev_manage_pci_ext { + /* > IN */ + uint8_t bus; + uint8_t devfn; + unsigned is_extfn; + unsigned > is_virtfn; + struct { + uint8_t bus; + uint8_t devfn; + } physfn; +}; > + /* * Notify that some PIRQ-bound event channels have been unmasked. > * ** This command is obsolete since interface version 0x00030202 and > is **> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang, Yunhong > <yunhong.jiang@intel.com> Date: Thu May 7 21:22:26 2009 +0800 Add the > pci wrap function, to notify Xen of all PCI information. Currently Xen > depends on Dom0 to notify all PCI information. When Xen try to setup > MSI for a pci device, it will depends on this information. This method > need more discussion, since it add some ugly hook to pci_bus_type.Yes, this is a bit unpleasant. I can''t see any immediately satisfying solution, partly because I don''t fully understand what needs to be done in these hooks. Why does it need to do this at probe time rather than when setting up the interrupts?> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git > a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index f0d1a89..372ad9e > 100644 --- a/arch/x86/xen/Makefile +++ b/arch/x86/xen/Makefile @@ > -13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o spinlock.o > obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o obj-$(CONFIG_XEN_DOM0) += > vga.o apic.o obj-$(CONFIG_PCI_XEN) += pci-swiotlb.o > -obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at end of file > +obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff --git > a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file mode > 100644 index 0000000..4ab6966 --- /dev/null +++ > b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + * > vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h> > +#include <linux/init.h> +#include <linux/pci.h> +#include > <xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include > <asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device > *dev); +static int (*pci_bus_remove)(struct device *dev);I''d give these more distinctive names: orig_pci_bus_probe, or something, and call them with (*orig_pci_bus_probe)(...) to make it obvious they''re function callers. I overlooked the calls the first couple of times because they didn''t stand out.> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return bus->self > && bus->self->ari_enabled; +}Is this a generally useful predicate?> + +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; + > struct pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci > manage_pci; + struct physdev_manage_pci_ext manage_pci_ext; + +#ifdef > CONFIG_PCI_IOV + if (pci_dev->is_virtfn) { + memset(&manage_pci_ext, > 0, sizeof(manage_pci_ext)); + manage_pci_ext.bus = > pci_dev->bus->number; + manage_pci_ext.devfn = pci_dev->devfn; + > manage_pci_ext.is_virtfn = 1; + manage_pci_ext.physfn.bus = > pci_dev->physfn->bus->number; + manage_pci_ext.physfn.devfn = > pci_dev->physfn->devfn; + r = > HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + > &manage_pci_ext); + } else +#endif + if (pci_ari_enabled(pci_dev->bus) > && PCI_SLOT(pci_dev->devfn)) { + memset(&manage_pci_ext, 0, > sizeof(manage_pci_ext)); + manage_pci_ext.bus = pci_dev->bus->number; > + manage_pci_ext.devfn = pci_dev->devfn; + manage_pci_ext.is_extfn = > 1; + r = HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + > &manage_pci_ext); + } else { + manage_pci.bus = pci_dev->bus->number; > + manage_pci.devfn = pci_dev->devfn; + r = > HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + } + > if (r && r != -ENOSYS) + return r;Why is ENOSYS OK? Doesn''t it mean that Xen is missing some aspect of MSI support?> + + r = pci_bus_probe(dev); + return r; +} + +static int > pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct > pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci > manage_pci; + manage_pci.bus = pci_dev->bus->number; + > manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /* > dev and pci_dev are no longer valid!! */ + + > WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, + > &manage_pci));I prefer: if (... != 0) WARN_ON(1); to make it obvious that we''re doing an error check, rather than some internal consistency assertion.> + return r; +} + +static int __init hook_pci_bus(void) +{ + + if > (!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; + > pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove = > pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper; + > + return 0; +} + +core_initcall(hook_pci_bus);_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy, thanks for your detailed comments, I will update according to your feedback. See comments embeded please. --jyh Jeremy Fitzhardinge wrote:> Jiang, Yunhong wrote: >> Jeremy, attached is basic MSI support to PV_dom0. Please have a >> look on it. >> >> Thanks >> Yunhong Jiang >> >> The pci_wrapper.patch add some hook to pci_bus_type, so that > Xen will be notified when a PCI device is added to system. >> Makefile | 2 - >> pci_wrap.c | 90 > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 91 insertions(+), 1 deletion(-) >> >> The enable_msi.patch add the MSI support to dom0, basically > it just allocate irq from Xen irq name space. >> >> arch/x86/include/asm/xen/pci.h | 17 +++++++ >> arch/x86/kernel/apic/io_apic.c | 22 +++++++-- >> arch/x86/xen/apic.c | 1 >> drivers/xen/events.c | 91 > +++++++++++++++++++++++++++++++++++++++- >> include/xen/interface/physdev.h | 59 +++++++++++++++++++++++++ >> 5 files changed, 182 insertions(+), 8 deletions(-) >> >> The reenable_msi.patch just remove original function that > disable MSI in xen environment. >> >> arch/x86/xen/apic.c | 2 -- >> drivers/pci/pci.h | 2 ++ >> include/linux/pci.h | 6 ------ >> 3 files changed, 2 insertions(+), 8 deletions(-) >> >> >> Notice: >> a) Currently the PCI save/restore is broken. The reason is > because pci_restore_msi_state() in "drivers/pci/msi.c" will > try to restore the MSI config depends on device''s msi msg > information (i.e. content of msi_desc->msg). However, that > information is wrong in Xen environment because Xen HV owns > MSI. I''m still trying to find a method to achieve the > save/restore without touch the common msi.c code, any > suggestion is welcome. >> > > You mean because it tries to directly write to pci config > space, or that > it writes the wrong thing there?Because It will try to write to the pci config space directly, and currently it will write a wrong value. I''m considering to overwrite msi_compose_msg() in arch/x86/kernel/apic/io_apic.c, so that msi_desc->msg will hold the value Xen is using, I hope it will resolve the issue.> > What will this affect? Dom0 S3 suspend/resume? More?Yes, suspend/resume.> >> b) I notice pci frontend is in a branch. But to support pci > frontend without touch common PCI function is difficult, I''m > still considering it. >> > > OK. > >> c) I''m not sure if xen''s irq space should be same as pirq. > Basically I think irq is dom0 internal structure, while PIRQ > is interface between Xen HV/dom0. But seems current > implementation think these two items are same. I didn''t try to > change it, but that may need improvement. > > What do you mean by "Xen''s irq space"? Does it have an irq > space that''s > distinct from pirqs? Or do you mean "Linux''s irq space"? At theAha, yes, I mean linux''s irq space.> moment, a Linux irq == the gsi, which is a convention I kept for XenI suspect if this will always work, because 1) It may not work on x86_32, because gsi is compressed in x86_32, or we will not support that? 2) Even in x86_64, I suspect it may not work still. Currently Xen defined NR_IRQS as 256, unless specified in Rules.mk. So what will happen if the gsi > 256? I suspect the PHYSDEVOP_alloc_irq_vector hypercall will fail. The reason is, the pirq is virtual interrupt line between Xen/guest, and is determined by Xen HV, while irq/gsi is defined by guest/dom0. Current Xen''s dom0 has a mapping between pirq/irq, so I''m not sure if that logic is needed in pv_dom0 still.> interrupts, though there''s no very strong tie (identity_mapped_irq() > determines where we bother with the identity mapping; only the legacy > ISA interrupts are essential). Obviously this has no meaning with > MSI interrupts. > > Or have I misunderstood you? > > (It would be easier to review the patches if they were inline, > one per mail)Sorry, I will do it next time.> >> commit 702965d82162e07c0c2afbdddbbe9a0c9a1c599d Author: Jiang, >> Yunhong <yunhong.jiang@intel.com> Date: Fri May 8 00:50:34 2009 >> +0800 Add MSI support to Xen Dom0 It add some hook to arch specific >> MSI function, so that it will a) allocate irq from Xen''s irq space >> b) allocate vector through Xen''s map_irq hypercall. Currently Xen''s >> irq space function has no chip_data function, I assume this should >> be ok since we Xen''s IRQ is different chip with native IOAPIC. >> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git >> a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h >> index 0563fc6..a5d9027 100644 --- > a/arch/x86/include/asm/xen/pci.h +++ >> b/arch/x86/include/asm/xen/pci.h @@ -3,11 +3,28 @@ #ifdef >> CONFIG_XEN_DOM0_PCI int xen_register_gsi(u32 gsi, int triggering, int >> polarity); +int xen_create_msi_irq(struct pci_dev *dev, + unsigned >> int want, + struct msi_desc *msidesc, + int type); +int >> xen_destroy_irq(int irq); #else static inline int >> xen_register_gsi(u32 gsi, int triggering, int polarity) { return -1; >> } + +int xen_create_msi_irq(struct pci_dev *dev, > static inline >> + unsigned int want, + struct msi_desc *msidesc, + int type) +{ + >> return -1; +} +int xen_destroy_irq(int irq) static inline >> +{ + return -1; +} #endif #endif /* _ASM_X86_XEN_PCI_H */ diff --git >> a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >> index b562550..d1e3408 100644 --- > a/arch/x86/kernel/apic/io_apic.c +++ >> b/arch/x86/kernel/apic/io_apic.c @@ -66,6 +66,7 @@ #include >> <asm/xen/hypervisor.h> #include <asm/apic.h> +#include >> <asm/xen/pci.h> #define __apicdebuginit(type) static type __init @@ >> -3472,6 +3473,10 @@ static int setup_msi_irq(struct pci_dev *dev, >> struct msi_desc *msidesc, int irq) return ret; set_irq_msi(irq, >> msidesc); + + if (xen_domain()) + return 0; > > I think it would be cleaner to have a xen_setup_msi_irq() > which just does the > ret = msi_compose_msg(dev, irq, &msg); > if (ret < 0) > return ret; > > set_irq_msi(irq, msidesc); > > parts then returns, and put the if (xen) logic at the callsite > (either with an explicit test, or add an ops vec of some kind.Sure, I will do like that. In fact, my draft implemented this way. BTW, what do you mean of "an ops vec"???> >> + write_msi_msg(irq, &msg); if (irq_remapped(irq)) { @@ -3505,9 >> +3510,14 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, >> int type) irq_want = nr_irqs_gsi; sub_handle = 0; >> list_for_each_entry(msidesc, &dev->msi_list, list) { - irq >> create_irq_nr(irq_want); - if (irq == 0) - return -1; + if ((irq >> xen_create_msi_irq(dev, irq_want, msidesc, type)) > 0) + { + >> dev_printk(KERN_DEBUG, "xen create irq %x for msi\n", irq); + } else >> { + irq = create_irq_nr(irq_want); + if (irq == 0) + return -1; + } > Hoist this out into a create_msi_irq() which does the native > vs Xen thing > appropriately.Sure.>> irq_want = irq + 1; if (!intr_remapping_enabled) goto no_ir; @@ >> -3544,13 +3554,15 @@ no_ir: return 0; error: - destroy_irq(irq); + if >> (xen_destroy_irq(irq)) + destroy_irq(irq); > Hoist this out too. (destroy_msi_irq()?)Sure.>> return ret; } > > Do we (or will we ever) support the interrupt remapping stuff > in the dom0 kernel, > or is that something that might happen within Xen? (I don''t > know anything about > it.)Per my understanding, interrupt remapping should happen within Xen.> > If not, then it looks like there isn''t very much common code > between the native > and Xen versions of arch_setup_msi_irqs(). I think it might > be overall cleaner > just to explicitly have two versions, with > arch_setup_msi_irqs() being a wrapper to > choose which one to use. Even if we do support interrupt > remapping, it wouldn''t > result in much duplicated code at all, as far as I can see.So do you mean something like arch_setup_msi_irqs(...) { if (xen_domain) xen_setup_msi_irqs() ..... same as native }> >> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); + >> if (xen_destroy_irq(irq)) + destroy_irq(irq); > > Yes, this little pattern should be put into a single place.What do you mean of put into a single place?> >> } #if defined (CONFIG_DMAR) || defined (CONFIG_INTR_REMAP) diff --git >> a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c index 496f07d..2f35a2d >> 100644 --- a/arch/x86/xen/apic.c +++ b/arch/x86/xen/apic.c @@ -1,7 >> +1,6 @@ #include <linux/kernel.h> #include <linux/threads.h> #include >> <linux/bitmap.h> -#include <linux/pci.h> #include <asm/io_apic.h> >> #include <asm/acpi.h> diff --git a/drivers/xen/events.c >> b/drivers/xen/events.c index af2aad4..65e4c7a 100644 --- >> a/drivers/xen/events.c +++ b/drivers/xen/events.c @@ -28,6 +28,9 @@ >> #include <linux/string.h> #include <linux/bootmem.h> #include >> <linux/irqnr.h> +#include <linux/pci_regs.h> +#include <linux/pci.h> >> +#include <linux/msi.h> #include <asm/ptrace.h> #include <asm/irq.h> >> @@ -560,14 +563,98 @@ int xen_allocate_pirq(unsigned gsi, char *name) >> if (HYPERVISOR_physdev_op(PHYSDEVOP_alloc_irq_vector, &irq_op)) { >> dynamic_irq_cleanup(irq); irq = -ENOSPC; + goto out; + } + + >> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); +out: + >> spin_unlock(&irq_mapping_update_lock); + return irq; +} + +int >> xen_destroy_irq(int irq) +{ + struct physdev_unmap_pirq unmap_irq; + >> int rc; + + if (!xen_domain()) + return -1; + + unmap_irq.pirq = irq; >> + unmap_irq.domid = DOMID_SELF; + if ((rc >> HYPERVISOR_physdev_op(PHYSDEVOP_unmap_pirq, &unmap_irq))) + { + >> printk(KERN_WARNING "unmap irq failed %x\n", rc); goto out; } - >> irq_info[irq] = mk_pirq_info(0, gsi, irq_op.vector); + irq_info[irq] >> = mk_unbound_info(); + + dynamic_irq_cleanup(irq); + return 0; out: - >> spin_unlock(&irq_mapping_update_lock); + return -1; +} + +int >> xen_create_msi_irq(struct pci_dev *dev, + unsigned int want, > > Do we need this param? Looks unused.I wil change it. I just wanted to keep it similar as native create_irq_nr() function.> >> + struct msi_desc *msidesc, + int type) +{ + int irq = 0; + struct >> physdev_map_pirq map_irq; + int rc; + domid_t domid = DOMID_SELF; + >> int pos; + u32 table_offset, bir; + if (!xen_domain()) + return -1; + >> + memset(&map_irq, 0, sizeof(map_irq)); + map_irq.domid = domid; + >> map_irq.type = MAP_PIRQ_TYPE_MSI; + map_irq.index = -1; + map_irq.bus >> = dev->bus->number; + map_irq.devfn = dev->devfn; + + if (type =>> PCI_CAP_ID_MSIX) + { + pos = pci_find_capability(dev, >> PCI_CAP_ID_MSIX); + +#define msix_table_offset_reg(base) > (base + 0x04) > Isn''t this defined somewhere else? If not, is there a better > place to define it?It is defined in drivers/pci/msi.h. As I don''t want to touch anything in drivers/pci/ directory, so I put it here (sorry that I should place it under some .h file).>> + pci_read_config_dword(dev, msix_table_offset_reg(pos), >> &table_offset); + bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK); >> + + map_irq.table_base = pci_resource_start(dev, bir); + >> map_irq.entry_nr = msidesc->msi_attrib.entry_nr; + } + + >> spin_lock(&irq_mapping_update_lock); + + irq = find_unbound_irq(); + >> + if (irq == -1) + goto out; + + map_irq.pirq = irq; + + if ((rc >> HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq))) + { + >> printk(KERN_WARNING "xen map irq failed %x\n", rc); + >> dynamic_irq_cleanup(irq); + irq = -1; + goto out; + } + + >> irq_info[irq] = mk_pirq_info(0, -1, map_irq.index); + >> set_irq_chip_and_handler_name(irq, &xen_pirq_chip, + >> handle_level_irq, + (type == PCI_CAP_ID_MSIX) ? "msi-x":"msi"); + >> +out: + spin_unlock(&irq_mapping_update_lock); return irq; } diff >> --git a/include/xen/interface/physdev.h >> b/include/xen/interface/physdev.h index cd69391..47ab7f7 100644 --- >> a/include/xen/interface/physdev.h +++ >> b/include/xen/interface/physdev.h @@ -121,6 +121,65 @@ struct >> physdev_op { } u; }; + +#define MAP_PIRQ_TYPE_MSI 0x0 +#define >> MAP_PIRQ_TYPE_GSI 0x1 +#define MAP_PIRQ_TYPE_UNKNOWN 0x2 + +#define >> PHYSDEVOP_map_pirq 13 +struct physdev_map_pirq { + domid_t domid; + >> /* IN */ + int type; + /* IN */ + int index; + /* IN or OUT */ + int >> pirq; + /* IN */ + int bus; + /* IN */ + int devfn; + /* IN */ + int >> entry_nr; + /* IN */ + uint64_t table_base; +}; + +#define >> PHYSDEVOP_unmap_pirq 14 +struct physdev_unmap_pirq { + domid_t >> domid; + /* IN */ + int pirq; +}; + +#define >> PHYSDEVOP_manage_pci_add 15 +#define PHYSDEVOP_manage_pci_remove 16 >> +struct physdev_manage_pci { + /* IN */ + uint8_t bus; + uint8_t >> devfn; +}; + +#define PHYSDEVOP_restore_msi 19 +struct >> physdev_restore_msi { + /* IN */ + uint8_t bus; + uint8_t devfn; +}; >> + +#define PHYSDEVOP_manage_pci_add_ext 20 +struct >> physdev_manage_pci_ext { + /* IN */ + uint8_t bus; + uint8_t devfn; >> + unsigned is_extfn; + unsigned is_virtfn; + struct { + uint8_t bus; >> + uint8_t devfn; + } physfn; +}; + /* * Notify that some PIRQ-bound >> event channels have been unmasked. * ** This command is obsolete >> since interface version 0x00030202 and is ** > > >> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang, >> Yunhong <yunhong.jiang@intel.com> Date: Thu May 7 21:22:26 2009 >> +0800 Add the pci wrap function, to notify Xen of all PCI >> information. Currently Xen depends on Dom0 to notify all PCI >> information. When Xen try to setup MSI for a pci device, it will >> depends on this information. This method need more discussion, since >> it add some ugly hook to pci_bus_type. > > Yes, this is a bit unpleasant. I can''t see any immediately > satisfying solution, partly because > I don''t fully understand what needs to be done in these hooks. > Why does it need to do this at probe > time rather than when setting up the interrupts?The main purpose of this hook is to keep Xen have the list of PCI device in the system, include those hot plug/unpluged devices, before the device begin function, I think the main purpose is for VT-d support in Xen, so it is in probe time. In fact, MSI should works without this list. Unfortunately, currently Xen will check if the device exists or not when enabling MSI/MSI-X, and will return -ENODEV if the device is not in the list. But I do think this code is very ugly, and I suspect if we can push this code into upstream successfully. I know Winston is working on PCI hotplug in Xen side, I will talk with him to see if any plan to change this logic.> >> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> diff --git >> a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile index >> f0d1a89..372ad9e 100644 --- a/arch/x86/xen/Makefile +++ >> b/arch/x86/xen/Makefile @@ -13,4 +13,4 @@ obj-$(CONFIG_SMP) += smp.o >> spinlock.o obj-$(CONFIG_XEN_DEBUG_FS) += debugfs.o >> obj-$(CONFIG_XEN_DOM0) += vga.o apic.o obj-$(CONFIG_PCI_XEN) +>> pci-swiotlb.o -obj-$(CONFIG_XEN_DOM0_PCI) += pci.o \ No newline at >> end of file +obj-$(CONFIG_XEN_DOM0_PCI) += pci.o pci_wrap.o diff >> --git a/arch/x86/xen/pci_wrap.c b/arch/x86/xen/pci_wrap.c new file >> mode 100644 index 0000000..4ab6966 --- /dev/null +++ >> b/arch/x86/xen/pci_wrap.c @@ -0,0 +1,90 @@ +/* + * >> vim:shiftwidth=8:noexpandtab + */ + +#include <linux/kernel.h> >> +#include <linux/init.h> +#include <linux/pci.h> +#include >> <xen/interface/physdev.h> +#include <asm/xen/hypercall.h> +#include >> <asm/xen/hypervisor.h> + +static int (*pci_bus_probe)(struct device >> *dev); +static int (*pci_bus_remove)(struct device *dev); > > I''d give these more distinctive names: orig_pci_bus_probe, or > something, and call them with (*orig_pci_bus_probe)(...) to make it > obvious they''re > function callers. I overlooked the calls the first couple of > times because > they didn''t stand out. > >> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return >> bus->self && bus->self->ari_enabled; +} > > Is this a generally useful predicate?I just copied from Xen''s dom0 code. I will check it .> >> + +static int pci_bus_probe_wrapper(struct device *dev) +{ + int r; + >> struct pci_dev *pci_dev = to_pci_dev(dev); + struct >> physdev_manage_pci manage_pci; + struct physdev_manage_pci_ext >> manage_pci_ext; + +#ifdef CONFIG_PCI_IOV + if (pci_dev->is_virtfn) { >> + memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); + >> manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn >> pci_dev->devfn; + manage_pci_ext.is_virtfn = 1; + >> manage_pci_ext.physfn.bus = pci_dev->physfn->bus->number; + >> manage_pci_ext.physfn.devfn = pci_dev->physfn->devfn; + r >> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + >> &manage_pci_ext); + } else +#endif + if >> (pci_ari_enabled(pci_dev->bus) && PCI_SLOT(pci_dev->devfn)) { + >> memset(&manage_pci_ext, 0, sizeof(manage_pci_ext)); + >> manage_pci_ext.bus = pci_dev->bus->number; + manage_pci_ext.devfn >> pci_dev->devfn; + manage_pci_ext.is_extfn = 1; + r >> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add_ext, + >> &manage_pci_ext); + } else { + manage_pci.bus >> pci_dev->bus->number; + manage_pci.devfn = pci_dev->devfn; + r >> HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_add, + &manage_pci); + } >> + if (r && r != -ENOSYS) + return r; > > Why is ENOSYS OK? Doesn''t it mean that Xen is missing some > aspect of MSI support?I''m not sure if that is for backward compatibility? I just copied this from current Xen''s dom0 code.> >> + + r = pci_bus_probe(dev); + return r; +} + +static int >> pci_bus_remove_wrapper(struct device *dev) +{ + int r; + struct >> pci_dev *pci_dev = to_pci_dev(dev); + struct physdev_manage_pci >> manage_pci; + manage_pci.bus = pci_dev->bus->number; + >> manage_pci.devfn = pci_dev->devfn; + + r = pci_bus_remove(dev); + /* >> dev and pci_dev are no longer valid!! */ + + >> WARN_ON(HYPERVISOR_physdev_op(PHYSDEVOP_manage_pci_remove, + >> &manage_pci)); > I prefer: > if (... != 0) > WARN_ON(1); > to make it obvious that we''re doing an error check, rather than some > internal consistency assertion. > >> + return r; +} + +static int __init hook_pci_bus(void) +{ + + if >> (!xen_domain()) + return 0; + + pci_bus_probe = pci_bus_type.probe; + >> pci_bus_type.probe = pci_bus_probe_wrapper; + + pci_bus_remove >> pci_bus_type.remove; + pci_bus_type.remove = pci_bus_remove_wrapper; >> + + return 0; +} + +core_initcall(hook_pci_bus);_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-May-13 17:31 UTC
Re: [Xen-devel] [PATCH]Add MSI support to PV_dom0
Jiang, Yunhong wrote:>> moment, a Linux irq == the gsi, which is a convention I kept for Xen >> > > I suspect if this will always work, because > 1) It may not work on x86_32, because gsi is compressed in x86_32, or we will not support that? >Yinghai Lu has posted patches to remove gsi compression, now that dynamic irq arrays mean we have no practical limit on irq numbers.> 2) Even in x86_64, I suspect it may not work still. Currently Xen defined NR_IRQS as 256, unless specified in Rules.mk. So what will happen if the gsi > 256? I suspect the PHYSDEVOP_alloc_irq_vector hypercall will fail. > > The reason is, the pirq is virtual interrupt line between Xen/guest, and is determined by Xen HV, while irq/gsi is defined by guest/dom0. Current Xen''s dom0 has a mapping between pirq/irq, so I''m not sure if that logic is needed in pv_dom0 still. >Well, we could decouple Xen pirq from Linux irq fairly easily if needed. Or just limit the range of identity mapped irqs (by adjusting get_nr_hw_irqs()/identity_mapped_irq()) so that the common case will still get identities but we can leave room in the sub-256 irq space for pirqs.> Sure, I will do like that. In fact, my draft implemented this way. > BTW, what do you mean of "an ops vec"??? >A struct msi_ops (or something) with function pointers to appropriate functions. But I think a better approach is to just pull everything out into xen_setup_msi_irqs().>>> void arch_teardown_msi_irq(unsigned int irq) { - destroy_irq(irq); + >>> if (xen_destroy_irq(irq)) + destroy_irq(irq); >>> >> Yes, this little pattern should be put into a single place. >> > > What do you mean of put into a single place? >Have a destroy_msi_irq() which contains this logic, and call it from whenever it is needed.>> (base + 0x04) >> Isn''t this defined somewhere else? If not, is there a better >> place to define it? >> > > It is defined in drivers/pci/msi.h. As I don''t want to touch anything in drivers/pci/ directory, so I put it here (sorry that I should place it under some .h file). >It is always best to put generically useful stuff in the most appropriate common place.>>> commit 134186e0b4526908b4c64c1454caff5db6ddf972 Author: Jiang, >>> Yunhong <yunhong.jiang@intel.com> Date: Thu May 7 21:22:26 2009 >>> +0800 Add the pci wrap function, to notify Xen of all PCI >>> information. Currently Xen depends on Dom0 to notify all PCI >>> information. When Xen try to setup MSI for a pci device, it will >>> depends on this information. This method need more discussion, since >>> it add some ugly hook to pci_bus_type. >>> >> Yes, this is a bit unpleasant. I can''t see any immediately >> satisfying solution, partly because >> I don''t fully understand what needs to be done in these hooks. >> Why does it need to do this at probe >> time rather than when setting up the interrupts? >> > > The main purpose of this hook is to keep Xen have the list of PCI device in the system, include those hot plug/unpluged devices, before the device begin function, I think the main purpose is for VT-d support in Xen, so it is in probe time. > > In fact, MSI should works without this list. Unfortunately, currently Xen will check if the device exists or not when enabling MSI/MSI-X, and will return -ENODEV if the device is not in the list. >Well, couldn''t we quietly add it to the list just before setting up the first interrupt for the device? I guess that wouldn''t work for VT-d support, unless we can defer it until we first see the device in some Xen-specific code. Matthew, do you have any thoughts about a cleaner way to hook into probe/remove? Would a arch_pci_device_probe() hook in pci_device_probe() be a reasonable way to handle it?> But I do think this code is very ugly, and I suspect if we can push this code into upstream successfully. I know Winston is working on PCI hotplug in Xen side, I will talk with him to see if any plan to change this logic. >Either way we''re eventually going to need something like it for VT-d, won''t we?>>> +static int pci_ari_enabled(struct pci_bus *bus) +{ + return >>> bus->self && bus->self->ari_enabled; +} >>> >> Is this a generally useful predicate? >> > > I just copied from Xen''s dom0 code. I will check it . >It looks like something that could be in a common pci header. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel