Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
This patchset enables MSI-INTx interrupt translation for HVM domains. The intention of the patch is to use MSI as the physical interrupt mechanism for passthrough device as much as possible, thus reducing the pirq sharing among domains. When MSI is globally enabled, if the device has the MSI capability but doesn''t used by the guest, hypervisor will try to user MSI as the underlying pirq and inject translated INTx irq to guest vioapic. When guest itself enabled MSI or MSI-X, the translation is automatically turned off. Add a config file option to disable/enable this feature. Also, in order to allow the user to override the option per device, a per-device option mechanism is implemented and an MSI-INTx option is added The patch set contains 6 patches and is organized as below: [PATCH 1/6] hypervisor patch *[PATCH 2/6] qemu patch [PATCH 3/6] add pci options in domain config file [PATCH 4/6] XenAPI version of the above patch [PATCH 5/6] add global and per-device option for MSI-INTx [PATCH 6/6] documentation and example * against ioemu-remote _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 1/6] passthrough: MSI-INTx translation for HVM
passthrough: MSI-INTx translation for HVM The patch adds a new type of pt_irq: PT_IRQ_TYPE_MSI_TRANSLATE. With it, guest dpci irq can now use msi as the underlying pirq while injected as INTx irq. Signed-off-by: Qing He <qing.he@intel.com> --- diff -r e2f36d066b7b -r d11b806ed94f tools/libxc/xc_domain.c --- a/tools/libxc/xc_domain.c Mon Dec 22 13:48:40 2008 +0000 +++ b/tools/libxc/xc_domain.c Sun Jan 04 17:41:53 2009 +0800 @@ -888,7 +888,8 @@ bind->hvm_domid = domid; bind->irq_type = irq_type; bind->machine_irq = machine_irq; - if ( irq_type == PT_IRQ_TYPE_PCI ) + if ( irq_type == PT_IRQ_TYPE_PCI || + irq_type == PT_IRQ_TYPE_MSI_TRANSLATE ) { bind->u.pci.bus = bus; bind->u.pci.device = device; diff -r e2f36d066b7b -r d11b806ed94f xen/arch/x86/hvm/vmsi.c --- a/xen/arch/x86/hvm/vmsi.c Mon Dec 22 13:48:40 2008 +0000 +++ b/xen/arch/x86/hvm/vmsi.c Sun Jan 04 17:41:53 2009 +0800 @@ -134,7 +134,7 @@ "vector=%x trig_mode=%x\n", dest, dest_mode, delivery_mode, vector, trig_mode); - if ( !test_bit(_HVM_IRQ_DPCI_MSI, &hvm_irq_dpci->mirq[pirq].flags) ) + if ( !( hvm_irq_dpci->mirq[pirq].flags & HVM_IRQ_DPCI_GUEST_MSI ) ) { gdprintk(XENLOG_WARNING, "pirq %x not msi \n", pirq); return 0; diff -r e2f36d066b7b -r d11b806ed94f xen/drivers/passthrough/io.c --- a/xen/drivers/passthrough/io.c Mon Dec 22 13:48:40 2008 +0000 +++ b/xen/drivers/passthrough/io.c Sun Jan 04 17:41:53 2009 +0800 @@ -23,6 +23,11 @@ #include <asm/hvm/irq.h> #include <asm/hvm/iommu.h> #include <xen/hvm/irq.h> + +static int pt_irq_need_timer(uint32_t flags) +{ + return !(flags & (HVM_IRQ_DPCI_GUEST_MSI | HVM_IRQ_DPCI_TRANSLATE)); +} static void pt_irq_time_out(void *data) { @@ -93,7 +98,8 @@ if ( !test_and_set_bit(pirq, hvm_irq_dpci->mapping)) { - set_bit(_HVM_IRQ_DPCI_MSI, &hvm_irq_dpci->mirq[pirq].flags); + hvm_irq_dpci->mirq[pirq].flags = HVM_IRQ_DPCI_MACH_MSI | + HVM_IRQ_DPCI_GUEST_MSI; hvm_irq_dpci->mirq[pirq].gmsi.gvec = pt_irq_bind->u.msi.gvec; hvm_irq_dpci->mirq[pirq].gmsi.gflags = pt_irq_bind->u.msi.gflags; hvm_irq_dpci->msi_gvec_pirq[pt_irq_bind->u.msi.gvec] = pirq; @@ -104,7 +110,7 @@ hvm_irq_dpci->msi_gvec_pirq[pt_irq_bind->u.msi.gvec] = 0; hvm_irq_dpci->mirq[pirq].gmsi.gflags = 0; hvm_irq_dpci->mirq[pirq].gmsi.gvec = 0; - clear_bit(_HVM_IRQ_DPCI_MSI, &hvm_irq_dpci->mirq[pirq].flags); + hvm_irq_dpci->mirq[pirq].flags = 0; clear_bit(pirq, hvm_irq_dpci->mapping); spin_unlock(&d->event_lock); return rc; @@ -150,17 +156,33 @@ if ( !test_and_set_bit(machine_gsi, hvm_irq_dpci->mapping)) { unsigned int vector = domain_irq_to_vector(d, machine_gsi); + unsigned int share; hvm_irq_dpci->mirq[machine_gsi].dom = d; + if ( pt_irq_bind->irq_type == PT_IRQ_TYPE_MSI_TRANSLATE ) + { + hvm_irq_dpci->mirq[machine_gsi].flags = HVM_IRQ_DPCI_MACH_MSI | + HVM_IRQ_DPCI_GUEST_PCI | + HVM_IRQ_DPCI_TRANSLATE; + share = 0; + } + else /* PT_IRQ_TYPE_PCI */ + { + hvm_irq_dpci->mirq[machine_gsi].flags = HVM_IRQ_DPCI_MACH_PCI | + HVM_IRQ_DPCI_GUEST_PCI; + share = BIND_PIRQ__WILL_SHARE; + } /* Init timer before binding */ - init_timer(&hvm_irq_dpci->hvm_timer[vector], - pt_irq_time_out, &hvm_irq_dpci->mirq[machine_gsi], 0); + if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) ) + init_timer(&hvm_irq_dpci->hvm_timer[vector], + pt_irq_time_out, &hvm_irq_dpci->mirq[machine_gsi], 0); /* Deal with gsi for legacy devices */ - rc = pirq_guest_bind(d->vcpu[0], machine_gsi, BIND_PIRQ__WILL_SHARE); + rc = pirq_guest_bind(d->vcpu[0], machine_gsi, share); if ( unlikely(rc) ) { - kill_timer(&hvm_irq_dpci->hvm_timer[vector]); + if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) ) + kill_timer(&hvm_irq_dpci->hvm_timer[vector]); hvm_irq_dpci->mirq[machine_gsi].dom = NULL; clear_bit(machine_gsi, hvm_irq_dpci->mapping); hvm_irq_dpci->girq[guest_gsi].machine_gsi = 0; @@ -237,7 +259,8 @@ if ( list_empty(&hvm_irq_dpci->mirq[machine_gsi].digl_list) ) { pirq_guest_unbind(d, machine_gsi); - kill_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, machine_gsi)]); + if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) ) + kill_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, machine_gsi)]); hvm_irq_dpci->mirq[machine_gsi].dom = NULL; hvm_irq_dpci->mirq[machine_gsi].flags = 0; clear_bit(machine_gsi, hvm_irq_dpci->mapping); @@ -267,7 +290,7 @@ * PIC) and we need to detect that. */ set_bit(mirq, dpci->dirq_mask); - if ( !test_bit(_HVM_IRQ_DPCI_MSI, &dpci->mirq[mirq].flags) ) + if ( pt_irq_need_timer(dpci->mirq[mirq].flags) ) set_timer(&dpci->hvm_timer[domain_irq_to_vector(d, mirq)], NOW() + PT_IRQ_TIME_OUT); vcpu_kick(d->vcpu[0]); @@ -276,35 +299,40 @@ } #ifdef SUPPORT_MSI_REMAPPING +/* called with d->event_lock held */ +static void __msi_pirq_eoi(struct domain *d, int pirq) +{ + struct hvm_irq_dpci *hvm_irq_dpci = d->arch.hvm_domain.irq.dpci; + irq_desc_t *desc; + + if ( ( pirq >= 0 ) && ( pirq < NR_IRQS ) && + test_bit(pirq, hvm_irq_dpci->mapping) && + ( hvm_irq_dpci->mirq[pirq].flags & HVM_IRQ_DPCI_MACH_MSI) ) + { + BUG_ON(!local_irq_is_enabled()); + desc = domain_spin_lock_irq_desc(d, pirq, NULL); + if ( !desc ) + return; + + desc->status &= ~IRQ_INPROGRESS; + spin_unlock_irq(&desc->lock); + + pirq_guest_eoi(d, pirq); + } +} + void hvm_dpci_msi_eoi(struct domain *d, int vector) { struct hvm_irq_dpci *hvm_irq_dpci = d->arch.hvm_domain.irq.dpci; - irq_desc_t *desc; int pirq; if ( !iommu_enabled || (hvm_irq_dpci == NULL) ) return; spin_lock(&d->event_lock); + pirq = hvm_irq_dpci->msi_gvec_pirq[vector]; - - if ( ( pirq >= 0 ) && (pirq < NR_IRQS) && - test_bit(pirq, hvm_irq_dpci->mapping) && - (test_bit(_HVM_IRQ_DPCI_MSI, &hvm_irq_dpci->mirq[pirq].flags))) - { - BUG_ON(!local_irq_is_enabled()); - desc = domain_spin_lock_irq_desc(d, pirq, NULL); - if (!desc) - { - spin_unlock(&d->event_lock); - return; - } - - desc->status &= ~IRQ_INPROGRESS; - spin_unlock_irq(&desc->lock); - - pirq_guest_eoi(d, pirq); - } + __msi_pirq_eoi(d, pirq); spin_unlock(&d->event_lock); } @@ -336,14 +364,15 @@ spin_lock(&d->event_lock); #ifdef SUPPORT_MSI_REMAPPING - if ( test_bit(_HVM_IRQ_DPCI_MSI, &hvm_irq_dpci->mirq[irq].flags) ) + if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_GUEST_MSI ) { hvm_pci_msi_assert(d, irq); spin_unlock(&d->event_lock); continue; } #endif - stop_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, irq)]); + if ( pt_irq_need_timer(hvm_irq_dpci->mirq[irq].flags) ) + stop_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, irq)]); list_for_each_entry ( digl, &hvm_irq_dpci->mirq[irq].digl_list, list ) { @@ -351,6 +380,14 @@ intx = digl->intx; hvm_pci_intx_assert(d, device, intx); hvm_irq_dpci->mirq[irq].pending++; + +#ifdef SUPPORT_MSI_REMAPPING + if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_TRANSLATE ) + { + /* for translated MSI to INTx interrupt, eoi as early as possible */ + __msi_pirq_eoi(d, irq); + } +#endif } /* @@ -360,8 +397,9 @@ * guest will never deal with the irq, then the physical interrupt line * will never be deasserted. */ - set_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, irq)], - NOW() + PT_IRQ_TIME_OUT); + if ( pt_irq_need_timer(hvm_irq_dpci->mirq[irq].flags) ) + set_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, irq)], + NOW() + PT_IRQ_TIME_OUT); spin_unlock(&d->event_lock); } } @@ -405,9 +443,12 @@ * No need to get vector lock for timer * since interrupt is still not EOIed */ - stop_timer(&hvm_irq_dpci->hvm_timer[ - domain_irq_to_vector(d, machine_gsi)]); - pirq_guest_eoi(d, machine_gsi); + if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) ) + { + stop_timer(&hvm_irq_dpci->hvm_timer[ + domain_irq_to_vector(d, machine_gsi)]); + pirq_guest_eoi(d, machine_gsi); + } } } spin_unlock(&d->event_lock); diff -r e2f36d066b7b -r d11b806ed94f xen/include/public/domctl.h --- a/xen/include/public/domctl.h Mon Dec 22 13:48:40 2008 +0000 +++ b/xen/include/public/domctl.h Sun Jan 04 17:41:53 2009 +0800 @@ -466,6 +466,7 @@ PT_IRQ_TYPE_PCI, PT_IRQ_TYPE_ISA, PT_IRQ_TYPE_MSI, + PT_IRQ_TYPE_MSI_TRANSLATE, } pt_irq_type_t; struct xen_domctl_bind_pt_irq { uint32_t machine_irq; diff -r e2f36d066b7b -r d11b806ed94f xen/include/xen/hvm/irq.h --- a/xen/include/xen/hvm/irq.h Mon Dec 22 13:48:40 2008 +0000 +++ b/xen/include/xen/hvm/irq.h Sun Jan 04 17:41:53 2009 +0800 @@ -35,7 +35,16 @@ uint8_t link; }; -#define _HVM_IRQ_DPCI_MSI 0x1 +#define _HVM_IRQ_DPCI_MACH_PCI_SHIFT 0 +#define _HVM_IRQ_DPCI_MACH_MSI_SHIFT 1 +#define _HVM_IRQ_DPCI_GUEST_PCI_SHIFT 4 +#define _HVM_IRQ_DPCI_GUEST_MSI_SHIFT 5 +#define _HVM_IRQ_DPCI_TRANSLATE_SHIFT 15 +#define HVM_IRQ_DPCI_MACH_PCI (1 << _HVM_IRQ_DPCI_MACH_PCI_SHIFT) +#define HVM_IRQ_DPCI_MACH_MSI (1 << _HVM_IRQ_DPCI_MACH_MSI_SHIFT) +#define HVM_IRQ_DPCI_GUEST_PCI (1 << _HVM_IRQ_DPCI_GUEST_PCI_SHIFT) +#define HVM_IRQ_DPCI_GUEST_MSI (1 << _HVM_IRQ_DPCI_GUEST_MSI_SHIFT) +#define HVM_IRQ_DPCI_TRANSLATE (1 << _HVM_IRQ_DPCI_TRANSLATE_SHIFT) struct hvm_gmsi_info { uint32_t gvec; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 2/6] ioemu:passthrough: MSI-INTx interrupt translation support
passthrough: MSI-INTx interrupt translation support This patch enables Xen to use MSI for MSI-capable devices as the underlying interrupt source even if the guest does not explicitly use it. The guest will still see an IO-APIC based INTx interrupt translated by Xen from the MSI irq. If the guest enables MSI or MSI-X for the passthrough device, this translation is automatically turned off. It can also be disabled in the config file at domain startup time. Signed-off-by: Qing He <qing.he@intel.com> --- diff --git a/hw/pass-through.c b/hw/pass-through.c index d280ff6..625e644 100644 --- a/hw/pass-through.c +++ b/hw/pass-through.c @@ -34,6 +34,7 @@ struct php_dev { uint8_t r_bus; uint8_t r_dev; uint8_t r_func; + char *opt; }; struct dpci_infos { @@ -492,7 +493,7 @@ static struct pt_reg_info_tbl pt_emu_reg_msi_tbl[] = { .size = 2, .init_val = 0x0000, .ro_mask = 0x018E, - .emu_mask = 0xFFFE, + .emu_mask = 0xFFFF, .init = pt_msgctrl_reg_init, .u.w.read = pt_word_reg_read, .u.w.write = pt_msgctrl_reg_write, @@ -692,7 +693,7 @@ static int token_value(char *token) return strtol(token, NULL, 16); } -static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func) +static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func, char **opt) { char *token; const char *delim = ":.-"; @@ -711,18 +712,60 @@ static int next_bdf(char **str, int *seg, int *bus, int *dev, int *func) *dev = token_value(token); token = strsep(str, delim); + *opt = strchr(token, '',''); + if (*opt) + *(*opt)++ = ''\0''; + *func = token_value(token); return 1; } +static int get_next_keyval(char **option, char **key, char **val) +{ + char *opt, *k, *v; + + k = *option; + opt = strchr(k, '',''); + if (opt) + *opt++ = ''\0''; + v = strchr(k, ''=''); + if (!v) + return -1; + *v++ = ''\0''; + + *key = k; + *val = v; + *option = opt; + + return 0; +} + +static void msi_set_enable(struct pt_dev *ptdev, int en) +{ + uint16_t val; + uint32_t address; + if (!ptdev->msi) + return; + + address = ptdev->msi->ctrl_offset; + if (!address) + return; + + val = pci_read_word(ptdev->pci_dev, address); + val &= ~PCI_MSI_FLAGS_ENABLE; + val |= en & PCI_MSI_FLAGS_ENABLE; + pci_write_word(ptdev->pci_dev, address, val); +} + /* Insert a new pass-through device into a specific pci slot. * input dom:bus:dev.func@slot, chose free one if slot == 0 * return -1: required slot not available * 0: no free hotplug slots, but normal slot should okay * >0: the new hotplug slot */ -static int __insert_to_pci_slot(int bus, int dev, int func, int slot) +static int __insert_to_pci_slot(int bus, int dev, int func, int slot, + char *opt) { int i, php_slot; @@ -759,6 +802,7 @@ found: dpci_infos.php_devs[php_slot].r_bus = bus; dpci_infos.php_devs[php_slot].r_dev = dev; dpci_infos.php_devs[php_slot].r_func = func; + dpci_infos.php_devs[php_slot].opt = opt; return PHP_TO_PCI_SLOT(php_slot); } @@ -768,19 +812,19 @@ found: int insert_to_pci_slot(char *bdf_slt) { int seg, bus, dev, func, slot; - char *bdf_str, *slt_str; + char *bdf_str, *slt_str, *opt; const char *delim="@"; bdf_str = strsep(&bdf_slt, delim); slt_str = bdf_slt; slot = token_value(slt_str); - if ( !next_bdf(&bdf_str, &seg, &bus, &dev, &func)) + if ( !next_bdf(&bdf_str, &seg, &bus, &dev, &func, &opt)) { return -1; } - return __insert_to_pci_slot(bus, dev, func, slot); + return __insert_to_pci_slot(bus, dev, func, slot, opt); } @@ -807,8 +851,9 @@ int test_pci_slot(int slot) int bdf_to_slot(char *bdf_str) { int seg, bus, dev, func, i; + char *opt; - if ( !next_bdf(&bdf_str, &seg, &bus, &dev, &func)) + if ( !next_bdf(&bdf_str, &seg, &bus, &dev, &func, &opt)) { return -1; } @@ -1960,9 +2005,15 @@ static uint32_t pt_msgctrl_reg_init(struct pt_dev *ptdev, pci_write_word(pdev, real_offset, reg_field & ~PCI_MSI_FLAGS_ENABLE); } ptdev->msi->flags |= (reg_field | MSI_FLAG_UNINIT); + ptdev->msi->ctrl_offset = real_offset; /* All register is 0 after reset, except first 4 byte */ reg_field &= reg->ro_mask; + + if (ptdev->msi_trans_cap) { + PT_LOG("Turning on MSI-INTx translation\n"); + ptdev->msi_trans_en = 1; + } return reg_field; } @@ -2673,6 +2724,34 @@ static int pt_linkctrl2_reg_write(struct pt_dev *ptdev, return 0; } +static void pt_unmap_msi_translate(struct pt_dev *ptdev) +{ + uint16_t e_device, e_intx; + int rc; + + /* MSI_ENABLE bit should be disabed until the new handler is set */ + msi_set_enable(ptdev, 0); + + e_device = (ptdev->dev.devfn >> 3) & 0x1f; + /* fix virtual interrupt pin to INTA# */ + e_intx = 0; + rc = xc_domain_unbind_pt_irq(xc_handle, domid, ptdev->msi->pirq, + PT_IRQ_TYPE_MSI_TRANSLATE, 0, + e_device, e_intx, 0); + if (rc < 0) + PT_LOG("Error: Unbinding pt irq for MSI-INTx failed! rc=%d\n", rc); + + if (ptdev->machine_irq) + { + rc = xc_domain_bind_pt_pci_irq(xc_handle, domid, ptdev->machine_irq, + 0, e_device, e_intx); + if ( rc < 0 ) + PT_LOG("Error: Rebinding of interrupt failed! rc=%d\n", rc); + } + + ptdev->msi_trans_en = 0; +} + /* write Message Control register */ static int pt_msgctrl_reg_write(struct pt_dev *ptdev, struct pt_reg_tbl *cfg_entry, @@ -2682,7 +2761,9 @@ static int pt_msgctrl_reg_write(struct pt_dev *ptdev, uint16_t writable_mask = 0; uint16_t throughable_mask = 0; uint16_t old_ctrl = cfg_entry->data; + uint8_t e_device, e_intx; PCIDevice *pd = (PCIDevice *)ptdev; + uint16_t val; /* Currently no support for multi-vector */ if ((*value & PCI_MSI_FLAGS_QSIZE) != 0x0) @@ -2699,21 +2780,29 @@ static int pt_msgctrl_reg_write(struct pt_dev *ptdev, PT_LOG("old_ctrl:%04xh new_ctrl:%04xh\n", old_ctrl, cfg_entry->data); /* create value for writing to I/O device register */ + val = *value; throughable_mask = ~reg->emu_mask & valid_mask; *value = ((*value & throughable_mask) | (dev_value & ~throughable_mask)); /* update MSI */ - if (*value & PCI_MSI_FLAGS_ENABLE) + if (val & PCI_MSI_FLAGS_ENABLE) { /* setup MSI pirq for the first time */ if (ptdev->msi->flags & MSI_FLAG_UNINIT) { - /* Init physical one */ - PT_LOG("setup msi for dev %x\n", pd->devfn); - if (pt_msi_setup(ptdev)) + if (ptdev->msi_trans_en) { + PT_LOG("guest enabling MSI, disable MSI-INTx translation\n"); + pt_unmap_msi_translate(ptdev); + } + else { - PT_LOG("pt_msi_setup error!!!\n"); - return -1; + /* Init physical one */ + PT_LOG("setup msi for dev %x\n", pd->devfn); + if (pt_msi_setup(ptdev)) + { + PT_LOG("pt_msi_setup error!!!\n"); + return -1; + } } pt_msi_update(ptdev); @@ -2725,6 +2814,12 @@ static int pt_msgctrl_reg_write(struct pt_dev *ptdev, else ptdev->msi->flags &= ~PCI_MSI_FLAGS_ENABLE; + /* pass through MSI_ENABLE bit when no MSI-INTx translation */ + if (!ptdev->msi_trans_en) { + *value &= ~PCI_MSI_FLAGS_ENABLE; + *value |= val & PCI_MSI_FLAGS_ENABLE; + } + return 0; } @@ -2870,7 +2965,13 @@ static int pt_msixctrl_reg_write(struct pt_dev *ptdev, /* update MSI-X */ if ((*value & PCI_MSIX_ENABLE) && !(*value & PCI_MSIX_MASK)) + { + if (ptdev->msi_trans_en) { + PT_LOG("guest enabling MSI-X, disable MSI-INTx translation\n"); + pt_unmap_msi_translate(ptdev); + } pt_msix_update(ptdev); + } ptdev->msix->enabled = !!(*value & PCI_MSIX_ENABLE); @@ -2879,7 +2980,8 @@ static int pt_msixctrl_reg_write(struct pt_dev *ptdev, struct pt_dev * register_real_device(PCIBus *e_bus, const char *e_dev_name, int e_devfn, uint8_t r_bus, uint8_t r_dev, - uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access) + uint8_t r_func, uint32_t machine_irq, struct pci_access *pci_access, + char *opt) { int rc = -1, i; struct pt_dev *assigned_device = NULL; @@ -2887,6 +2989,8 @@ struct pt_dev * register_real_device(PCIBus *e_bus, uint8_t e_device, e_intx; struct pci_config_cf8 machine_bdf; int free_pci_slot = -1; + char *key, *val; + int msi_translate; PT_LOG("Assigning real physical device %02x:%02x.%x ...\n", r_bus, r_dev, r_func); @@ -2908,13 +3012,41 @@ struct pt_dev * register_real_device(PCIBus *e_bus, if ( e_devfn == PT_VIRT_DEVFN_AUTO ) { /*indicate a static assignment(not hotplug), so find a free PCI hot plug slot */ - free_pci_slot = __insert_to_pci_slot(r_bus, r_dev, r_func, 0); + free_pci_slot = __insert_to_pci_slot(r_bus, r_dev, r_func, 0, NULL); if ( free_pci_slot > 0 ) e_devfn = free_pci_slot << 3; else PT_LOG("Error: no free virtual PCI hot plug slot, thus no live migration.\n"); } + msi_translate = direct_pci_msitranslate; + while (opt) { + if (get_next_keyval(&opt, &key, &val)) { + PT_LOG("Error: unrecognized PCI assignment option \"%s\"\n", opt); + break; + } + + if (strcmp(key, "msitranslate") == 0) + { + if (strcmp(val, "0") == 0 || strcmp(val, "no") == 0) + { + PT_LOG("Disable MSI translation via per device option\n"); + msi_translate = 0; + } + else if (strcmp(val, "1") == 0 || strcmp(val, "yes") == 0) + { + PT_LOG("Enable MSI translation via per device option\n"); + msi_translate = 1; + } + else + PT_LOG("Error: unrecognized value for msitranslate=\n"); + } + else + PT_LOG("Error: unrecognized PCI assignment option \"%s=%s\"\n", key, val); + + } + + /* Register device */ assigned_device = (struct pt_dev *) pci_register_device(e_bus, e_dev_name, sizeof(struct pt_dev), e_devfn, @@ -2929,6 +3061,7 @@ struct pt_dev * register_real_device(PCIBus *e_bus, dpci_infos.php_devs[PCI_TO_PHP_SLOT(free_pci_slot)].pt_dev = assigned_device; assigned_device->pci_dev = pci_dev; + assigned_device->msi_trans_cap = msi_translate; /* Assign device */ machine_bdf.reg = 0; @@ -2960,6 +3093,28 @@ struct pt_dev * register_real_device(PCIBus *e_bus, /* fix virtual interrupt pin to INTA# */ e_intx = 0; + while (assigned_device->msi_trans_en) + { + if (pt_msi_setup(assigned_device)) + { + PT_LOG("Error: MSI-INTx translation MSI setup failed, fallback\n"); + assigned_device->msi_trans_en = 0; + break; + } + + rc = xc_domain_bind_pt_irq(xc_handle, domid, assigned_device->msi->pirq, + PT_IRQ_TYPE_MSI_TRANSLATE, 0, + e_device, e_intx, 0); + if ( rc < 0) + { + PT_LOG("Error: MSI-INTx translation bind failed, fallback\n"); + assigned_device->msi_trans_en = 0; + break; + } + msi_set_enable(assigned_device, 1); + break; + } + if ( PT_MACHINE_IRQ_AUTO == machine_irq ) { int pirq = pci_dev->irq; @@ -2973,9 +3125,15 @@ struct pt_dev * register_real_device(PCIBus *e_bus, PT_LOG("Error: Mapping irq failed, rc = %d\n", rc); } else + { machine_irq = pirq; + assigned_device->machine_irq = pirq; + } } + if (assigned_device->msi_trans_en) + goto out; + /* bind machine_irq to device */ if ( 0 != machine_irq ) { @@ -2995,8 +3153,9 @@ struct pt_dev * register_real_device(PCIBus *e_bus, } out: - PT_LOG("Real physical device %02x:%02x.%x registered successfuly!\n", - r_bus, r_dev, r_func); + PT_LOG("Real physical device %02x:%02x.%x registered successfuly!\n" + "IRQ type = %s\n", r_bus, r_dev, r_func, + assigned_device->msi_trans_en? "MSI-INTx":"INTx"); return assigned_device; } @@ -3029,9 +3188,9 @@ int unregister_real_device(int php_slot) e_device = (assigned_device->dev.devfn >> 3) & 0x1f; /* fix virtual interrupt pin to INTA# */ e_intx = 0; - machine_irq = pci_dev->irq; + machine_irq = assigned_device->machine_irq; - if ( machine_irq != 0 ) { + if ( assigned_device->msi_trans_en == 0 && machine_irq ) { rc = xc_domain_unbind_pt_irq(xc_handle, domid, machine_irq, PT_IRQ_TYPE_PCI, 0, e_device, e_intx, 0); if ( rc < 0 ) @@ -3040,6 +3199,16 @@ int unregister_real_device(int php_slot) PT_LOG("Error: Unbinding of interrupt failed! rc=%d\n", rc); } } + else if (assigned_device->msi_trans_en) + { + rc = xc_domain_unbind_pt_irq(xc_handle, domid, assigned_device->msi->pirq, + PT_IRQ_TYPE_MSI_TRANSLATE, 0, + e_device, e_intx, 0); + if (rc < 0) + PT_LOG("Error: Unbinding pt irq for MSI-INTx failed! rc=%d\n", rc); + } + + /* TODO: unmap passthrough MSI and MSI-X irqs */ /* delete all emulated config registers */ pt_config_delete(assigned_device); @@ -3075,7 +3244,10 @@ int power_on_php_slot(int php_slot) php_dev->r_dev, php_dev->r_func, PT_MACHINE_IRQ_AUTO, - dpci_infos.pci_access); + dpci_infos.pci_access, + php_dev->opt); + + php_dev->opt = NULL; php_dev->pt_dev = pt_dev; @@ -3097,6 +3269,7 @@ int pt_init(PCIBus *e_bus, const char *direct_pci) char slot_str[8]; char *direct_pci_head = NULL; char *direct_pci_p = NULL; + char *opt; /* Initialize libpci */ pci_access = pci_alloc(); @@ -3125,11 +3298,11 @@ int pt_init(PCIBus *e_bus, const char *direct_pci) vslots = qemu_mallocz ( strlen(direct_pci) / 3 ); /* Assign given devices to guest */ - while ( next_bdf(&direct_pci_p, &seg, &b, &d, &f) ) + while ( next_bdf(&direct_pci_p, &seg, &b, &d, &f, &opt) ) { /* Register real device with the emulated bus */ pt_dev = register_real_device(e_bus, "DIRECT PCI", PT_VIRT_DEVFN_AUTO, - b, d, f, PT_MACHINE_IRQ_AUTO, pci_access); + b, d, f, PT_MACHINE_IRQ_AUTO, pci_access, opt); if ( pt_dev == NULL ) { PT_LOG("Error: Registration failed (%02x:%02x.%x)\n", b, d, f); diff --git a/hw/pass-through.h b/hw/pass-through.h index 8aa664b..a7d2727 100644 --- a/hw/pass-through.h +++ b/hw/pass-through.h @@ -121,6 +121,7 @@ struct pt_region { struct pt_msi_info { uint32_t flags; + uint32_t ctrl_offset; /* saved control offset */ int pirq; /* guest pirq corresponding */ uint32_t addr_lo; /* guest message address */ uint32_t addr_hi; /* guest message upper address */ @@ -158,6 +159,10 @@ struct pt_dev { /* emul reg group list */ struct pt_msi_info *msi; /* MSI virtualization */ struct pt_msix_info *msix; /* MSI-X virtualization */ + int machine_irq; /* saved pirq */ + /* Physical MSI to guest INTx translation when possible */ + int msi_trans_cap; + int msi_trans_en; }; /* Used for formatting PCI BDF into cf8 format */ diff --git a/hw/pci.h b/hw/pci.h index 4adc4d7..a527a39 100644 --- a/hw/pci.h +++ b/hw/pci.h @@ -64,6 +64,7 @@ struct PCIDevice { }; extern char direct_pci_str[]; +extern int direct_pci_msitranslate; PCIDevice *pci_register_device(PCIBus *bus, const char *name, int instance_size, int devfn, diff --git a/xenstore.c b/xenstore.c index 86e8b63..ff3d023 100644 --- a/xenstore.c +++ b/xenstore.c @@ -290,8 +290,10 @@ const char *xenstore_get_guest_uuid(void) { #endif } -#define DIRECT_PCI_STR_LEN 160 +#define DIRECT_PCI_STR_LEN 512 +#define PT_PCI_MSITRANSLATE_DEFAULT 1 char direct_pci_str[DIRECT_PCI_STR_LEN]; +int direct_pci_msitranslate; void xenstore_parse_domain_config(int hvm_domid) { char **e_danger = NULL; @@ -556,20 +558,50 @@ void xenstore_parse_domain_config(int hvm_domid) free(dev); dev = xs_read(xsh, XBT_NULL, buf, &len); - if ( strlen(dev) + strlen(direct_pci_str) > DIRECT_PCI_STR_LEN ) { + if ( strlen(dev) + strlen(direct_pci_str) > DIRECT_PCI_STR_LEN - 1) { fprintf(stderr, "qemu: too many pci pass-through devices\n"); memset(direct_pci_str, 0, DIRECT_PCI_STR_LEN); goto out; } + /* append to direct_pci_str */ + if ( !dev ) + continue; + + strcat(direct_pci_str, dev); + + if (pasprintf(&buf, "/local/domain/0/backend/pci/%u/%u/opts-%d", + hvm_domid, pci_devid, i) != -1) { + free(dev); + dev = xs_read(xsh, XBT_NULL, buf, &len); + } if ( dev ) { + if ( strlen(dev) + strlen(direct_pci_str) > DIRECT_PCI_STR_LEN - 2) { + fprintf(stderr, "qemu: too many pci pass-through devices\n"); + memset(direct_pci_str, 0, DIRECT_PCI_STR_LEN); + goto out; + } + strcat(direct_pci_str, ","); strcat(direct_pci_str, dev); - strcat(direct_pci_str, "-"); } + + strcat(direct_pci_str, "-"); } } + /* get the pci pass-through parameter */ + if (pasprintf(&buf, "/local/domain/0/backend/pci/%u/%u/msitranslate", + hvm_domid, pci_devid) == -1) + goto out; + + free(params); + params = xs_read(xsh, XBT_NULL, buf, &len); + if (params) + direct_pci_msitranslate = atoi(params); + else + direct_pci_msitranslate = PT_PCI_MSITRANSLATE_DEFAULT; + out: free(danger_type); free(params); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 3/6] pci: add pci option support for XML-RPC server
pci: add pci option support for XML-RPC server Allow the per-device options for passthrough pci devices for exmaple, in domain config file: pci = [''01:00.0,opt1=val1,opt2=val2'', ''01:00.1'' ] or in the PCI hotplug case: xm pci-attach -o opt1=val1 --options=opt2=val2 <domid> 01:00.0 6 This patch is for xml-rpc server Signed-off-by: Qing He <qing.he@intel.com> --- diff -r d11b806ed94f -r 02ebed7b45d0 tools/python/xen/xend/XendDomainInfo.py --- a/tools/python/xen/xend/XendDomainInfo.py Sun Jan 04 17:41:53 2009 +0800 +++ b/tools/python/xen/xend/XendDomainInfo.py Thu Jan 08 01:57:58 2009 +0800 @@ -645,10 +645,17 @@ " already been assigned to other domain, or maybe" " it doesn''t exist." % (bus, dev, func)) - bdf_str = "%s:%s:%s.%s@%s" % (new_dev[''domain''], + opts = '''' + if ''opts'' in new_dev and len(new_dev[''opts'']) > 0: + config_opts = new_dev[''opts''] + config_opts = map(lambda (x, y): x+''=''+y, config_opts) + opts = '','' + reduce(lambda x, y: x+'',''+y, config_opts) + + bdf_str = "%s:%s:%s.%s%s@%s" % (new_dev[''domain''], new_dev[''bus''], new_dev[''slot''], new_dev[''func''], + opts, new_dev[''vslt'']) self.image.signalDeviceModel(''pci-ins'', ''pci-inserted'', bdf_str) @@ -2154,7 +2161,11 @@ xc.domain_max_vcpus(self.domid, int(self.info[''VCPUs_max''])) # Test whether the devices can be assigned with VT-d - pci_str = str(self.info["platform"].get("pci")) + pci = self.info["platform"].get("pci") + pci_str = '''' + if pci and len(pci) > 0: + pci = map(lambda x: x[0:4], pci) # strip options + pci_str = str(pci) if hvm and pci_str: bdf = xc.test_assign_device(self.domid, pci_str) if bdf != 0: diff -r d11b806ed94f -r 02ebed7b45d0 tools/python/xen/xend/server/pciif.py --- a/tools/python/xen/xend/server/pciif.py Sun Jan 04 17:41:53 2009 +0800 +++ b/tools/python/xen/xend/server/pciif.py Thu Jan 08 01:57:58 2009 +0800 @@ -75,6 +75,12 @@ slot = parse_hex(pci_config.get(''slot'', 0)) func = parse_hex(pci_config.get(''func'', 0)) + opts = pci_config.get(''opts'', '''') + if len(opts) > 0: + opts = map(lambda (x, y): x+''=''+y, opts) + opts = reduce(lambda x, y: x+'',''+y, opts) + back[''opts-%i'' % pcidevid] = opts + vslt = pci_config.get(''vslt'') if vslt is not None: vslots = vslots + vslt + ";" @@ -108,6 +114,9 @@ dev = back[''dev-%i'' % i] state = states[i] uuid = back[''uuid-%i'' %i] + opts = '''' + if ''opts-%i'' % i in back: + opts = back[''opts-%i'' % i] except: raise XendError(''Error reading config'') @@ -129,6 +138,8 @@ self.writeBackend(devid, ''state-%i'' % (num_olddevs + i), str(xenbusState[''Initialising''])) self.writeBackend(devid, ''uuid-%i'' % (num_olddevs + i), uuid) + if len(opts) > 0: + self.writeBackend(devid, ''opts-%i'' % (num_olddevs + i), opts) self.writeBackend(devid, ''num_devs'', str(num_olddevs + i + 1)) # Update vslots @@ -540,6 +551,9 @@ self.removeBackend(devid, ''vdev-%i'' % i) self.removeBackend(devid, ''state-%i'' % i) self.removeBackend(devid, ''uuid-%i'' % i) + tmpopts = self.readBackend(devid, ''opts-%i'' % i) + if tmpopts is not None: + self.removeBackend(devid, ''opts-%i'' % i) else: if new_num_devs != i: tmpdev = self.readBackend(devid, ''dev-%i'' % i) @@ -556,6 +570,9 @@ tmpuuid = self.readBackend(devid, ''uuid-%i'' % i) self.writeBackend(devid, ''uuid-%i'' % new_num_devs, tmpuuid) self.removeBackend(devid, ''uuid-%i'' % i) + tmpopts = self.readBackend(devid, ''opts-%i'' % i) + if tmpopts is not None: + self.removeBackend(devid, ''opts-%i'' % i) new_num_devs = new_num_devs + 1 self.writeBackend(devid, ''num_devs'', str(new_num_devs)) diff -r d11b806ed94f -r 02ebed7b45d0 tools/python/xen/xm/create.py --- a/tools/python/xen/xm/create.py Sun Jan 04 17:41:53 2009 +0800 +++ b/tools/python/xen/xm/create.py Thu Jan 08 01:57:58 2009 +0800 @@ -667,9 +667,20 @@ """Create the config for pci devices. """ config_pci = [] - for (domain, bus, slot, func) in vals.pci: - config_pci.append([''dev'', [''domain'', domain], [''bus'', bus], \ - [''slot'', slot], [''func'', func]]) + for (domain, bus, slot, func, opts) in vals.pci: + config_pci_opts = [] + d = comma_sep_kv_to_dict(opts) + + def f(k): + config_pci_opts.append([k, d[k]]) + + config_pci_bdf = [''dev'', [''domain'', domain], [''bus'', bus], \ + [''slot'', slot], [''func'', func]] + map(f, d.keys()) + if len(config_pci_opts)>0: + config_pci_bdf.append([''opts'', config_pci_opts]) + + config_pci.append(config_pci_bdf) if len(config_pci)>0: config_pci.insert(0, ''pci'') @@ -991,14 +1002,18 @@ pci_match = re.match(r"((?P<domain>[0-9a-fA-F]{1,4})[:,])?" + \ r"(?P<bus>[0-9a-fA-F]{1,2})[:,]" + \ r"(?P<slot>[0-9a-fA-F]{1,2})[.,]" + \ - r"(?P<func>[0-7])$", pci_dev_str) + r"(?P<func>[0-7])" + \ + r"(,(?P<opts>.*))?$", pci_dev_str) if pci_match!=None: - pci_dev_info = pci_match.groupdict(''0'') + pci_dev_info = pci_match.groupdict('''') + if pci_dev_info[''domain'']=='''': + pci_dev_info[''domain'']=''0'' try: pci.append( (''0x''+pci_dev_info[''domain''], \ ''0x''+pci_dev_info[''bus''], \ ''0x''+pci_dev_info[''slot''], \ - ''0x''+pci_dev_info[''func''])) + ''0x''+pci_dev_info[''func''], \ + pci_dev_info[''opts''])) except IndexError: err(''Error in PCI slot syntax "%s"''%(pci_dev_str)) vals.pci = pci diff -r d11b806ed94f -r 02ebed7b45d0 tools/python/xen/xm/main.py --- a/tools/python/xen/xm/main.py Sun Jan 04 17:41:53 2009 +0800 +++ b/tools/python/xen/xm/main.py Thu Jan 08 01:57:58 2009 +0800 @@ -187,7 +187,7 @@ ''vnet-delete'' : (''<VnetId>'', ''Delete a Vnet.''), ''vnet-list'' : (''[-l|--long]'', ''List Vnets.''), ''vtpm-list'' : (''<Domain> [--long]'', ''List virtual TPM devices.''), - ''pci-attach'' : (''<Domain> <domain:bus:slot.func> [virtual slot]'', + ''pci-attach'' : (''[-o|--options=<opt>] <Domain> <domain:bus:slot.func> [virtual slot]'', ''Insert a new pass-through pci device.''), ''pci-detach'' : (''<Domain> <domain:bus:slot.func>'', ''Remove a domain\''s pass-through pci device.''), @@ -2428,7 +2428,7 @@ vif.append(vif_param) server.xend.domain.device_create(dom, vif) -def parse_pci_configuration(args, state): +def parse_pci_configuration(args, state, opts = ''''): dom = args[0] pci_dev_str = args[1] if len(args) == 3: @@ -2443,12 +2443,17 @@ if pci_match == None: raise OptionError("Invalid argument: %s %s" % (pci_dev_str,vslt)) pci_dev_info = pci_match.groupdict(''0'') + try: - pci.append([''dev'', [''domain'', ''0x''+ pci_dev_info[''domain'']], \ + pci_bdf =[''dev'', [''domain'', ''0x''+ pci_dev_info[''domain'']], \ [''bus'', ''0x''+ pci_dev_info[''bus'']], [''slot'', ''0x''+ pci_dev_info[''slot'']], [''func'', ''0x''+ pci_dev_info[''func'']], - [''vslt'', ''0x%x'' % int(vslt, 16)]]) + [''vslt'', ''0x%x'' % int(vslt, 16)]] + if len(opts) > 0: + pci_bdf.append([''opts'', opts]) + pci.append(pci_bdf) + except: raise OptionError("Invalid argument: %s %s" % (pci_dev_str,vslt)) pci.append([''state'', state]) @@ -2456,8 +2461,22 @@ return (dom, pci) def xm_pci_attach(args): - arg_check(args, ''pci-attach'', 2, 3) - (dom, pci) = parse_pci_configuration(args, ''Initialising'') + config_pci_opts = [] + (options, params) = getopt.gnu_getopt(args, ''o:'', [''options='']) + for (k, v) in options: + if k in (''-o'', ''--options''): + if len(v.split(''='')) != 2: + err("Invalid pci attach option: %s" % v) + usage(''pci-attach'') + config_pci_opts.append(v.split(''='')) + + n = len([i for i in params if i != ''--'']) + if n < 2 or n > 3: + err("Invalid argument for ''xm pci-attach''") + usage(''pci-attach'') + + (dom, pci) = parse_pci_configuration(params, ''Initialising'', + config_pci_opts) if serverType == SERVER_XEN_API: _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 4/6] pci: add pci option support for XenAPI server
pci: add pci option support for XenAPI sever Allow the per-device options for passthrough pci devices This patch is for XenAPI server A new key-value pair element of ''pci'' named ''pci_opt'' is added to xml config file Signed-off-by: Qing He <qing.he@intel.com> --- diff -r 02ebed7b45d0 -r 82b05ac013fc tools/python/xen/xend/XendConfig.py --- a/tools/python/xen/xend/XendConfig.py Thu Jan 08 01:57:58 2009 +0800 +++ b/tools/python/xen/xend/XendConfig.py Thu Jan 08 02:09:19 2009 +0800 @@ -1247,6 +1247,11 @@ ''PPCI'': ppci_uuid, ''hotplug_slot'': pci_dev.get(''vslot'', 0) } + + dpci_opts = pci_dev.get(''opts'') + if dpci_opts and len(dpci_opts) > 0: + dpci_record[''options''] = dpci_opts + XendDPCI(dpci_uuid, dpci_record) target[''devices''][pci_devs_uuid] = (dev_type, @@ -1762,6 +1767,11 @@ ''PPCI'': ppci_uuid, ''hotplug_slot'': pci_dev.get(''vslot'', 0) } + + dpci_opts = pci_dev.get(''opts'') + if dpci_opts and len(dpci_opts) > 0: + dpci_record[''options''] = dpci_opts + XendDPCI(dpci_uuid, dpci_record) self[''devices''][dev_uuid] = (dev_type, diff -r 02ebed7b45d0 -r 82b05ac013fc tools/python/xen/xend/XendDPCI.py --- a/tools/python/xen/xend/XendDPCI.py Thu Jan 08 01:57:58 2009 +0800 +++ b/tools/python/xen/xend/XendDPCI.py Thu Jan 08 02:09:19 2009 +0800 @@ -41,7 +41,8 @@ ''virtual_name'', ''VM'', ''PPCI'', - ''hotplug_slot''] + ''hotplug_slot'', + ''options''] return XendBase.getAttrRO() + attrRO def getAttrRW(self): @@ -119,6 +120,8 @@ self.VM = record[''VM''] self.PPCI = record[''PPCI''] self.hotplug_slot = record[''hotplug_slot''] + if ''options'' in record.keys(): + self.options = record[''options''] def destroy(self): xendom = XendDomain.instance() @@ -152,3 +155,5 @@ def get_hotplug_slot(self): return self.hotplug_slot + def get_options(self): + return self.options diff -r 02ebed7b45d0 -r 82b05ac013fc tools/python/xen/xend/XendDomainInfo.py --- a/tools/python/xen/xend/XendDomainInfo.py Thu Jan 08 01:57:58 2009 +0800 +++ b/tools/python/xen/xend/XendDomainInfo.py Thu Jan 08 02:09:19 2009 +0800 @@ -3458,6 +3458,11 @@ dpci_uuid = uuid.createString() + dpci_opts = [] + opts_dict = xenapi_pci.get(''options'') + for k in opts_dict.keys(): + dpci_opts.append([k, opts_dict[k]]) + # Convert xenapi to sxp ppci = XendAPIStore.get(xenapi_pci.get(''PPCI''), ''PPCI'') @@ -3469,6 +3474,7 @@ [''slot'', ''0x%02x'' % ppci.get_slot()], [''func'', ''0x%1x'' % ppci.get_func()], [''vslt'', ''0x%02x'' % xenapi_pci.get(''hotplug_slot'')], + [''opts'', dpci_opts], [''uuid'', dpci_uuid] ], [''state'', ''Initialising''] diff -r 02ebed7b45d0 -r 82b05ac013fc tools/python/xen/xm/main.py --- a/tools/python/xen/xm/main.py Thu Jan 08 01:57:58 2009 +0800 +++ b/tools/python/xen/xm/main.py Thu Jan 08 02:09:19 2009 +0800 @@ -2499,7 +2499,8 @@ dpci_record = { "VM": get_single_vm(dom), "PPCI": target_ref, - "hotplug_slot": vslt + "hotplug_slot": vslt, + "options": dict([k, v] for k, v in config_pci_opts) } server.xenapi.DPCI.create(dpci_record) diff -r 02ebed7b45d0 -r 82b05ac013fc tools/python/xen/xm/xenapi_create.py --- a/tools/python/xen/xm/xenapi_create.py Thu Jan 08 01:57:58 2009 +0800 +++ b/tools/python/xen/xm/xenapi_create.py Thu Jan 08 02:09:19 2009 +0800 @@ -533,7 +533,10 @@ "PPCI": target_ref, "hotplug_slot": - int(pci.attributes["func"].value, 16) + int(pci.attributes["func"].value, 16), + "options": + get_child_nodes_as_dict(pci, + "pci_opt", "key", "value") } return server.xenapi.DPCI.create(dpci_record) @@ -931,6 +934,12 @@ = get_child_by_name(dev_sxp, "func", "0") pci.attributes["vslt"] \ = get_child_by_name(dev_sxp, "vslt", "0") + for opt in get_child_by_name(dev_sxp, "opts", ""): + if len(opt) > 0: + pci_opt = document.createElement("pci_opt") + pci_opt.attributes["key"] = opt[0] + pci_opt.attributes["value"] = opt[1] + pci.appendChild(pci_opt) pcis.append(pci) diff -r b0a4862c9018 -r 6e84f349112d tools/python/xen/xm/create.dtd --- a/tools/python/xen/xm/create.dtd Thu Jan 08 02:13:31 2009 +0800 +++ b/tools/python/xen/xm/create.dtd Thu Jan 08 02:15:01 2009 +0800 @@ -82,11 +82,12 @@ <!ELEMENT vtpm (name*)> <!ATTLIST vtpm backend CDATA #REQUIRED> -<!ELEMENT pci EMPTY> +<!ELEMENT pci (pci_opt*)> <!ATTLIST pci domain CDATA #REQUIRED bus CDATA #REQUIRED slot CDATA #REQUIRED func CDATA #REQUIRED + opts_str CDATA #IMPLIED vslt CDATA #IMPLIED> <!ELEMENT vscsi EMPTY> @@ -138,6 +139,10 @@ <!ATTLIST vcpu_param key CDATA #REQUIRED value CDATA #REQUIRED> +<!ELEMENT pci_opt EMPTY> +<!ATTLIST pci_opt key CDATA #REQUIRED + value CDATA #REQUIRED> + <!ELEMENT other_config EMPTY> <!ATTLIST other_config key CDATA #REQUIRED value CDATA #REQUIRED> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 5/6] pci: add config options for MSI-INTx translation in HVM
pci: add config options for MSI-INTx translation in HVM Add a config file option ''pci_msitranslate'' to enable MSI-INTx translation in HVM, and also a per-device option ''msitranslate'' to allow device based overriden Signed-off-by: Qing He <qing.he@intel.com> --- diff -r 82b05ac013fc -r b0a4862c9018 tools/python/xen/xend/XendConfig.py --- a/tools/python/xen/xend/XendConfig.py Thu Jan 08 02:09:19 2009 +0800 +++ b/tools/python/xen/xend/XendConfig.py Thu Jan 08 02:13:31 2009 +0800 @@ -166,6 +166,7 @@ ''guest_os_type'': str, ''hap'': int, ''xen_extended_power_mgmt'': int, + ''pci_msitranslate'': int, } # Xen API console ''other_config'' keys. diff -r 82b05ac013fc -r b0a4862c9018 tools/python/xen/xend/server/pciif.py --- a/tools/python/xen/xend/server/pciif.py Thu Jan 08 02:09:19 2009 +0800 +++ b/tools/python/xen/xend/server/pciif.py Thu Jan 08 02:13:31 2009 +0800 @@ -95,6 +95,9 @@ back[''num_devs'']=str(pcidevid) back[''uuid''] = config.get(''uuid'','''') + if ''pci_msitranslate'' in self.vm.info[''platform'']: + back[''msitranslate'']=str(self.vm.info[''platform''][''pci_msitranslate'']) + return (0, back, {}) diff -r 82b05ac013fc -r b0a4862c9018 tools/python/xen/xm/create.py --- a/tools/python/xen/xm/create.py Thu Jan 08 02:09:19 2009 +0800 +++ b/tools/python/xen/xm/create.py Thu Jan 08 02:13:31 2009 +0800 @@ -318,11 +318,14 @@ backend driver domain to use for the disk. The option may be repeated to add more than one disk.""") -gopts.var(''pci'', val=''BUS:DEV.FUNC'', +gopts.var(''pci'', val=''BUS:DEV.FUNC[,msitranslate=0|1]'', fn=append_value, default=[], use="""Add a PCI device to a domain, using given params (in hex). - For example ''pci=c0:02.1''. - The option may be repeated to add more than one pci device.""") + For example ''pci=c0:02.1''. + If msitranslate is set, MSI-INTx translation is enabled if possible. + Guest that doesn''t support MSI will get IO-APIC type IRQs + translated from physical MSI, HVM only. Default is 1. + The option may be repeated to add more than one pci device.""") gopts.var(''vscsi'', val=''PDEV,VDEV[,DOM]'', fn=append_value, default=[], @@ -588,6 +591,11 @@ fn=set_bool, default=None, use="""Do not inject spurious page faults into this guest""") +gopts.var(''pci_msitranslate'', val=''TRANSLATE'', + fn=set_int, default=1, + use="""Global PCI MSI-INTx translation flag (0=disable; + 1=enable.""") + def err(msg): """Print an error to stderr and exit. """ @@ -672,6 +680,9 @@ d = comma_sep_kv_to_dict(opts) def f(k): + if k not in [''msitranslate'']: + err(''Invalid pci option: '' + k) + config_pci_opts.append([k, d[k]]) config_pci_bdf = [''dev'', [''domain'', domain], [''bus'', bus], \ @@ -878,7 +889,7 @@ ''sdl'', ''display'', ''xauthority'', ''rtc_timeoffset'', ''monitor'', ''acpi'', ''apic'', ''usb'', ''usbdevice'', ''keymap'', ''pci'', ''hpet'', ''guest_os_type'', ''hap'', ''opengl'', ''cpuid'', ''cpuid_check'', - ''viridian'', ''xen_extended_power_mgmt'' ] + ''viridian'', ''xen_extended_power_mgmt'', ''pci_msitranslate'' ] for a in args: if a in vals.__dict__ and vals.__dict__[a] is not None: diff -r 82b05ac013fc -r b0a4862c9018 tools/python/xen/xm/xenapi_create.py --- a/tools/python/xen/xm/xenapi_create.py Thu Jan 08 02:09:19 2009 +0800 +++ b/tools/python/xen/xm/xenapi_create.py Thu Jan 08 02:13:31 2009 +0800 @@ -1041,6 +1041,7 @@ ''vhpt'', ''guest_os_type'', ''hap'', + ''pci_msitranslate'', ] platform_configs = [] _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 09:06 UTC
[Xen-devel] [PATCH 6/6] passthough: MSI-INTx translation documentation
passthough: MSI-INTx translation documentation Signed-off-by: Qing He <qing.he@intel.com> --- diff -r c4ef5731a992 -r 80fcb0b96801 docs/misc/vtd.txt --- a/docs/misc/vtd.txt Wed Jan 07 07:39:55 2009 +0800 +++ b/docs/misc/vtd.txt Wed Jan 07 20:35:11 2009 +0800 @@ -38,6 +38,30 @@ Add "msi=1" option in kernel line of host grub. +MSI-INTx translation for passthrough devices in HVM +--------------------------------------------------- + +If the assigned device uses a physical IRQ that is shared by more than +one device among multiple domains, there may be significant impact on +device performance. Unfortunately, this is quite a common case if the +IO-APIC (INTx) IRQ is used. MSI can avoid this issue, but was only +available if the guest enables it. + +With MSI-INTx translation turned on, Xen enables device MSI if it''s +available, regardless of whether the guest uses INTx or MSI. If the +guest uses INTx IRQ, Xen will inject a translated INTx IRQ to guest''s +virtual ioapic whenever an MSI message is received. This reduces the +interrupt sharing of the system. If the guest OS enables MSI or MSI-X, +the translation is automatically turned off. + +To enable or disable MSI-INTx translation globally, add "pci_msitranslate" +in the config file: + pci_msitranslate = 1 (default is 1) + +To override for a specific device: + pci = [ ''01:00.0,msitranslate=0'', ''03:00.0'' ] + + Caveat on Conventional PCI Device Passthrough --------------------------------------------- @@ -79,6 +103,11 @@ 3. Attach a PCI device to the guest by the physical BDF and desired virtual slot(optional). Following command would insert the physical device into guest''s virtual slot 7 [root@vt-vtd ~]# xm pci-attach HVMDomainVtd 0:2:0.0 7 + + To specify options for the device, use -o or --options=. Following command would disable MSI-INTx translation for the device + + [root@vt-vtd ~]# xm pci-attach -o msitranslate=0 0:2:0.0 7 + VTd hotplug usage model: ------------------------ diff -r c4ef5731a992 -r 80fcb0b96801 tools/examples/xmexample.hvm --- a/tools/examples/xmexample.hvm Wed Jan 07 07:39:55 2009 +0800 +++ b/tools/examples/xmexample.hvm Wed Jan 07 20:35:11 2009 +0800 @@ -288,6 +288,39 @@ # ''x'' -> we don''t care (do not check) # ''s'' -> the bit must be the same as on the host that started this VM +#----------------------------------------------------------------------------- +# Configure passthrough PCI{,-X,e} devices: +# +# pci=[ ''[SSSS:]BB:DD.F[,option1[,option2[...]]]'', ... ] +# +# [SSSS]:BB:DD.F "bus segment:bus:device.function"(1) of the device to +# be assigned, bus segment is optional. All fields are +# in hexadecimal and no field should be longer than that +# as shown in the pattern. Successful assignment may need +# certain hardware support and additional configurations +# (e.g. VT-d, see docs/misc/vtd.txt for more details). +# +# (1) bus segment is sometimes also referred to as the PCI "domain", +# not to be confused with Xen domain. +# +# +# optionN per-device options in "key=val" format. Current +# available options are: +# - msitranslate=0|1 +# per-device overriden of pci_msitranslate, see below +# +#pci=[ ''07:00.0'', ''07:00.1'' ] + +# MSI-INTx translation for MSI capable devices: +# +# If it''s set, Xen will enable MSI for the device that supports it even +# if the guest don''t use MSI. In the case, an IO-APIC type interrupt will +# be injected to the guest every time a corresponding MSI message is +# received. +# If the guest enables MSI or MSI-X, the translation is automatically +# turned off. +# +#pci_msitranslate=1 #----------------------------------------------------------------------------- # Configure PVSCSI devices: _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-08 10:44 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
I have one question. MSI interrupt is edge-triggered, and INTx interrupt is level-triggered. Guest OS handles translated interrupt as level-triggered, though physical interrupt is edge-triggered. When two interrupts are raised during short period, Guest OS might lose 2nd interrupt, I think. What do you think? Thanks, -- Shohei Fujiwara On Thu, 8 Jan 2009 17:06:43 +0800 Qing He <qing.he@intel.com> wrote:> This patchset enables MSI-INTx interrupt translation for HVM > domains. The intention of the patch is to use MSI as the physical > interrupt mechanism for passthrough device as much as possible, > thus reducing the pirq sharing among domains. > > When MSI is globally enabled, if the device has the MSI capability > but doesn''t used by the guest, hypervisor will try to user MSI as > the underlying pirq and inject translated INTx irq to guest > vioapic. When guest itself enabled MSI or MSI-X, the translation > is automatically turned off. > > Add a config file option to disable/enable this feature. Also, in > order to allow the user to override the option per device, a > per-device option mechanism is implemented and an MSI-INTx option > is added > > The patch set contains 6 patches and is organized as below: > [PATCH 1/6] hypervisor patch > *[PATCH 2/6] qemu patch > [PATCH 3/6] add pci options in domain config file > [PATCH 4/6] XenAPI version of the above patch > [PATCH 5/6] add global and per-device option for MSI-INTx > [PATCH 6/6] documentation and example > > * against ioemu-remote > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-08 14:52 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Thu, 2009-01-08 at 18:44 +0800, Shohei Fujiwara wrote:> I have one question. > > MSI interrupt is edge-triggered, and INTx interrupt is level-triggered. > Guest OS handles translated interrupt as level-triggered, though physical > interrupt is edge-triggered. When two interrupts are raised during short > period, Guest OS might lose 2nd interrupt, I think.This problem is handled by a different EOI timing. As soon as the hypervisor receives an MSI, it issues the EOI ASAP, and the duration of the injected level-triggered IRQ and guest EOI are all handled by the virtual APICs. If a 2nd interrupt comes up at this time, the hypervisor can receive the EOI and this results in a pending IRQ in virtual APICs instead of lost. Generally, it''s easy to "translate" an edged interrupt to a level one, but not the other way. Thanks, Qing> > What do you think? > > Thanks, > -- > Shohei Fujiwara >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-09 04:26 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
Thank you for your reply. On Thu, 8 Jan 2009 22:52:00 +0800 Qing He <qing.he@intel.com> wrote:> On Thu, 2009-01-08 at 18:44 +0800, Shohei Fujiwara wrote: > > I have one question. > > > > MSI interrupt is edge-triggered, and INTx interrupt is level-triggered. > > Guest OS handles translated interrupt as level-triggered, though physical > > interrupt is edge-triggered. When two interrupts are raised during short > > period, Guest OS might lose 2nd interrupt, I think. > > This problem is handled by a different EOI timing. As soon as the > hypervisor receives an MSI, it issues the EOI ASAP, and the duration of > the injected level-triggered IRQ and guest EOI are all handled by the > virtual APICs. If a 2nd interrupt comes up at this time, the hypervisor > can receive the EOI and this results in a pending IRQ in virtual APICs > instead of lost.There is the assumption that Guest OS handles all causes which happen before Guest OS receives the interrupt. But is the assumption right for all OS? In the case of level-triggerd interrupt, I/O device asserts interrupt line, when the cause of interrupt happens. OS handles the cause, I/O device de-asserts interrupt, and OS sends EOI to APICs. When I/O APIC receives EOI, I/O APIC re-transmits interrupt to Local APIC if some interrupt line is asserted. Some OS might rely on this re-transmittion by I/O APIC. This is the part of "handle_IRQ_event" in linux 2.6.18: do { ret = action->handler(irq, action->dev_id, regs); if (ret == IRQ_HANDLED) status |= action->flags; retval |= ret; action = action->next; } while (action); The assumption is right for linux. But other OS might have the code like the following: do { ret = action->handler(irq, action->dev_id, regs); if (ret == IRQ_HANDLED) { status |= action->flags; retval |= ret; break; ^^^^^^ } action = action->next; } while (action); This code will work on real machine, because I/O APIC re-transmits interrupt, if the cause to be handled remains. If some OS has the code like the above, the assumption isn''t right. Actually, my concern is whether the assumption is right for Windows, or not. Do you know about this, or does your patch works well with Windows guest? Thanks, -- Shohei Fujiwara> Generally, it''s easy to "translate" an edged interrupt to a level one, > but not the other way. > > Thanks, > Qing > > > > What do you think? > > > > Thanks, > > -- > > Shohei Fujiwara > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-09 06:57 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Fri, 2009-01-09 at 12:26 +0800, Shohei Fujiwara wrote:> There is the assumption that Guest OS handles all causes which happen > before Guest OS receives the interrupt. But is the assumption right for > all OS? > > In the case of level-triggerd interrupt, I/O device asserts interrupt > line, when the cause of interrupt happens. OS handles the cause, > I/O device de-asserts interrupt, and OS sends EOI to APICs. > > When I/O APIC receives EOI, I/O APIC re-transmits interrupt to Local APIC > if some interrupt line is asserted. > > Some OS might rely on this re-transmittion by I/O APIC. >> > But other OS might have the code like the following: > > do { > ret = action->handler(irq, action->dev_id, regs); > if (ret == IRQ_HANDLED) { > status |= action->flags; > retval |= ret; > break; > ^^^^^^ > } > action = action->next; > } while (action); >Hmm, I think now I understand what you mean. If the guest irq is shared by a normal IRQ and a MSI-INTx translated IRQ, two sources may assert the pin while they both get pending. When this irq is injected, if the guest only handles one irq source each time, and issues EOI right after it clears the normal IRQ, the MSI is lost. Is it what you mean? There is logic to avoid this from happening, see hvm_irq->gsi_assert_count[gsi]. Basically, it''s used to count how many sources have asserted a shared pin. And at the time of EOI, after the decrement of the counter, if it''s still not 0, the LAPIC is re-asserted. This may result in some spurious interrupts to guest, but that''s better than losing interrupts. The sharing of guest irq is generally not a good idea, in fact, this is not even well supported in current Xen code. You may have seen something like "girq[ggsi].mirq = mirq". That way, we are already stuck. Currently, if the number of assigned devices is <= 8, there should be no sharing, otherwise, very weird things may happen... Uncommon devices of OS does have the possibility to fail MSI-INTx, for example, if the device doesn''t behave the same way using INTx and MSI, or the guest OS doesn''t always clear guest source before issuing EOI. That''s why I add the per-device disable function: if a device or OS doesn''t work properly, just turn it off. Fortunately, this is extremely rare. Btw, AFAIK, Windows handles all irq sources in one ISR, similar to Linux. Thanks, Qing> > This code will work on real machine, because I/O APIC re-transmits > interrupt, if the cause to be handled remains. If some OS has the > code like the above, the assumption isn''t right. > > Actually, my concern is whether the assumption is right for Windows, > or not. Do you know about this, or does your patch works well with > Windows guest? > > Thanks, > -- > Shohei Fujiwara > > > Generally, it''s easy to "translate" an edged interrupt to a level one, > > but not the other way. > > > > Thanks, > > Qing > > > > > > What do you think? > > > > > > Thanks, > > > -- > > > Shohei Fujiwara > > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-13 09:05 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
What I understand are the followings. - When guest gsi isn''t shared, MSI-INTx interrupt translation works fine. - When guest gsi is shared between passthrough device and emulated device, MSI-INTx interrupt translation work, though guest OS receives spurious interrupts. - Sharing guest gsi among passthrough devices isn''t supported. - There are some unsuitable devices for MSI-INTx interrupt translation. May I ask you any additional questions? First, why can''t we assign more than 8 devices? In the view of guest OS, assigned device is always single function device. This means assigned devices use only INTA. And Interrupt routing in hypervisor is shown as follows.>From xen/include/asm-x86/hvm/irq.c:#define hvm_pci_intx_gsi(dev, intx) \ (((((dev)<<2) + ((dev)>>3) + (intx)) & 31) + 16) I think sharing guest gsi among passthrough devices doesn''t occur if assigned device is <= 32. Second, it is nice to create blacklist of unsuitable devices for MSI-INTx interrupt, isn''t it? The reason is the problem seems device-specific problem. Thanks, -- Shohei Fujiwara On Fri, 9 Jan 2009 14:57:16 +0800 Qing He <qing.he@intel.com> wrote:> On Fri, 2009-01-09 at 12:26 +0800, Shohei Fujiwara wrote: > > There is the assumption that Guest OS handles all causes which happen > > before Guest OS receives the interrupt. But is the assumption right for > > all OS? > > > > In the case of level-triggerd interrupt, I/O device asserts interrupt > > line, when the cause of interrupt happens. OS handles the cause, > > I/O device de-asserts interrupt, and OS sends EOI to APICs. > > > > When I/O APIC receives EOI, I/O APIC re-transmits interrupt to Local APIC > > if some interrupt line is asserted. > > > > Some OS might rely on this re-transmittion by I/O APIC. > > > > > > > But other OS might have the code like the following: > > > > do { > > ret = action->handler(irq, action->dev_id, regs); > > if (ret == IRQ_HANDLED) { > > status |= action->flags; > > retval |= ret; > > break; > > ^^^^^^ > > } > > action = action->next; > > } while (action); > > > Hmm, I think now I understand what you mean. If the guest irq is shared by > a normal IRQ and a MSI-INTx translated IRQ, two sources may assert the > pin while they both get pending. When this irq is injected, if the guest > only handles one irq source each time, and issues EOI right after it > clears the normal IRQ, the MSI is lost. Is it what you mean? > > There is logic to avoid this from happening, see > hvm_irq->gsi_assert_count[gsi]. Basically, it''s used to count how many > sources have asserted a shared pin. And at the time of EOI, after the > decrement of the counter, if it''s still not 0, the LAPIC is re-asserted. > This may result in some spurious interrupts to guest, but that''s better > than losing interrupts. > > The sharing of guest irq is generally not a good idea, in fact, this is > not even well supported in current Xen code. You may have seen something > like "girq[ggsi].mirq = mirq". That way, we are already stuck. > Currently, if the number of assigned devices is <= 8, there should be no > sharing, otherwise, very weird things may happen... > > Uncommon devices of OS does have the possibility to fail MSI-INTx, for > example, if the device doesn''t behave the same way using INTx and MSI, > or the guest OS doesn''t always clear guest source before issuing EOI. > That''s why I add the per-device disable function: if a device or OS > doesn''t work properly, just turn it off. Fortunately, this is extremely > rare. > > Btw, AFAIK, Windows handles all irq sources in one ISR, similar to > Linux. > > Thanks, > Qing > > > > This code will work on real machine, because I/O APIC re-transmits > > interrupt, if the cause to be handled remains. If some OS has the > > code like the above, the assumption isn''t right. > > > > Actually, my concern is whether the assumption is right for Windows, > > or not. Do you know about this, or does your patch works well with > > Windows guest? > > > > Thanks, > > -- > > Shohei Fujiwara > > > > > Generally, it''s easy to "translate" an edged interrupt to a level one, > > > but not the other way. > > > > > > Thanks, > > > Qing > > > > > > > > What do you think? > > > > > > > > Thanks, > > > > -- > > > > Shohei Fujiwara > > > > > > > > > > _______________________________________________ > > > Xen-devel mailing list > > > Xen-devel@lists.xensource.com > > > http://lists.xensource.com/xen-devel > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-13 09:28 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Tue, 2009-01-13 at 17:05 +0800, Shohei Fujiwara wrote:> What I understand are the followings. > > - When guest gsi isn''t shared, MSI-INTx interrupt translation works fine. > - When guest gsi is shared between passthrough device and emulated > device, MSI-INTx interrupt translation work, though guest OS > receives spurious interrupts. > - Sharing guest gsi among passthrough devices isn''t supported.Yes.> - There are some unsuitable devices for MSI-INTx interrupt translation.I still don''t know an example of such unsuitable devices. And to be exact, it should be device/guest OS driver combination. The improper function may be caused by the guest driver (e.g. not cleaning irq source before issuing EOI)> > > May I ask you any additional questions? > > First, why can''t we assign more than 8 devices? > In the view of guest OS, assigned device is always single function > device. This means assigned devices use only INTA. And Interrupt > routing in hypervisor is shown as follows.Well, I just checked the code, seems I''m using the stale knowledge. The assigned devices didn''t fix to INTA a few months ago, at that time, it was 8, since (0:3.1) collided with (0:11.0). It''s now 32, and the situation is definitely much better.> > From xen/include/asm-x86/hvm/irq.c: > #define hvm_pci_intx_gsi(dev, intx) \ > (((((dev)<<2) + ((dev)>>3) + (intx)) & 31) + 16) > > I think sharing guest gsi among passthrough devices doesn''t > occur if assigned device is <= 32. > > > Second, it is nice to create blacklist of unsuitable devices for MSI-INTx > interrupt, isn''t it? The reason is the problem seems device-specific problem.This is reasonable, however, I don''t know a real unsuitable at this time. Isn''t it better to add the blacklist when there is the real need? What do you think? Thanks, Qing> > Thanks, > -- > Shohei Fujiwara > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-14 06:39 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Tue, 13 Jan 2009 17:28:13 +0800 Qing He <qing.he@intel.com> wrote:> On Tue, 2009-01-13 at 17:05 +0800, Shohei Fujiwara wrote: > > What I understand are the followings. > > Second, it is nice to create blacklist of unsuitable devices for MSI-INTx > > interrupt, isn''t it? The reason is the problem seems device-specific problem. > > This is reasonable, however, I don''t know a real unsuitable at this > time. Isn''t it better to add the blacklist when there is the real need? > > What do you think?I think adding blacklist now is better, because it is easy to specify an unsuitable device when it is found. But I don''t know what other developers think about this. BTW, I found a issue when I read your code. MSI-INTx translation is enabled when guest domain is started. When guest OS enables MSI/MSI-X, MSI-INTx translation is disabled. But when guest OS disables MSI/MSI-X, MSI-INTx translation isn''t re-enabled. This means normal INTx interrupt is used. I think the results should be the same at all times, so it is good to re-enable MSI-INTx translation. Thanks, -- Shohei Fujiwara _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-14 07:38 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Wed, 2009-01-14 at 14:39 +0800, Shohei Fujiwara wrote:> I think adding blacklist now is better, because it is easy to specify > an unsuitable device when it is found. But I don''t know what other > developers think about this.I''m OK either way> > > BTW, I found a issue when I read your code. > > MSI-INTx translation is enabled when guest domain is started. > When guest OS enables MSI/MSI-X, MSI-INTx translation is disabled. > But when guest OS disables MSI/MSI-X, MSI-INTx translation isn''t re-enabled. > This means normal INTx interrupt is used. > > I think the results should be the same at all times, > so it is good to re-enable MSI-INTx translation.There is a complication here. Some guests, notably certain versions of Linux (2.6.25 for instance) uses MSI enable bit as mask, like:>> static void msi_set_mask_bit(unsigned int irq, int flag) >> { >> ... >> switch (entry->msi_attrib.type) { >> case PCI_CAP_ID_MSI: >> if (entry->msi_attrib.maskbit) { >> ... >> } else { >> msi_set_enable(entry->dev, !flag); >> } >> break; >> ...(It is said not work for some devices since pending msi may be lost in this way, so it got reverted in later versions) If the guest extensively uses this technique, re-enabling MSI-INTx during mask would be a pain, since it needs several hypercalls and thus hurts performance. Furthermore, why a guest driver would want to disable MSI if it can benefit from it (Maybe because MSI doesn''t work:-)? The presumption behind not re-enabling the translation is that it''s seldom that guest would want to disable-after-enable the MSI. And it''s more efficient for guests like linux-2.6.25.> > Thanks, > -- > Shohei Fujiwara >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-14 08:26 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Wed, 14 Jan 2009 15:38:46 +0800 Qing He <qing.he@intel.com> wrote:> On Wed, 2009-01-14 at 14:39 +0800, Shohei Fujiwara wrote: > > I think adding blacklist now is better, because it is easy to specify > > an unsuitable device when it is found. But I don''t know what other > > developers think about this. > > I''m OK either way > > > > > > > BTW, I found a issue when I read your code. > > > > MSI-INTx translation is enabled when guest domain is started. > > When guest OS enables MSI/MSI-X, MSI-INTx translation is disabled. > > But when guest OS disables MSI/MSI-X, MSI-INTx translation isn''t re-enabled. > > This means normal INTx interrupt is used. > > > > I think the results should be the same at all times, > > so it is good to re-enable MSI-INTx translation. > > There is a complication here. Some guests, notably certain versions of > Linux (2.6.25 for instance) uses MSI enable bit as mask, like: > > >> static void msi_set_mask_bit(unsigned int irq, int flag) > >> { > >> ... > >> switch (entry->msi_attrib.type) { > >> case PCI_CAP_ID_MSI: > >> if (entry->msi_attrib.maskbit) { > >> ... > >> } else { > >> msi_set_enable(entry->dev, !flag); > >> } > >> break; > >> ... > > (It is said not work for some devices since pending msi may be lost in > this way, so it got reverted in later versions) > > If the guest extensively uses this technique, re-enabling MSI-INTx during > mask would be a pain, since it needs several hypercalls and thus hurts > performance. > > Furthermore, why a guest driver would want to disable MSI if it can benefit > from it (Maybe because MSI doesn''t work:-)? The presumption behind not > re-enabling the translation is that it''s seldom that guest would want to > disable-after-enable the MSI. And it''s more efficient for > guests like linux-2.6.25.Basically, I would not like to assume any behavior of guest OS or user. User might unload device driver at runtime, and load it again disabling MSI. So we still need to use I/O APIC. we are not freed from sharing machine GSI problem. Thanks, -- Shohei Fujiwara _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-14 09:17 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Wed, 2009-01-14 at 16:26 +0800, Shohei Fujiwara wrote:> > If the guest extensively uses this technique, re-enabling MSI-INTx during > > mask would be a pain, since it needs several hypercalls and thus hurts > > performance. > > > > Furthermore, why a guest driver would want to disable MSI if it can benefit > > from it (Maybe because MSI doesn''t work:-)? The presumption behind not > > re-enabling the translation is that it''s seldom that guest would want to > > disable-after-enable the MSI. And it''s more efficient for > > guests like linux-2.6.25. > > Basically, I would not like to assume any behavior of guest OS or > user. User might unload device driver at runtime, and load it again > disabling MSI. So we still need to use I/O APIC. we are not freed > from sharing machine GSI problem.For the user unloading and reloading issue, if the driver uses MSI for both times, it works fine. If the driver uses INTx for both times without ever enabling MSI, it also works fine with MSI-INTx translation on at all time. Only if the user starts with driver in MSI mode, unloads it, and then reloads it in legacy irq mode, he may go back to no translation, but why does he want to do this? And even in this case, it still works correctly, just without MSI-INTx translation. This patch is never intended to be a perfect for all solution, as mentioned early, if the device or guest driver does something nasty, it would fail, that''s why the option is added in the first place. Since INTx is considered leagacy regarding to MSI, the bottom line is that the patch should not hurt guest passthrough MSI performance, which is not the case if MSI-INTx is turned on when MSI is simply masked (implemented as disabling by some guests). Using INTx after enabling and disabling MSI may not be as fast as it can be (it''s definitely no slower). But considering the rarity of this situation, I would think the trade-off is just sound. What do you think? Thanks, Qing> > Thanks, > -- > Shohei Fujiwara >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-15 02:35 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
I think MSI-INTx translation can be enhanced. In case of using MSI enable bit as mask, guest OS need to set interrupt disable bit to prevent device from asserting INTx. Actually, linux 2.6.25 set interrupt disable bit in drivers/pci/msi.c:msi_capability_init. If interrupt disable bit is 1, we can assume guest OS might use MSI enable bit as mask. If interrupt disable bit is 0 and MSI enable bit is 0, we can assume guest OS uses INTx. I think we can archive both of good performance and not-using INTx, if we control MSI-INTx translation as follows. MSI/MSI-X | Interrupt | MSI-INTx | Note enable bit | disable bit | translation | -----------+-------------+-------------+------------------------------ 0 | 0 | enable | Re-enabling MSI-INTx | | | translation is needed. -----------+-------------+-------------+------------------------------ 0 | 1 | disable | Guest OS might use MSI enable | | | bit as mask. -----------+-------------+-------------+------------------------------ 1 | 0 | disable | Guest OS use MSI/MSI-X. -----------+-------------+-------------+------------------------------ 1 | 1 | disable | Guest OS use MSI/MSI-X. It is great if we can guarantee INTx interrupt isn''t used. We don''t need to consider sharing machine gsi. We don''t need to consider hot-pluging I/O APIC. In long term, It will be possible even to to remove I/O APIC from real machine. What do you think? Thanks -- Shohei Fujiwara On Wed, 14 Jan 2009 17:17:52 +0800 Qing He <qing.he@intel.com> wrote:> On Wed, 2009-01-14 at 16:26 +0800, Shohei Fujiwara wrote: > > > If the guest extensively uses this technique, re-enabling MSI-INTx during > > > mask would be a pain, since it needs several hypercalls and thus hurts > > > performance. > > > > > > Furthermore, why a guest driver would want to disable MSI if it can benefit > > > from it (Maybe because MSI doesn''t work:-)? The presumption behind not > > > re-enabling the translation is that it''s seldom that guest would want to > > > disable-after-enable the MSI. And it''s more efficient for > > > guests like linux-2.6.25. > > > > Basically, I would not like to assume any behavior of guest OS or > > user. User might unload device driver at runtime, and load it again > > disabling MSI. So we still need to use I/O APIC. we are not freed > > from sharing machine GSI problem. > > For the user unloading and reloading issue, if the driver uses MSI for > both times, it works fine. If the driver uses INTx for both times > without ever enabling MSI, it also works fine with MSI-INTx translation > on at all time. Only if the user starts with driver in MSI mode, unloads > it, and then reloads it in legacy irq mode, he may go back to no > translation, but why does he want to do this? And even in this case, > it still works correctly, just without MSI-INTx translation. > > This patch is never intended to be a perfect for all solution, as > mentioned early, if the device or guest driver does something nasty, it > would fail, that''s why the option is added in the first place. > > Since INTx is considered leagacy regarding to MSI, the bottom line is > that the patch should not hurt guest passthrough MSI performance, which > is not the case if MSI-INTx is turned on when MSI is simply masked > (implemented as disabling by some guests). > > Using INTx after enabling and disabling MSI may not be as fast as it > can be (it''s definitely no slower). But considering the rarity of this > situation, I would think the trade-off is just sound. > > What do you think? > > > Thanks, > Qing_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Jan-15 06:25 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Thu, 2009-01-15 at 10:35 +0800, Shohei Fujiwara wrote:> I think MSI-INTx translation can be enhanced. > > In case of using MSI enable bit as mask, guest OS need to set interrupt > disable bit to prevent device from asserting INTx. Actually, linux > 2.6.25 set interrupt disable bit in > drivers/pci/msi.c:msi_capability_init. If interrupt disable bit is 1, > we can assume guest OS might use MSI enable bit as mask. If interrupt > disable bit is 0 and MSI enable bit is 0, we can assume guest OS uses > INTx.This should be better, but cannot garauntee it. Since INTx disable bit was added in PCI 2.3. Early guests may exist not using it. Also, some devices have problems, that setting INTx disable bit also completely disables MSI from signalling, so OS may not touch INTx disable bit of these devices. Theoretically, MSI-enable-as-masking is not correct and should not be used in the first place. If that were true it would be much easier for us. However, the existing (and widely used) guest complicates this matter...> > I think we can archive both of good performance and not-using INTx, if > we control MSI-INTx translation as follows. > > MSI/MSI-X | Interrupt | MSI-INTx | Note > enable bit | disable bit | translation | > -----------+-------------+-------------+------------------------------ > 0 | 0 | enable | Re-enabling MSI-INTx > | | | translation is needed. > -----------+-------------+-------------+------------------------------ > 0 | 1 | disable | Guest OS might use MSI enable > | | | bit as mask. > -----------+-------------+-------------+------------------------------ > 1 | 0 | disable | Guest OS use MSI/MSI-X. > -----------+-------------+-------------+------------------------------ > 1 | 1 | disable | Guest OS use MSI/MSI-X. > > It is great if we can guarantee INTx interrupt isn''t used. > We don''t need to consider sharing machine gsi. We don''t need to consider > hot-pluging I/O APIC. In long term, It will be possible even to to remove > I/O APIC from real machine.I am not against it, it does look more symmetric, though in the cost of a little overhead (4 hypercalls for a mask operation) in the worst (but rare) case. I''ll ack the patch if you add it now, or you can wait for me to add it at some later time. Thanks, Qing> > What do you think? > > Thanks > -- > Shohei Fujiwara >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Jan-16 04:34 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
Thank you for long discussion. On Thu, 15 Jan 2009 14:25:14 +0800 Qing He <qing.he@intel.com> wrote:> On Thu, 2009-01-15 at 10:35 +0800, Shohei Fujiwara wrote: > > I think MSI-INTx translation can be enhanced. > > > > In case of using MSI enable bit as mask, guest OS need to set interrupt > > disable bit to prevent device from asserting INTx. Actually, linux > > 2.6.25 set interrupt disable bit in > > drivers/pci/msi.c:msi_capability_init. If interrupt disable bit is 1, > > we can assume guest OS might use MSI enable bit as mask. If interrupt > > disable bit is 0 and MSI enable bit is 0, we can assume guest OS uses > > INTx. > > This should be better, but cannot garauntee it. Since INTx disable bit > was added in PCI 2.3. Early guests may exist not using it. Also, some > devices have problems, that setting INTx disable bit also completely > disables MSI from signalling, so OS may not touch INTx disable bit of > these devices.I think we can guarantee it, even if OS doesn''t touch INTx disable bit. When OS set 1 to MSI/MSI-X enable bit, qemu will disable MSI-INTx translation. When OS set 0 to MSI/MSI-X enable bit, qemu will re-enable MSI-INTx translation.> Theoretically, MSI-enable-as-masking is not correct and should not be > used in the first place. If that were true it would be much easier for > us. However, the existing (and widely used) guest complicates this matter...I agree with you that we need make MSI-enable-as-masking work on HVM domain.> > I think we can archive both of good performance and not-using INTx, if > > we control MSI-INTx translation as follows. > > > > MSI/MSI-X | Interrupt | MSI-INTx | Note > > enable bit | disable bit | translation | > > -----------+-------------+-------------+------------------------------ > > 0 | 0 | enable | Re-enabling MSI-INTx > > | | | translation is needed. > > -----------+-------------+-------------+------------------------------ > > 0 | 1 | disable | Guest OS might use MSI enable > > | | | bit as mask. > > -----------+-------------+-------------+------------------------------ > > 1 | 0 | disable | Guest OS use MSI/MSI-X. > > -----------+-------------+-------------+------------------------------ > > 1 | 1 | disable | Guest OS use MSI/MSI-X. > > > > It is great if we can guarantee INTx interrupt isn''t used. > > We don''t need to consider sharing machine gsi. We don''t need to consider > > hot-pluging I/O APIC. In long term, It will be possible even to to remove > > I/O APIC from real machine. > > I am not against it, it does look more symmetric, though in the cost of > a little overhead (4 hypercalls for a mask operation) in the worst > (but rare) case. I''ll ack the patch if you add it now, or you can wait > for me to add it at some later time.Currently I''m trying to enabling PCI pass-through with stub domain. I''m happy if you can add it. Thanks, -- Shohei Fujiwara _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Feb-27 02:41 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Thu, 8 Jan 2009 22:52:00 +0800 Qing He <qing.he@intel.com> wrote:> On Thu, 2009-01-08 at 18:44 +0800, Shohei Fujiwara wrote: > > I have one question. > > > > MSI interrupt is edge-triggered, and INTx interrupt is level-triggered. > > Guest OS handles translated interrupt as level-triggered, though physical > > interrupt is edge-triggered. When two interrupts are raised during short > > period, Guest OS might lose 2nd interrupt, I think. > > This problem is handled by a different EOI timing. As soon as the > hypervisor receives an MSI, it issues the EOI ASAP, and the duration of > the injected level-triggered IRQ and guest EOI are all handled by the > virtual APICs. If a 2nd interrupt comes up at this time, the hypervisor > can receive the EOI and this results in a pending IRQ in virtual APICs > instead of lost.I found hypervisor doesn''t issue EOI before injecting the interrupt to guest domain, if MSI isn''t maskable. The 2nd interrupt may be lost. What do you think about this? xen/arch/x86/irq.c: asmlinkage void do_IRQ(struct cpu_user_regs *regs) { unsigned int vector = regs->entry_vector; irq_desc_t *desc = &irq_desc[vector]; struct irqaction *action; perfc_incr(irqs); spin_lock(&desc->lock); desc->handler->ack(vector); xen/arch/x86/io_apic.c: static void ack_msi_vector(unsigned int vector) { if ( msi_maskable_irq(irq_desc[vector].msi_desc) ) ack_APIC_irq(); /* ACKTYPE_NONE */ } Thanks, -- Shohei Fujiwara _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-01 14:55 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On 27/02/2009 02:41, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote:> I found hypervisor doesn''t issue EOI before injecting the > interrupt to guest domain, if MSI isn''t maskable. > The 2nd interrupt may be lost. What do you think about this?The second interrupt should not be lost! It should be marked in the local APIC''s IRR bitmap and should hence cause an interrupt to fire as soon as the in-service interrupt is EOIed (and hence cleared from the local APIC''s ISR bitmap). Are you thus *sure* this is really the root-cause of your problem? Could it be that the MSI *is* maskable, and that masking/unmasking loses interrupt messages, for example? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Mar-02 07:19 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
Thank you for your reply. On Sun, 01 Mar 2009 14:55:21 +0000 Keir Fraser <keir.fraser@eu.citrix.com> wrote:> On 27/02/2009 02:41, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote: > > > I found hypervisor doesn''t issue EOI before injecting the > > interrupt to guest domain, if MSI isn''t maskable. > > The 2nd interrupt may be lost. What do you think about this? > > The second interrupt should not be lost! It should be marked in the local > APIC''s IRR bitmap and should hence cause an interrupt to fire as soon as the > in-service interrupt is EOIed (and hence cleared from the local APIC''s ISR > bitmap).I understand that we don''t need to issue EOI before injecting the interrupt to guest domain.> Are you thus *sure* this is really the root-cause of your problem?When I was reading the code on Friday, I found the hypervisor doesn''t issue EOI before injecting the interrupt to guest domain. This seemed to be different from Qing''s explanation, and then I made the question. So I have no real problem. Today, I found the hypervisor issues EOI ASAP. So Qing''s explanation reflected the implementation of the hypervisor. xen/drivers/passthrough/io.c: void hvm_dirq_assist(struct vcpu *v) { ... if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_TRANSLATE ) { /* for translated MSI to INTx interrupt, eoi as early as possible */ __msi_pirq_eoi(d, irq); } Thanks, -- Shohei Fujiwara _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-02 08:47 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On 02/03/2009 07:19, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote:> Today, I found the hypervisor issues EOI ASAP. So Qing''s explanation > reflected the implementation of the hypervisor. > > xen/drivers/passthrough/io.c: > void hvm_dirq_assist(struct vcpu *v) > { > ... > if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_TRANSLATE ) > { > /* for translated MSI to INTx interrupt, eoi as early as > possible */ > __msi_pirq_eoi(d, irq); > }Ah, well the passthrough-specific logic is not something I''m so familiar with! That above code indeed doesn''t look correct. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Qing He
2009-Mar-02 09:24 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Mon, 2009-03-02 at 16:47 +0800, Keir Fraser wrote:> On 02/03/2009 07:19, "Shohei Fujiwara" <fujiwara-sxa@necst.nec.co.jp> wrote: > > > Today, I found the hypervisor issues EOI ASAP. So Qing''s explanation > > reflected the implementation of the hypervisor. > > > > xen/drivers/passthrough/io.c: > > void hvm_dirq_assist(struct vcpu *v) > > { > > ... > > if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_TRANSLATE ) > > { > > /* for translated MSI to INTx interrupt, eoi as early as > > possible */ > > __msi_pirq_eoi(d, irq); > > } > > Ah, well the passthrough-specific logic is not something I''m so familiar > with! That above code indeed doesn''t look correct. :-)I recently learned that some device may keep issuing MSI until it is explicitly cleared by software. In that case it just behaves like level-triggered interrupt, and doing something like the code above will cause interrupt storm. This is reflected in cs. 19065. Do you mean this issue? Thanks, Qing> > -- Keir > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2009-Mar-02 09:40 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On 02/03/2009 09:24, "Qing He" <qing.he@intel.com> wrote:>> Ah, well the passthrough-specific logic is not something I''m so familiar >> with! That above code indeed doesn''t look correct. :-) > > I recently learned that some device may keep issuing MSI until it is > explicitly > cleared by software. In that case it just behaves like level-triggered > interrupt, and doing something like the code above will cause interrupt storm. > This is reflected in cs. 19065. > > Do you mean this issue?That would be the expected result of early EOI, yes. But Shohei was walking about losing interrupts. I don''t know how we would do that. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Shohei Fujiwara
2009-Mar-02 10:27 UTC
Re: [Xen-devel] [PATCH 0/6] MSI-INTx interrupt translation for HVM
On Mon, 02 Mar 2009 09:40:40 +0000 Keir Fraser <keir.fraser@eu.citrix.com> wrote:> On 02/03/2009 09:24, "Qing He" <qing.he@intel.com> wrote: > > >> Ah, well the passthrough-specific logic is not something I''m so familiar > >> with! That above code indeed doesn''t look correct. :-) > > > > I recently learned that some device may keep issuing MSI until it is > > explicitly > > cleared by software. In that case it just behaves like level-triggered > > interrupt, and doing something like the code above will cause interrupt storm. > > This is reflected in cs. 19065. > > > > Do you mean this issue? > > That would be the expected result of early EOI, yes. But Shohei was walking > about losing interrupts. I don''t know how we would do that.Currently I don''t think early EOI is needed, because IRR in local APIC keeps interrupts. I don''t think losing interrupts will occur. Thanks, -- Shohei Fujiwara _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel