Laszlo Ersek
2011-Jun-01 10:05 UTC
[Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
Hi, this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the situation captured in RHBZ#688673, in a nutshell: - let''s say we have an Intel 82576 card (igb), - the card exports virtual functions (igbvf), - one virtual function is passed through to a PV guest, - the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, sets up three MSI-X vectors for the virtual function PCI device, - when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because nobody ever reaches dom0''s pci_disable_msix() / msi_remove_pci_irq_vectors() during shutdown, - therefore configuration of the VF during the next boot of such a guest doesn''t succeed (dom0 thinks, based on its stale list, that MSI-X vectors are already allocated/mapped for the device) dom0 (msix_capability_init(), output beautified a bit): msix entry 0 for dev 01:10:0 are not freed before acquire again. msix entry 1 for dev 01:10:0 are not freed before acquire again. msix entry 2 for dev 01:10:0 are not freed before acquire again. guest: Determining IP information for eth1... Failed to obtain physical IRQ 255 Failed to obtain physical IRQ 254 Failed to obtain physical IRQ 253 - if the device or the full igbvf module is removed before shutdown in the guest (in case the boot was successful to begin with!), then pci_disable_msix() is called and everything works fine; the next boot / ifup succeeds. I backported c/s 1070 to the RHEL-5 kernel, but it was not enough -- this is not an abrupt termination (destroy), but a contolled shutdown. Here''s what I suspect happens (in RHEL-5) when device_shutdown() walks the list of devices in the PV domU and calls the appropriate shutdown hook for igbvf: domU: igbvf_shutdown() igbvf_suspend() pci_disable_device() pcibios_disable_device() pcibios_disable_resources() pci_write_config_word( dev, PCI_COMMAND, orig & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) ) ... pcifront_bus_write() do_pci_op(XEN_PCI_OP_conf_write) dom0: pciback_do_op() pciback_config_write() conf_space_write() command_write() [via PCI_COMMAND funcptr] pci_disable_device() disable_msi_mode() dev->msix_enabled = 0; The final assignment above precludes c/s 1070 from doing the job. Of course igbvf_shutdown() could invoke igbvf_remove() instead (or another lower-level function that ends up in dom0''s pci_disable_msix()), but it should not depend on the guest playing nice. Below''s my proposed patch for RHEL-5, ported to upstream. ----v---- introduce msi_prune_pci_irq_vectors() extend pciback_reset_device() with the following functionality (msi_prune_pci_irq_vectors()): - remove all (MSI-X vector, entry number) pairs that might still belong, in dom0''s mind, to the PCI device being reset, - unmap some space "used for saving/restoring MSI-X tables" that might otherwise go leaked The function, modeled after msi_remove_pci_irq_vectors(), doesn''t touch the hypervisor, nor the PCI device; it only trims the list used by dom0 for filtering. Justification being: when this is called, either the owner domain is gone already (or very close to being gone), or the device was already correctly detached. ----^---- Now I think one comment in the patch below does not hold for upstream: "msi_dev_head is only maintained in dom0". According to c/s 680, this doesn''t seem to be the case for upstream. Is the case described above possible at all in upstream? Do you think the fix I propose is correct? It seemed to do the job with RHEL-5 in my testing. dom0: PCI: 0000:01:10.0: cleaning up MSI-X entry 0 PCI: 0000:01:10.0: cleaning up MSI-X entry 1 PCI: 0000:01:10.0: cleaning up MSI-X entry 2 PCI: 0000:01:10.0: cleaning up mask_base Just in case: drivers/pci/msi-xen.c | 46 ++++++++++++++++++++++++++++++++++++++ drivers/xen/pciback/pciback_ops.c | 5 ++++ include/linux/pci.h | 1 3 files changed, 52 insertions(+) Signed-off-by: Laszlo Ersek<lersek@redhat.com> Thank you very much! lacos diff -r 876a5aaac026 drivers/pci/msi-xen.c --- a/drivers/pci/msi-xen.c Thu May 26 12:33:41 2011 +0100 +++ b/drivers/pci/msi-xen.c Tue May 31 19:17:26 2011 +0200 @@ -894,6 +894,51 @@ void msi_remove_pci_irq_vectors(struct p dev->irq = msi_dev_entry->default_irq; } +void msi_prune_pci_irq_vectors(struct pci_dev *dev) +{ + unsigned long flags; + struct msi_dev_list *msi_dev_scan, *msi_dev_entry = NULL; + struct msi_pirq_entry *pirq_entry, *tmp; + + if (!pci_msi_enable || !dev) + return; + + /* msi_dev_head is only maintained in dom0 */ + BUG_ON(!is_initial_xendomain()); + + /* search for the set of MSI-X vectors, don''t extend list */ + spin_lock_irqsave(&msi_dev_lock, flags); + list_for_each_entry(msi_dev_scan, &msi_dev_head, list) + if (msi_dev_scan->dev == dev) { + msi_dev_entry = msi_dev_scan; + break; + } + spin_unlock_irqrestore(&msi_dev_lock, flags); + if (msi_dev_entry == NULL) + return; + + /* Empty the set. No PHYSDEVOP_unmap_pirq hypercalls: the domU is + * already gone, or the device is already unplugged. + */ + spin_lock_irqsave(&msi_dev_entry->pirq_list_lock, flags); + if (!list_empty(&msi_dev_entry->pirq_list_head)) + list_for_each_entry_safe(pirq_entry, tmp, + &msi_dev_entry->pirq_list_head, list) { + printk(KERN_INFO "PCI: %s: cleaning up MSI-X entry " + "%d\n", pci_name(dev), pirq_entry->entry_nr); + list_del(&pirq_entry->list); + kfree(pirq_entry); + } + spin_unlock_irqrestore(&msi_dev_entry->pirq_list_lock, flags); + + if (msi_dev_entry->mask_base != NULL) { + printk(KERN_INFO "PCI: %s: cleaning up mask_base\n", + pci_name(dev)); + iounmap(msi_dev_entry->mask_base); + msi_dev_entry->mask_base = NULL; + } +} + void pci_no_msi(void) { pci_msi_enable = 0; @@ -906,5 +951,6 @@ EXPORT_SYMBOL(pci_disable_msix); #ifdef CONFIG_XEN EXPORT_SYMBOL(register_msi_get_owner); EXPORT_SYMBOL(unregister_msi_get_owner); +EXPORT_SYMBOL(msi_prune_pci_irq_vectors); #endif diff -r 876a5aaac026 drivers/xen/pciback/pciback_ops.c --- a/drivers/xen/pciback/pciback_ops.c Thu May 26 12:33:41 2011 +0100 +++ b/drivers/xen/pciback/pciback_ops.c Tue May 31 19:17:26 2011 +0200 @@ -29,6 +29,11 @@ void pciback_reset_device(struct pci_dev pci_disable_msix(dev); if (dev->msi_enabled) pci_disable_msi(dev); + + /* After a controlled shutdown or the crash fixup above, prune + * dom0''s MSI-X vector list for the device. + */ + msi_prune_pci_irq_vectors(dev); #endif pci_disable_device(dev); diff -r 876a5aaac026 include/linux/pci.h --- a/include/linux/pci.h Thu May 26 12:33:41 2011 +0100 +++ b/include/linux/pci.h Tue May 31 19:17:26 2011 +0200 @@ -640,6 +640,7 @@ extern void msi_remove_pci_irq_vectors(s #ifdef CONFIG_XEN extern int register_msi_get_owner(int (*func)(struct pci_dev *dev)); extern int unregister_msi_get_owner(int (*func)(struct pci_dev *dev)); +extern void msi_prune_pci_irq_vectors(struct pci_dev *dev); #endif #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Laszlo Ersek
2011-Jun-01 11:02 UTC
[Xen-devel] Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
(Perhaps it''s best to write a separate mail about this.) On 06/01/11 12:05, Laszlo Ersek wrote:> Hi, > > this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the situation captured in RHBZ#688673, in a nutshell: > > - let''s say we have an Intel 82576 card (igb), > - the card exports virtual functions (igbvf), > - one virtual function is passed through to a PV guest, > - the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, sets up three MSI-X vectors for the virtual function PCI device, > - when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because nobody ever reaches dom0''s pci_disable_msix() / msi_remove_pci_irq_vectors() during shutdown, > - therefore configuration of the VF during the next boot of such a guest doesn''t succeed (dom0 thinks, based on its stale list, that MSI-X vectors are already allocated/mapped for the device)I tried to install a Fedora 15 PV guest as well (the host/hv was RHEL-5 kernel-xen, -264 build). Surprisingly, when the igbvf driver called pci_enable_msix() in the PV guest at initialization time, it bugged out: $ addr2line -p -f -i -e /usr/lib/debug/lib/modules/2.6.38.6-26.rc1.fc15.x86_64/vmlinux <<<ffffffff8125658d readl at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/arch/x86/include/asm/io.h:58 (inlined by) msix_program_entries at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:527 (inlined by) msix_capability_init at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:578 (inlined by) pci_enable_msix at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:806 (inlined by) pci_enable_msix at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:773 Thanks, lacos [168711.841672] BUG: unable to handle kernel paging request at ffffc90001a9600c [168711.841984] IP: [<ffffffff8125658d>] pci_enable_msix+0x270/0x354 [168711.842343] PGD 9c838067 PUD 9c839067 PMD 3216067 PTE 0 [168711.842675] Oops: 0000 [#1] SMP [168711.843016] last sysfs file: /sys/devices/virtual/block/dm-0/dev [168711.843486] CPU 0 [168711.843507] Modules linked in: igbvf(+) joydev xen_pcifront xen_netfront ipv6 xen_blkfront [168711.844504] [168711.845030] Pid: 449, comm: modprobe Not tainted 2.6.38.6-26.rc1.fc15.x86_64 #1 [168711.845722] RIP: e030:[<ffffffff8125658d>] [<ffffffff8125658d>] pci_enable_msix+0x270/0x354 [168711.846440] RSP: e02b:ffff880098b4bc58 EFLAGS: 00010286 [168711.847172] RAX: ffffc90001a9600c RBX: ffff880003304000 RCX: 0000000000000006 [168711.847960] RDX: 0000000000000006 RSI: ffff88009c800200 RDI: ffff88009c800200 [168711.848775] RBP: ffff880098b4bcc8 R08: ffff88009c80c000 R09: 0000000000000002 [168711.849634] R10: ffff880098361400 R11: 0000000000000003 R12: ffff880002ceac80 [168711.850531] R13: ffff880096278280 R14: 0000000000000000 R15: ffff8800033048f0 [168711.851452] FS: 00007f2ee5360720(0000) GS:ffff88009fdf5000(0000) knlGS:0000000000000000 [168711.852434] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [168711.853454] CR2: ffffc90001a9600c CR3: 00000000033ce000 CR4: 0000000000002620 [168711.854529] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [168711.855638] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000 [168711.856776] Process modprobe (pid: 449, threadinfo ffff880098b4a000, task ffff880003acae40) [168711.857954] Stack: [168711.859139] ffff880096278280 0000000000000002 0000000000000000 ffff8800033048f0 [168711.860445] 0000007298b4bcc8 ffffc9000000000c ffff880000000008 c002ffff00000000 [168711.861777] ffff880003304000 ffff880098350000 ffff880003304000 ffff880003304090 [168711.863149] Call Trace: [168711.864527] [<ffffffffa0083400>] igbvf_probe+0x308/0x85b [igbvf] [168711.865978] [<ffffffff81006c3f>] ? xen_restore_fl_direct_end+0x0/0x1 [168711.867469] [<ffffffff81080b0f>] ? arch_local_irq_restore+0xb/0xd [168711.868984] [<ffffffff812463ce>] local_pci_probe+0x44/0x75 [168711.870531] [<ffffffff81246e60>] pci_device_probe+0xd0/0xff [168711.872170] [<ffffffff812e6af9>] driver_probe_device+0x124/0x1ff [168711.873804] [<ffffffff812e6c2e>] __driver_attach+0x5a/0x7e [168711.875436] [<ffffffff812e6bd4>] ? __driver_attach+0x0/0x7e [168711.875441] [<ffffffff812e5b9a>] bus_for_each_dev+0x53/0x89 [168711.875446] [<ffffffff812e6706>] driver_attach+0x1e/0x20 [168711.875455] [<ffffffff812e6318>] bus_add_driver+0xe2/0x235 [168711.875462] [<ffffffffa0089000>] ? igbvf_init_module+0x0/0x51 [igbvf] [168711.875468] [<ffffffff812e6e17>] driver_register+0x98/0x105 [168711.875474] [<ffffffffa0089000>] ? igbvf_init_module+0x0/0x51 [igbvf] [168711.875480] [<ffffffff81247723>] __pci_register_driver+0x56/0xc3 [168711.875486] [<ffffffffa0089000>] ? igbvf_init_module+0x0/0x51 [igbvf] [168711.875493] [<ffffffffa008904f>] igbvf_init_module+0x4f/0x51 [igbvf] [168711.875500] [<ffffffff810020a4>] do_one_initcall+0x7f/0x137 [168711.875506] [<ffffffff810852dd>] sys_init_module+0xa6/0x1e4 [168711.875511] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b [168711.875514] Code: 00 0f b7 42 04 c1 e0 04 83 c0 0c 89 45 b8 41 8b 44 24 0c 89 02 41 8b 7c 24 0c 89 4d a0 e8 d0 80 e5 ff 48 63 45 b8 49 03 44 24 20 <8b> 00 be 01 00 00 00 41 89 44 24 08 4c 89 e7 e8 c1 f2 ff ff 4d [168711.875552] RIP [<ffffffff8125658d>] pci_enable_msix+0x270/0x354 [168711.875558] RSP <ffff880098b4bc58> [168711.875560] CR2: ffffc90001a9600c [168711.875564] ---[ end trace c82c0d02bae884a2 ]--- [168711.875570] BUG: sleeping function called from invalid context at kernel/rwsem.c:21 [168711.875573] in_atomic(): 0, irqs_disabled(): 1, pid: 449, name: modprobe [168711.875577] Pid: 449, comm: modprobe Tainted: G D 2.6.38.6-26.rc1.fc15.x86_64 #1 [168711.875580] Call Trace: [168711.875586] [<ffffffff81047d33>] __might_sleep+0xeb/0xf0 [168711.875593] [<ffffffff81474eee>] down_read+0x21/0x38 [168711.875598] [<ffffffff8108be35>] acct_collect+0x4a/0x182 [168711.875603] [<ffffffff81058627>] do_exit+0x216/0x732 [168711.875608] [<ffffffff8147588c>] ? _raw_spin_unlock_irqrestore+0x17/0x19 [168711.875613] [<ffffffff81006c52>] ? check_events+0x12/0x20 [168711.875618] [<ffffffff81476b8e>] oops_end+0xbc/0xc5 [168711.875623] [<ffffffff8146c0b4>] no_context+0x203/0x212 [168711.875628] [<ffffffff81004415>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e [168711.875633] [<ffffffff8146c257>] __bad_area_nosemaphore+0x194/0x1b7 [168711.875637] [<ffffffff8146b8e4>] ? pmd_val+0x10/0x12 [168711.875642] [<ffffffff8146b945>] ? pte_offset_kernel+0x19/0x3f [168711.875647] [<ffffffff8146c28d>] bad_area_nosemaphore+0x13/0x15 [168711.875651] [<ffffffff81478c5d>] do_page_fault+0x1c5/0x37a [168711.875657] [<ffffffff81476095>] page_fault+0x25/0x30 [168711.875662] [<ffffffff8125658d>] ? pci_enable_msix+0x270/0x354 [168711.875667] [<ffffffff81256584>] ? pci_enable_msix+0x267/0x354 [168711.875675] [<ffffffffa0083400>] igbvf_probe+0x308/0x85b [igbvf] [168711.875680] [<ffffffff81006c3f>] ? xen_restore_fl_direct_end+0x0/0x1 [168711.875684] [<ffffffff81080b0f>] ? arch_local_irq_restore+0xb/0xd [168711.875689] [<ffffffff812463ce>] local_pci_probe+0x44/0x75 [168711.875693] [<ffffffff81246e60>] pci_device_probe+0xd0/0xff [168711.875698] [<ffffffff812e6af9>] driver_probe_device+0x124/0x1ff [168711.875703] [<ffffffff812e6c2e>] __driver_attach+0x5a/0x7e [168711.875708] [<ffffffff812e6bd4>] ? __driver_attach+0x0/0x7e [168711.875713] [<ffffffff812e5b9a>] bus_for_each_dev+0x53/0x89 [168711.875718] [<ffffffff812e6706>] driver_attach+0x1e/0x20 [168711.875722] [<ffffffff812e6318>] bus_add_driver+0xe2/0x235 [168711.875729] [<ffffffffa0089000>] ? igbvf_init_module+0x0/0x51 [igbvf] [168711.875734] [<ffffffff812e6e17>] driver_register+0x98/0x105 [168711.875741] [<ffffffffa0089000>] ? igbvf_init_module+0x0/0x51 [igbvf] [168711.875746] [<ffffffff81247723>] __pci_register_driver+0x56/0xc3 [168711.875752] [<ffffffffa0089000>] ? igbvf_init_module+0x0/0x51 [igbvf] [168711.875759] [<ffffffffa008904f>] igbvf_init_module+0x4f/0x51 [igbvf] [168711.875764] [<ffffffff810020a4>] do_one_initcall+0x7f/0x137 [168711.875769] [<ffffffff810852dd>] sys_init_module+0xa6/0x1e4 [168711.875773] [<ffffffff81009bc2>] system_call_fastpath+0x16/0x1b [168711.879089] udevd-work[367]: ''/sbin/modprobe -bv pci:v00008086d000010CAsv000015D9sd000010C9bc02sc00i00'' unexpected exit with status 0x0009 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Jun-01 12:56 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
>>> On 01.06.11 at 12:05, Laszlo Ersek <lersek@redhat.com> wrote: > Hi, > > this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the > situation captured in RHBZ#688673, in a nutshell: > > - let''s say we have an Intel 82576 card (igb), > - the card exports virtual functions (igbvf), > - one virtual function is passed through to a PV guest, > - the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, > sets up three MSI-X vectors for the virtual function PCI device, > - when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because > nobody ever reaches dom0''s pci_disable_msix() / msi_remove_pci_irq_vectors() > during shutdown, > - therefore configuration of the VF during the next boot of such a guest > doesn''t succeed (dom0 thinks, based on its stale list, that MSI-X vectors are > already allocated/mapped for the device) > > dom0 (msix_capability_init(), output beautified a bit): > msix entry 0 for dev 01:10:0 are not freed before acquire again. > msix entry 1 for dev 01:10:0 are not freed before acquire again. > msix entry 2 for dev 01:10:0 are not freed before acquire again. > > guest: > Determining IP information for eth1... > Failed to obtain physical IRQ 255 > Failed to obtain physical IRQ 254 > Failed to obtain physical IRQ 253 > > - if the device or the full igbvf module is removed before shutdown in the > guest (in case the boot was successful to begin with!), then > pci_disable_msix() is called and everything works fine; the next boot / ifup > succeeds. > > I backported c/s 1070 to the RHEL-5 kernel, but it was not enough -- this is > not an abrupt termination (destroy), but a contolled shutdown. Here''s what I > suspect happens (in RHEL-5) when device_shutdown() walks the list of devices > in the PV domU and calls the appropriate shutdown hook for igbvf: > > domU: igbvf_shutdown() > igbvf_suspend() > pci_disable_device() > pcibios_disable_device() > pcibios_disable_resources() > pci_write_config_word( > dev, PCI_COMMAND, > orig & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) > ) > ... > pcifront_bus_write() > do_pci_op(XEN_PCI_OP_conf_write) > dom0: pciback_do_op() > pciback_config_write() > conf_space_write() > command_write() [via PCI_COMMAND funcptr] > pci_disable_device() > disable_msi_mode() > dev->msix_enabled = 0; > > The final assignment above precludes c/s 1070 from doing the job.That seems to be a problem in the PCI subsystem, in that disable_msi_mode() is being used from pci_disable_device() (instead of pci_disable_msi{,x}()), and does not seem to be done in a similarly bad way in newer kernels. So I wonder whether you shouldn''t fix pci_disable_device() instead, or alternatively move the vector de-allocation (and then for MSI and MSI-X) into disable_msi_mode(). While the approach you take covers the guest shutdown/restart case, what if the guest called its pci_disable_device() at runtime, expecting to be able to call pci_enable_{device,msi,msix}() on it again later on? Your newly added function would also be called here, but in this situation you would have to call into the hypervisor (as the domain is still alive), i.e. you could as well call msi_remove_pci_irq_vectors(). Additionally, while covering the MSI-X case, I don''t see how the similar already-mapped case would be cleanly handled for MSI. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Laszlo Ersek
2011-Jun-01 14:12 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On 06/01/11 14:56, Jan Beulich wrote:>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com> wrote:>> domU: igbvf_shutdown() >> igbvf_suspend() >> pci_disable_device() >> pcibios_disable_device() >> pcibios_disable_resources() >> pci_write_config_word( >> dev, PCI_COMMAND, >> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) >> ) >> ... >> pcifront_bus_write() >> do_pci_op(XEN_PCI_OP_conf_write) >> dom0: pciback_do_op() >> pciback_config_write() >> conf_space_write() >> command_write() [via PCI_COMMAND funcptr] >> pci_disable_device() >> disable_msi_mode() >> dev->msix_enabled = 0; >> >> The final assignment above precludes c/s 1070 from doing the job. > > That seems to be a problem in the PCI subsystem, in that > disable_msi_mode() is being used from pci_disable_device() (instead > of pci_disable_msi{,x}()), and does not seem to be done in a similarly > bad way in newer kernels.But does linux-2.6.18-xen avoid the problem? I think its pci_disable_device() still calls disable_msi_mode(), and the latter also only read/writes config words.> So I wonder whether you shouldn''t fix > pci_disable_device() instead, or alternatively move the vector > de-allocation (and then for MSI and MSI-X) into disable_msi_mode().Yes, I did think of that, however IMO that would introduce exactly the problem that you describe below. If either pci_disable_device() or disable_msi_mode() frees the vector, then re-enabling the device can face yet another stumbling block. I liken this a bit to the UNIX(R) signals -- the allocation/mapping of the MSI-X vectors is the "signal disposition" (= signal action, signal delivery), while the PCI dev''s configuration -- whether it''s allowed to generate such interrupts -- is almost like the "signal mask". You can have the vectors allocated / mapped and the device can still be told not to generate those interrupts. In that sense dev->msix_enabled has a split personality (mixed responsibilities).> While the approach you take covers the guest shutdown/restart > case, what if the guest called its pci_disable_device() at runtime, > expecting to be able to call pci_enable_{device,msi,msix}() on it > again later on? Your newly added function would also be called here,May I ask how? The function is only called from pciback_reset_device(), which in turn is called from drivers/xen/pciback/pci_stub.c, pcistub_device_release pcistub_put_pci_dev EXTERN pcistub_init_device pcistub_device_put pcistub_init_devices_late pcistub_seize pcistub_device_get_pci_dev EXTERN pcistub_remove DRIVEROP permissive_add EXTERN DRIVER_ATTR(permissive) pciback_init MODULE_INIT pcistub_probe DRIVEROP pcistub_get_pci_dev_by_slot EXTERN pcistub_get_pci_dev EXTERN I did a cursory search for ''\<pcistub_'', and it seems to me all these functions are only called from functions like pciback_release_pci_dev (passthrough.c, slot.c and vpci.c) pciback_release_devices (passthrough.c, slot.c and vpci.c) In my understanding we have no problems reported about the insertion/removal of the pciback module, or device (de)assignment to / from it.> but in this situation you would have to call into the hypervisor (as > the domain is still alive), i.e. you could as well call > msi_remove_pci_irq_vectors().I considered taking c/s 1070 and simply removing the ifs around the pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by msi_unmap_pirq(), could find no owner domain for the device and return DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then msi_unmap_pirq() would try to unmap a PIRQ for dom0.> Additionally, while covering the MSI-X case, I don''t see how the > similar already-mapped case would be cleanly handled for MSI.The target is "cleanup after shutdown in such a way that it doesn''t hurt in other cases either", so: - it is assumed the hypervisor takes care of the mappings when the domain goes away, - dom0 has no filtering list for MSI. msi_capability_init() "simply" asks the hypervisor for a vector, while msix_capability_init() "gets in the way". The sole purpose of the patch is to trim the MSI-X "filtering" list (msi_dev_head) after the domain is gone. I''m not bold enough to yank out the filtering altogether, even though I think it only does damage wrt. reboots -- it must have been originally meant to catch the case when the *same* guest tries to ask for a set of MSI-X entries for a device, in such a way that the requested set is not distinct from entries requested previously for the same device. The fact that the domid of the device-owning domain is not saved anywhere in the data structure (... at least in RHEL-5) seems to confirm this -- the current implementation has no way to detect that the owning domain has changed; I think it''s even oblivious to this possiblity. I''m looking for the best (... least wrong) location to place the cleanup at; c/s 1070 suggested pciback_reset_device(). Thanks! lacos _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jun-01 14:31 UTC
Re: [Xen-devel] Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On Wed, Jun 01, 2011 at 01:02:02PM +0200, Laszlo Ersek wrote:> (Perhaps it''s best to write a separate mail about this.)Yes. You did use ''iommu=soft'' on your bootup line right?> $ addr2line -p -f -i -e /usr/lib/debug/lib/modules/2.6.38.6-26.rc1.fc15.x86_64/vmlinux <<<ffffffff8125658d > readl at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/arch/x86/include/asm/io.h:58 > (inlined by) msix_program_entries at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:527 > (inlined by) msix_capability_init at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:578 > (inlined by) pci_enable_msix at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:806 > (inlined by) pci_enable_msix at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:773Maybe you guys are missing some of the xen-pci-front patches? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jun-01 14:51 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On Wed, Jun 01, 2011 at 12:05:44PM +0200, Laszlo Ersek wrote:> Hi, > > this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the situation captured in RHBZ#688673, in a nutshell: > > - let''s say we have an Intel 82576 card (igb), > - the card exports virtual functions (igbvf), > - one virtual function is passed through to a PV guest, > - the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, sets up three MSI-X vectors for the virtual function PCI device, > - when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because nobody ever reaches dom0''s pci_disable_msix() / msi_remove_pci_irq_vectors() during shutdown,Could you keep a state in the dom0 of the fact that msi/msi-x were enabled, and use those (instead of dev->msi_enabled) to call pci_disable_msi(). And you could also modify the msi_enabled to be set to 1 before you make the pci_disable_msi() call? And naturally if the frontend makes a PV call to disable your MSI device, then you set internally your msi_enabled to zero. Something like this patch (based off stable/2.6.39.x) diff --git a/drivers/pci/xen-pciback/pciback.h b/drivers/pci/xen-pciback/pciback.h index 788c3ee..69f0961 100644 --- a/drivers/pci/xen-pciback/pciback.h +++ b/drivers/pci/xen-pciback/pciback.h @@ -50,11 +50,39 @@ struct xen_pcibk_dev_data { unsigned int enable_intx:1; unsigned int isr_on:1; /* Whether the IRQ handler is installed. */ unsigned int ack_intr:1; /* .. and ACK-ing */ + unsigned int msi_enabled:1; + unsigned int msix_enabled:1; unsigned long handled; unsigned int irq; /* Saved in case device transitions to MSI/MSI-X */ char irq_name[0]; /* xen-pcibk[000:04:00.0] */ }; +#ifdef CONFIG_MSI +static inline bool xen_pcibk_force_msi_disable(struct pci_dev *dev) +{ + struct xen_pcibk_dev_data *dev_data; + + dev_data = pci_get_drvdata(dev); + if (!dev_data) + return false; + + if (dev_data->msi_enabled && !dev->msi_enabled) + return true; + return false; +} +static inline bool xen_pcibk_force_msix_disable(struct pci_dev *dev) +{ + struct xen_pcibk_dev_data *dev_data; + + dev_data = pci_get_drvdata(dev); + if (!dev_data) + return false; + + if (dev_data->msix_enabled && !dev->msix_enabled) + return true; + return false; +} +#endif /* Used by XenBus and xen_pcibk_ops.c */ extern wait_queue_head_t xen_pcibk_aer_wait_queue; extern struct workqueue_struct *xen_pcibk_wq; diff --git a/drivers/pci/xen-pciback/pciback_ops.c b/drivers/pci/xen-pciback/pciback_ops.c index 8c95c34..b37935b 100644 --- a/drivers/pci/xen-pciback/pciback_ops.c +++ b/drivers/pci/xen-pciback/pciback_ops.c @@ -109,6 +109,11 @@ void xen_pcibk_reset_device(struct pci_dev *dev) #ifdef CONFIG_PCI_MSI /* The guest could have been abruptly killed without * disabling MSI/MSI-X interrupts.*/ + + if (xen_pcibk_force_msi_disable(dev)) + dev->msi_enabled = 1; + if (xen_pcibk_force_msix_disable(dev)) + dev->msix_enabled = 1; if (dev->msix_enabled) pci_disable_msix(dev); if (dev->msi_enabled) @@ -160,9 +165,10 @@ int xen_pcibk_enable_msi(struct xen_pcibk_device *pdev, op->value); dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 0; - + dev_data->msix_enabled = 1; + } return 0; } @@ -182,8 +188,10 @@ int xen_pcibk_disable_msi(struct xen_pcibk_device *pdev, printk(KERN_DEBUG DRV_NAME ": %s: MSI: %d\n", pci_name(dev), op->value); dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 1; + dev_data->msi_enabled = 0; + } return 0; } @@ -232,9 +240,10 @@ int xen_pcibk_enable_msix(struct xen_pcibk_device *pdev, op->value = result; dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 0; - + dev_data->msix_enabled = 1; + } return result; } @@ -257,8 +266,10 @@ int xen_pcibk_disable_msix(struct xen_pcibk_device *pdev, printk(KERN_DEBUG DRV_NAME ": %s: MSI-X: %d\n", pci_name(dev), op->value); dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 1; + dev_data->msix_disabled = 0; + } return 0; } #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Jun-01 14:59 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
>>> On 01.06.11 at 16:12, Laszlo Ersek <lersek@redhat.com> wrote: > On 06/01/11 14:56, Jan Beulich wrote: >>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com> wrote: > >>> domU: igbvf_shutdown() >>> igbvf_suspend() >>> pci_disable_device() >>> pcibios_disable_device() >>> pcibios_disable_resources() >>> pci_write_config_word( >>> dev, PCI_COMMAND, >>> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) >>> ) >>> ... >>> pcifront_bus_write() >>> do_pci_op(XEN_PCI_OP_conf_write) >>> dom0: pciback_do_op() >>> pciback_config_write() >>> conf_space_write() >>> command_write() [via PCI_COMMAND funcptr] >>> pci_disable_device() >>> disable_msi_mode() >>> dev->msix_enabled = 0; >>> >>> The final assignment above precludes c/s 1070 from doing the job. >> >> That seems to be a problem in the PCI subsystem, in that >> disable_msi_mode() is being used from pci_disable_device() (instead >> of pci_disable_msi{,x}()), and does not seem to be done in a similarly >> bad way in newer kernels. > > But does linux-2.6.18-xen avoid the problem? I think its > pci_disable_device() still calls disable_msi_mode(), and the latter also > only read/writes config words.Yes, and that''s certainly broken (in the native base version, not particularly in the Xen pieces).>> So I wonder whether you shouldn''t fix >> pci_disable_device() instead, or alternatively move the vector >> de-allocation (and then for MSI and MSI-X) into disable_msi_mode(). > > Yes, I did think of that, however IMO that would introduce exactly the > problem that you describe below. If either pci_disable_device() or > disable_msi_mode() frees the vector, then re-enabling the device can > face yet another stumbling block.Why - pci_enable_msi{,x}() call msi{,x}_capability_init(), which obtains the vectors. The problem is that calling disable_msi_mode() alone is always a mistake, this should have been a static function just like its counterpart, enable_msi_mode().>> While the approach you take covers the guest shutdown/restart >> case, what if the guest called its pci_disable_device() at runtime, >> expecting to be able to call pci_enable_{device,msi,msix}() on it >> again later on? Your newly added function would also be called here, > > May I ask how? The function is only called from pciback_reset_device(),I was just looking at the call tree you provided (and which is still quoted at the top). Oh - you''re referring to Dom0 calling the function, whereas I wrote about what happens when guest calls its version of it.>> but in this situation you would have to call into the hypervisor (as >> the domain is still alive), i.e. you could as well call >> msi_remove_pci_irq_vectors(). > > I considered taking c/s 1070 and simply removing the ifs around the > pci_disable_msi[x]() calls. However, msi_get_dev_owner(), called by > msi_unmap_pirq(), could find no owner domain for the device and return > DOMID_SELF. (This is different in upstream I think, see c/s 680.) Then > msi_unmap_pirq() would try to unmap a PIRQ for dom0.Yeah, when the domain is gone you have this problem. But if you properly removed the vectors while the domain is still there, you shouldn''t.>> Additionally, while covering the MSI-X case, I don''t see how the >> similar already-mapped case would be cleanly handled for MSI. > > The target is "cleanup after shutdown in such a way that it doesn''t hurt > in other cases either", so: > > - it is assumed the hypervisor takes care of the mappings when the > domain goes away, > > - dom0 has no filtering list for MSI. msi_capability_init() "simply" > asks the hypervisor for a vector, while msix_capability_init() "gets in > the way".That''s all true, but by considering only your one case you may end up making already crappy code worse.> I''m looking for the best (... least wrong) location to place the cleanup > at; c/s 1070 suggested pciback_reset_device().For that situation this was the natural place, but you''re dealing with an alive domain not being cleaned up after properly during certain operations, and deferring the cleanup until the domain is gone is only going to call for further problems (as outlined earlier, and who knows what other cases we didn''t consider). Hence while I realize that from a pragmatic point of view your approach may seem acceptable, it only fixing a particular aspect of a more general problem is not really something I like. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jun-01 15:01 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On Wed, Jun 01, 2011 at 10:51:03AM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, Jun 01, 2011 at 12:05:44PM +0200, Laszlo Ersek wrote: > > Hi, > > > > this is more of an RFC than a patch now for linux-2.6.18-xen. Describing the situation captured in RHBZ#688673, in a nutshell: > > > > - let''s say we have an Intel 82576 card (igb), > > - the card exports virtual functions (igbvf), > > - one virtual function is passed through to a PV guest, > > - the igbvf driver, co-operating with pcifront, pciback, and the hypervisor, sets up three MSI-X vectors for the virtual function PCI device, > > - when the domU is shut down, the MSI-X vectors are "leaked" in dom0, because nobody ever reaches dom0''s pci_disable_msix() / msi_remove_pci_irq_vectors() during shutdown, > > Could you keep a state in the dom0 of the fact that msi/msi-x were enabled, and use those > (instead of dev->msi_enabled) to call pci_disable_msi(). And you could also modify the > msi_enabled to be set to 1 before you make the pci_disable_msi() call? > > And naturally if the frontend makes a PV call to disable your MSI device, then you > set internally your msi_enabled to zero. > > Something like this patch (based off stable/2.6.39.x)Didn''t even compile :-( How about this one instead (hadn''t tested it yet): diff --git a/drivers/pci/xen-pciback/pciback.h b/drivers/pci/xen-pciback/pciback.h index 788c3ee..c665a4c 100644 --- a/drivers/pci/xen-pciback/pciback.h +++ b/drivers/pci/xen-pciback/pciback.h @@ -50,11 +50,39 @@ struct xen_pcibk_dev_data { unsigned int enable_intx:1; unsigned int isr_on:1; /* Whether the IRQ handler is installed. */ unsigned int ack_intr:1; /* .. and ACK-ing */ + unsigned int msi_enabled:1; + unsigned int msix_enabled:1; unsigned long handled; unsigned int irq; /* Saved in case device transitions to MSI/MSI-X */ char irq_name[0]; /* xen-pcibk[000:04:00.0] */ }; +#ifdef CONFIG_PCI_MSI +static inline bool xen_pcibk_force_msi_disable(struct pci_dev *dev) +{ + struct xen_pcibk_dev_data *dev_data; + + dev_data = pci_get_drvdata(dev); + if (!dev_data) + return false; + + if (dev_data->msi_enabled && !dev->msi_enabled) + return true; + return false; +} +static inline bool xen_pcibk_force_msix_disable(struct pci_dev *dev) +{ + struct xen_pcibk_dev_data *dev_data; + + dev_data = pci_get_drvdata(dev); + if (!dev_data) + return false; + + if (dev_data->msix_enabled && !dev->msix_enabled) + return true; + return false; +} +#endif /* Used by XenBus and xen_pcibk_ops.c */ extern wait_queue_head_t xen_pcibk_aer_wait_queue; extern struct workqueue_struct *xen_pcibk_wq; diff --git a/drivers/pci/xen-pciback/pciback_ops.c b/drivers/pci/xen-pciback/pciback_ops.c index 8c95c34..5c789df 100644 --- a/drivers/pci/xen-pciback/pciback_ops.c +++ b/drivers/pci/xen-pciback/pciback_ops.c @@ -109,6 +109,11 @@ void xen_pcibk_reset_device(struct pci_dev *dev) #ifdef CONFIG_PCI_MSI /* The guest could have been abruptly killed without * disabling MSI/MSI-X interrupts.*/ + + if (xen_pcibk_force_msi_disable(dev)) + dev->msi_enabled = 1; + if (xen_pcibk_force_msix_disable(dev)) + dev->msi_enabled = 1; if (dev->msix_enabled) pci_disable_msix(dev); if (dev->msi_enabled) @@ -160,9 +165,10 @@ int xen_pcibk_enable_msi(struct xen_pcibk_device *pdev, op->value); dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 0; - + dev_data->msix_enabled = 1; + } return 0; } @@ -182,8 +188,10 @@ int xen_pcibk_disable_msi(struct xen_pcibk_device *pdev, printk(KERN_DEBUG DRV_NAME ": %s: MSI: %d\n", pci_name(dev), op->value); dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 1; + dev_data->msi_enabled = 0; + } return 0; } @@ -232,9 +240,10 @@ int xen_pcibk_enable_msix(struct xen_pcibk_device *pdev, op->value = result; dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 0; - + dev_data->msix_enabled = 1; + } return result; } @@ -257,8 +266,10 @@ int xen_pcibk_disable_msix(struct xen_pcibk_device *pdev, printk(KERN_DEBUG DRV_NAME ": %s: MSI-X: %d\n", pci_name(dev), op->value); dev_data = pci_get_drvdata(dev); - if (dev_data) + if (dev_data) { dev_data->ack_intr = 1; + dev_data->msix_enabled = 0; + } return 0; } #endif _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Laszlo Ersek
2011-Jun-01 15:27 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On 06/01/11 16:59, Jan Beulich wrote:>>>> On 01.06.11 at 16:12, Laszlo Ersek<lersek@redhat.com> wrote: >> On 06/01/11 14:56, Jan Beulich wrote: >>>>>> On 01.06.11 at 12:05, Laszlo Ersek<lersek@redhat.com> wrote: >> >>>> domU: igbvf_shutdown() >>>> igbvf_suspend() >>>> pci_disable_device() >>>> pcibios_disable_device() >>>> pcibios_disable_resources() >>>> pci_write_config_word( >>>> dev, PCI_COMMAND, >>>> orig& ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY) >>>> ) >>>> ... >>>> pcifront_bus_write() >>>> do_pci_op(XEN_PCI_OP_conf_write) >>>> dom0: pciback_do_op() >>>> pciback_config_write() >>>> conf_space_write() >>>> command_write() [via PCI_COMMAND funcptr] >>>> pci_disable_device() >>>> disable_msi_mode() >>>> dev->msix_enabled = 0; >>>>>>> So I wonder whether you shouldn''t fix >>> pci_disable_device() instead, or alternatively move the vector >>> de-allocation (and then for MSI and MSI-X) into disable_msi_mode(). >> >> Yes, I did think of that, however IMO that would introduce exactly the >> problem that you describe below. If either pci_disable_device() or >> disable_msi_mode() frees the vector, then re-enabling the device can >> face yet another stumbling block. > > Why - pci_enable_msi{,x}() call msi{,x}_capability_init(), which > obtains the vectors.Which can fail. I imagined there was a conscious deliberation in the background: hold on to the vectors we already have, just enable/disable generation of the MSIs in the device.> The problem is that calling disable_msi_mode() alone is always a > mistake, this should have been a static function just like its > counterpart, enable_msi_mode().Well this would unify the "disposition" and the "generation", and dev->msix_enabled would have a clear meaning.>>> While the approach you take covers the guest shutdown/restart >>> case, what if the guest called its pci_disable_device() at runtime, >>> expecting to be able to call pci_enable_{device,msi,msix}() on it >>> again later on? Your newly added function would also be called here, >> >> May I ask how? The function is only called from pciback_reset_device(), > > I was just looking at the call tree you provided (and which is still > quoted at the top). Oh - you''re referring to Dom0 calling the > function, whereas I wrote about what happens when guest calls > its version of it.> Yeah, when the domain is gone you have this problem. But if you > properly removed the vectors while the domain is still there, you > shouldn''t.> For that situation this was the natural place, but you''re dealing > with an alive domain not being cleaned up after properly during > certain operations, and deferring the cleanup until the domain is > gone is only going to call for further problems (as outlined earlier, > and who knows what other cases we didn''t consider).The original idea (probably the most straightforward one) was to modify the igbvf driver (ie. the domU) so that it would call igbvf_remove() from igbvf_shutdown(), instead of the current igbvf_suspend(); restricted to the case when the code runs in a PV domU. igbvf_remove() calls igbvf_reset_interrupt_capability(), which in turn calls pci_disable_msix(). The latter percolates through the entire stack and does the right thing. Or, as you say above, the guest''s generic MSI-X massaging PCI code could be modified. We didn''t want to leave this to the guest, however. It''s not that I intentionally try to postpone the cleanup until after the domain has vanished; it''s rather the domU can''t be expected / trusted to do the cleanup itself in all cases. (I realize we can''t talk about "security" in this case at all, since passthrough-to-PV doesn''t cross the IOMMU and "has the potential to access the DMA address space of dom0, which is a potential security concern" anyway. Or some such.)> Hence while I realize that from a pragmatic point of view your > approach may seem acceptable, it only fixing a particular aspect > of a more general problem is not really something I like.Thank you for the review :) lacos _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Jun-01 16:07 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
>>> On 01.06.11 at 17:27, Laszlo Ersek <lersek@redhat.com> wrote: > Or, as you say above, the guest''s generic MSI-X massaging PCI code could > be modified.The host''s.> We didn''t want to leave this to the guest, however. It''s not that I > intentionally try to postpone the cleanup until after the domain has > vanished; it''s rather the domU can''t be expected / trusted to do the > cleanup itself in all cases.When the guest is dead, the host has to take care. As long as the guest is alive, it ought to be responsible (and the host simply must not get in the way). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Paolo Bonzini
2011-Jun-01 16:16 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On 06/01/2011 04:59 PM, Jan Beulich wrote:> The problem is that calling disable_msi_mode() alone is always a > mistake, this should have been a static function just like its > counterpart, enable_msi_mode().MSI has been almost rewritten in the 2.6.21 timeframe, including along the lines you wrote above: commit b1cbf4e4dddd708ba268c3a2bf38383a269d490a Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Mar 5 00:30:10 2007 -0800 [PATCH] msi: fix up the msi enable/disable logic enable/disable_msi_mode have several side effects which keeps them from being generally useful. So this patch replaces them with with two much more targeted functions: msi_set_enable and msix_set_enable. This patch makes pci_dev->msi_enabled and pci_dev->msix_enabled the definitive way to test if linux has enabled the msi capability, and has the appropriate msi data structures set up. commit f5f2b13129a6541debf8851bae843cbbf48298b7 Author: Eric W. Biederman <ebiederm@xmission.com> Date: Mon Mar 5 00:30:07 2007 -0800 [PATCH] msi: sanely support hardware level msi disabling In some cases when we are not using msi we need a way to ensure that the hardware does not have an msi capability enabled. Currently the code has been calling disable_msi_mode to try and achieve that. However disable_msi_mode has several other side effects and is only available when msi support is compiled in so it isn''t really appropriate. commit 866a8c87c4e51046602387953bbef76992107bcb Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sun Jan 28 12:45:54 2007 -0700 msi: Fix msi_remove_pci_irq_vectors. Since msi_remove_pci_irq_vectors is designed to be called during hotplug remove it is actively wrong to query the hardware and expect meaningful results back. To that end remove the pci_find_capability calls. Testing dev->msi_enabled and dev->msix_enabled gives us all of the information we need. Probably the situation needs to be re-evaluated with all these applied to dom0. There''s some hope these apply to 2.6.18. A simple "git log drivers/pci/msi.c" will show a few other patches that the above might depend on (especially the locking changed). Paolo _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Paolo Bonzini
2011-Jun-01 16:18 UTC
Re: [Xen-devel] Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On 06/01/2011 04:31 PM, Konrad Rzeszutek Wilk wrote:>> > $ addr2line -p -f -i -e /usr/lib/debug/lib/modules/2.6.38.6-26.rc1.fc15.x86_64/vmlinux<<<ffffffff8125658d >> > readl at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/arch/x86/include/asm/io.h:58 >> > (inlined by) msix_program_entries at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:527 >> > (inlined by) msix_capability_init at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:578 >> > (inlined by) pci_enable_msix at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:806 >> > (inlined by) pci_enable_msix at /usr/src/debug/kernel-2.6.38.fc15/linux-2.6.38.x86_64/drivers/pci/msi.c:773 > Maybe you guys are missing some of the xen-pci-front patches?This is Fedora, so it''s basically upstream 2.6.38.y. Paolo _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Andrew Jones
2011-Jun-01 16:25 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
----- Original Message -----> >>> On 01.06.11 at 17:27, Laszlo Ersek <lersek@redhat.com> wrote: > > Or, as you say above, the guest''s generic MSI-X massaging PCI code > > could > > be modified. > > The host''s. > > > We didn''t want to leave this to the guest, however. It''s not that I > > intentionally try to postpone the cleanup until after the domain has > > vanished; it''s rather the domU can''t be expected / trusted to do the > > cleanup itself in all cases. > > When the guest is dead, the host has to take care. As long as the > guest is alive, it ought to be responsible (and the host simply must > not get in the way). >As Laszlo stated, we can leave the security concerns out since we''re talking about pci passthrough for pv guests. So malicious guests aside, there''s still the scenario where a proprietary, broken (but works on bare-metal) driver comes along and messes things up. You then try to either reboot and re-passthrough to that same guest or even to a different guest that doesn''t have a broken driver, but it now fails. Or, IOW, the host should guarantee that no domu (even ones running proprietary, broken code) can''t ruin the day of another domu (or even its own day on a subsequent boot). Drew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Paolo Bonzini
2011-Jun-01 16:27 UTC
Re: [Xen-devel] [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On 06/01/2011 06:25 PM, Andrew Jones wrote:> > When the guest is dead, the host has to take care. As long as the > > guest is alive, it ought to be responsible (and the host simply must > > not get in the way). > > As Laszlo stated, we can leave the security concerns out since we''re > talking about pci passthrough for pv guests. So malicious guests aside, > there''s still the scenario where a proprietary, broken (but works on > bare-metal) driver comes along and messes things up. You then try to > either reboot and re-passthrough to that same guest or even to a > different guest that doesn''t have a broken driver, but it now fails. Or, > IOW, the host should guarantee that no domu (even ones running > proprietary, broken code) can''t ruin the day of another domu (or even > its own day on a subsequent boot).I think you''re in violent agreement here. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Laszlo Ersek
2011-Jun-01 17:32 UTC
Re: [Xen-devel] Re: [RFC PATCH linux-2.6.18-xen] pciback: clean up (MSI-X vec, entrynr) list when resetting PCI device
On 06/01/11 16:31, Konrad Rzeszutek Wilk wrote:> On Wed, Jun 01, 2011 at 01:02:02PM +0200, Laszlo Ersek wrote: >> (Perhaps it''s best to write a separate mail about this.) > > Yes. You did use ''iommu=soft'' on your bootup line right?I thought I did; turns out I was wrong. Sorry for the noise. Now I tried with "iommu=soft swiotlb=force", but this way the guest immediately disappears; it doesn''t even start to produce console output. # xm create -c f15-64bit-pv Using config file "/etc/xen/f15-64bit-pv". Using <class ''grub.GrubConf.GrubConfigFile''> to parse /grub/menu.lst Started domain f15-64bit-pv # http://wiki.xen.org/xenwiki/XenPCIpassthrough says though: Bugs: # Starting the DomU using pvgrub with ''iommu=soft swiotlb=force'' breaks pvgrub. Perhaps that could be the reason. Removing "swiotlb=force" and keeping only "iommu=soft" crashes (?) the same way. Thanks, lacos _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-Jun-01 18:13 UTC
[Xen-devel] 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft..
On Wed, Jun 01, 2011 at 07:32:54PM +0200, Laszlo Ersek wrote:> On 06/01/11 16:31, Konrad Rzeszutek Wilk wrote: > >On Wed, Jun 01, 2011 at 01:02:02PM +0200, Laszlo Ersek wrote: > >>(Perhaps it''s best to write a separate mail about this.) > > > >Yes. You did use ''iommu=soft'' on your bootup line right? > > I thought I did; turns out I was wrong. Sorry for the noise. > > Now I tried with "iommu=soft swiotlb=force", but this way the guestYou don''t need the swiotlb=force. I wonder what page talks about that? The Wiki mentions it is only required for older kernels - not the new ones.> immediately disappears; it doesn''t even start to produce consoleIt probably was panicing b/c it couldn''t swizzle out 64MB of DMA32 memory. You can find that out if you do ''earlyprintk=xenboot'' and that should print out the bootlog in your Xen debug console (if you have configured guest_loglvl=all). You can also do ''swiotlb=1024'' to lower the amount. .. which should have worked, except that I found it does not work - so try this patch: https://lkml.org/lkml/2011/6/1/554 Also, you can set this parameter in you guest file to analyze its stack: on_crash="preserve" using xenctx.> output. > > # xm create -c f15-64bit-pv > Using config file "/etc/xen/f15-64bit-pv". > Using <class ''grub.GrubConf.GrubConfigFile''> to parse /grub/menu.lst > Started domain f15-64bit-pv > # > > http://wiki.xen.org/xenwiki/XenPCIpassthrough says though: > > Bugs: > > # Starting the DomU using pvgrub with ''iommu=soft swiotlb=force'' > breaks pvgrub. > > Perhaps that could be the reason. Removing "swiotlb=force" and > keeping only "iommu=soft" crashes (?) the same way.That was .. a bug that Daniel Kiper fixed at some point. It was all in the MiniOS. Are you using pygrub or pvgrub? _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Laszlo Ersek
2011-Jun-02 07:42 UTC
[Xen-devel] Re: 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft..
On 06/01/11 20:13, Konrad Rzeszutek Wilk wrote:> It probably was panicing b/c it couldn''t swizzle out 64MB > of DMA32 memory. You can find that out if you do ''earlyprintk=xenboot'' and > that should print out the bootlog in your Xen debug console (if you have > configured guest_loglvl=all).That seems to be it. [52124.668662] Kernel panic - not syncing: DMA(-12): Failed to exchange pages allocated for DMA with Xen! We either don''t have the permission or you do not have enoughfree memory under 4GB! [52124.668664] [52124.668677] Pid: 0, comm: swapper Not tainted 2.6.38.6-26.rc1.fc15.x86_64 #1 [52124.668681] Call Trace: [52124.668691] [<ffffffff8146c72c>] panic+0x91/0x19c [52124.668698] [<ffffffff81006c3f>] ? xen_restore_fl_direct_end+0x0/0x1 [52124.668704] [<ffffffff81b8f8e6>] xen_swiotlb_init+0xf9/0x131 [52124.668711] [<ffffffff81b6a040>] ? pci_swiotlb_late_init+0x0/0x29 [52124.668717] [<ffffffff8147dfae>] ? _etext+0x0/0x2 [52124.668724] [<ffffffff81b5d6d1>] pci_xen_swiotlb_init+0x17/0x29 [52124.668730] [<ffffffff81b5fe00>] pci_iommu_alloc+0x57/0x6e [52124.668736] [<ffffffff81b6db7e>] mem_init+0x19/0xec [52124.668741] [<ffffffff81b58a3d>] start_kernel+0x200/0x3fe [52124.668746] [<ffffffff81b582c4>] x86_64_start_reservations+0xaf/0xb3 [52124.668752] [<ffffffff81b5bd1b>] xen_start_kernel+0x59c/0x5a3 Thanks! lacos _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Pasi Kärkkäinen
2011-Jun-02 20:49 UTC
Re: [Xen-devel] 2.6.38 (FC15) with PCI passthrough fails mysteriously with iommu=soft..
On Wed, Jun 01, 2011 at 02:13:17PM -0400, Konrad Rzeszutek Wilk wrote:> On Wed, Jun 01, 2011 at 07:32:54PM +0200, Laszlo Ersek wrote: > > On 06/01/11 16:31, Konrad Rzeszutek Wilk wrote: > > >On Wed, Jun 01, 2011 at 01:02:02PM +0200, Laszlo Ersek wrote: > > >>(Perhaps it''s best to write a separate mail about this.) > > > > > >Yes. You did use ''iommu=soft'' on your bootup line right? > > > > I thought I did; turns out I was wrong. Sorry for the noise. > > > > Now I tried with "iommu=soft swiotlb=force", but this way the guest > > You don''t need the swiotlb=force. I wonder what page talks about that? > The Wiki mentions it is only required for older kernels - not the new ones. > > > > immediately disappears; it doesn''t even start to produce console > > It probably was panicing b/c it couldn''t swizzle out 64MB > of DMA32 memory. You can find that out if you do ''earlyprintk=xenboot'' and > that should print out the bootlog in your Xen debug console (if you have > configured guest_loglvl=all). > > You can also do ''swiotlb=1024'' to lower the amount. .. which should have worked, > except that I found it does not work - so try this patch: > > https://lkml.org/lkml/2011/6/1/554 > > Also, you can set this parameter in you guest file to analyze its stack: > > on_crash="preserve" > > using xenctx. >xenctx usage is described here: http://wiki.xen.org/xenwiki/XenCommonProblems#head-61843b32f0243b5ad0e17850f9493bffd80f8c17 -- Pasi _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel