Hello,

I've already sent this mail to the xen-users mailing list but haven't gotten a useful answer, so please excuse the double post.

I'm seeing problems with Xen and IRQ-sharing PCI cards that are passed through to a Xen DomU. I'm not sure whether this is a problem with Xen or with the Debian Xen kernel (2.6.26-13 and newer from lenny) I'm using. Judging from the single reply I got on xen-users, I suspect the former.

Reproducing the problem goes as follows:

1. Put a PCI card in the machine and make sure it shares its IRQ with another one. In my case it is a RIO Specialix card, but I doubt this makes any difference.
2. Map the card to a domU and use it there. In the RIO case this requires loading a kernel module and running a "rioboot" tool.
3. Power down (either shutdown or destroy) the domU without stopping the card (here: "riostop" and "rmmod rio").

The resulting kernel stack trace is as follows:

[ 337.409057] pciback 0000:04:06.0: enabling device (0000 -> 0003)
[ 337.480998] ACPI: PCI Interrupt 0000:04:06.0[A] -> Link [LNEA] -> GSI 19 (level, low) -> IRQ 19
[ 361.759041] eth0: port 2(vif2.0) entering disabled state
[ 361.845321] eth0: port 2(vif2.0) entering disabled state
[ 362.462574] ACPI: PCI interrupt for device 0000:04:06.0 disabled
[ 363.769836] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 363.817174] Pid: 0, comm: swapper Not tainted 2.6.26-1-xen-amd64 #1
[ 363.817174]
[ 363.817174] Call Trace:
[ 363.817174] <IRQ> [<ffffffff8037c9b0>] irq_ignore_unhandled+0x1c/0x32
[ 364.024802] [<ffffffff8025f9ab>] __report_bad_irq+0x30/0x72
[ 364.024802] [<ffffffff8025fc74>] note_interrupt+0x287/0x2c7
[ 364.024802] [<ffffffff8026055c>] handle_level_irq+0xc3/0x118
[ 364.024802] [<ffffffff8020e13e>] do_IRQ+0x4e/0x9a
[ 364.024802] [<ffffffff8037d6c4>] evtchn_do_upcall+0x13c/0x1fc
[ 364.024802] [<ffffffff8020bbde>] do_hypervisor_callback+0x1e/0x30
[ 364.024802] <EOI> [<ffffffff8037c992>] force_evtchn_callback+0xa/0xb
[ 364.024802] [<ffffffff8020e795>] xen_safe_halt+0x90/0xa6
[ 364.024802] [<ffffffff8020a0c8>] xen_idle+0x2e/0x66
[ 364.024802] [<ffffffff80209cd6>] cpu_idle+0x97/0xb9
[ 364.024802]
[ 364.024802] handlers:
[ 364.024802] [<ffffffffa00b42ad>] (megasas_isr+0x0/0x45 [megaraid_sas])
[ 364.024802] Disabling IRQ #19

My take on this: Xen (or rather the dom0 kernel?) disables the shared IRQ, which at that time is still in use by the other card. In my case this crashes the machine, since the IRQ is shared with the RAID controller (sometimes resulting in destroyed filesystems).

Is there some other way to fix this than making sure the cards don't share IRQs (which is quite a hassle when building a significant number of machines with this configuration but with slightly different hardware)? A fix from a newer Xen release we could backport to our Debian kernel perhaps?

Regards
Florian

------------------------------------------
Florian Wagner
IT Department (Abteilung EDV)
Phone: 0821 / 4201 - 453
Fax: 0821 / 4201 - 411
E-Mail: f_wagner@syscomp.de

Syscomp Biochemische Dienstleistungen GmbH
August-Wessels-Straße 5, 86154 Augsburg
PO Box 102506, 86015 Augsburg
Phone: 0821 / 4201 - 0
Fax: 0821 / 417992
Web: http://www.syscomp.de
E-Mail: syscomp@syscomp.de
Managing Directors: Dr. med. Bernd Schottdorf, Gabriele Schottdorf
Register court: Augsburg, HRB 8670

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
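Until the sharing itself can be avoided, it at least helps to know which devices share a line before assigning one of them to a domU. A minimal sketch, assuming a Linux dom0 that exposes the per-device `irq` attribute under `/sys/bus/pci/devices/`; the helper names (`group_by_irq`, `shared_irqs`, `read_sysfs_devices`) are hypothetical. The number reported is whatever the dom0 kernel has assigned, so the check is most meaningful after all drivers (and pciback) have enabled their devices; `/proc/interrupts` gives similar information for lines that already have a handler.

```python
import os
from collections import defaultdict

SYSFS_PCI = "/sys/bus/pci/devices"


def group_by_irq(devices):
    """Map each IRQ number to the sorted PCI addresses that use it.

    `devices` is an iterable of (pci_address, irq) pairs. IRQ 0 means no
    legacy interrupt is assigned, so those devices are skipped.
    """
    groups = defaultdict(list)
    for addr, irq in devices:
        if irq != 0:
            groups[irq].append(addr)
    return {irq: sorted(addrs) for irq, addrs in groups.items()}


def shared_irqs(devices):
    """Keep only the IRQs that more than one device is wired to."""
    return {irq: addrs for irq, addrs in group_by_irq(devices).items()
            if len(addrs) > 1}


def read_sysfs_devices(root=SYSFS_PCI):
    """Yield (pci_address, irq) for each device under `root` (Linux sysfs)."""
    try:
        entries = sorted(os.listdir(root))
    except OSError:
        return  # no sysfs here, e.g. not running on Linux
    for addr in entries:
        try:
            with open(os.path.join(root, addr, "irq")) as f:
                yield addr, int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or missing irq attribute


if __name__ == "__main__":
    for irq, addrs in sorted(shared_irqs(read_sysfs_devices()).items()):
        print(f"IRQ {irq}: {', '.join(addrs)}")
```

Run as a script, it prints one line per shared IRQ; on the machine described above, IRQ 19 should show up with 0000:04:06.0 and the megaraid_sas controller's address.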
On 07/07/2009 07:19, "Florian Wagner" <f_wagner@syscomp.de> wrote:

> My take on this: Xen (or rather the dom0 kernel?) disables the shared
> IRQ, which at that time is still in use by the other card. In my case
> this crashes the machine, since the IRQ is shared with the RAID
> controller (sometimes resulting in destroyed filesystems).
>
> Is there some other way to fix this than making sure the cards don't
> share IRQs (which is quite a hassle when building a significant number
> of machines with this configuration but with slightly different
> hardware)? A fix from a newer Xen release we could backport to our
> Debian kernel perhaps?

Newer versions of Xen, and suitably modern ports of the XenLinux patchset, support MSI interrupts. That would avoid the whole issue of interrupt sharing, if either the RIO Specialix or your RAID card supports MSI.

Otherwise, can you not rely on cleanly shutting down the domU? The problem is probably that the interrupt line gets wedged high because the domU device has an interrupt pending but the domU is no longer around to service it.

 -- Keir
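Keir's explanation can be made concrete with a small model of the heuristic behind "nobody cared". In Linux's `kernel/irq/spurious.c`, `note_interrupt()` counts interrupts per line and, on every 100,000th one, disables the line if more than 99,900 of them were not claimed by any handler. The sketch below is a simplified model, not kernel code: it ignores the time-based reset of the unhandled counter and the Xen-specific `irq_ignore_unhandled()` hook, and the `Device`, `IrqLine` and `simulate` names are hypothetical. It shows that once the orphaned card holds the level-triggered line asserted, the RAID controller's handler (`megasas_isr` in the trace) can only claim the occasional interrupt, so the shared line ends up disabled for both devices.

```python
from dataclasses import dataclass
from typing import Callable, List

IRQ_NONE, IRQ_HANDLED = 0, 1

# Thresholds of the spurious-interrupt check in Linux's kernel/irq/spurious.c.
SAMPLE_WINDOW = 100_000   # interrupts per evaluation window
UNHANDLED_LIMIT = 99_900  # disable the line if more than this were unclaimed


@dataclass
class Device:
    """A device whose handler claims an interrupt only if it raised one."""
    pending: bool = False

    def isr(self) -> int:
        if self.pending:
            self.pending = False
            return IRQ_HANDLED
        return IRQ_NONE


@dataclass
class IrqLine:
    """Simplified model of a shared, level-triggered IRQ line."""
    handlers: List[Callable[[], int]]
    count: int = 0
    unhandled: int = 0
    disabled: bool = False

    def fire(self) -> None:
        if self.disabled:
            return
        # Every handler on a shared line runs; the interrupt counts as
        # handled if at least one of them claims it.
        results = [handler() for handler in self.handlers]
        if IRQ_HANDLED not in results:
            self.unhandled += 1
        self.count += 1
        if self.count < SAMPLE_WINDOW:
            return
        if self.unhandled > UNHANDLED_LIMIT:
            self.disabled = True  # the kernel logs "Disabling IRQ #19" here
        self.count = 0
        self.unhandled = 0


def simulate(deliveries: int, raid_work_every: int):
    """Return the delivery count at which the line is disabled, or None.

    The orphaned passthrough card keeps the line asserted, so it fires on
    every iteration; its domU handler is gone, leaving only the RAID
    controller's handler, which has real work once every
    `raid_work_every` deliveries.
    """
    raid = Device()
    line = IrqLine(handlers=[raid.isr])
    for i in range(deliveries):
        raid.pending = (i % raid_work_every == 0)
        line.fire()
        if line.disabled:
            return i + 1
    return None


if __name__ == "__main__":
    print(simulate(200_000, raid_work_every=5000))  # prints 100000
    print(simulate(200_000, raid_work_every=1))     # prints None
```

With the RAID controller claiming only one interrupt in 5,000, the line is disabled at the end of the first 100,000-interrupt window; `raid_work_every=1` stands in for a healthy line on which every interrupt is claimed, and it is never disabled.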
>>> Florian Wagner <f_wagner@syscomp.de> 07.07.09 08:19 >>>
>Is there some other way to fix this than making sure the cards don't
>share IRQs (which is quite a hassle when building a significant number
>of machines with this configuration but with slightly different
>hardware)? A fix from a newer Xen release we could backport to our
>Debian kernel perhaps?

Assuming your kernel has a call to irq_ignore_unhandled() out of note_interrupt(), there's nothing short of hardware help (i.e. VT-d interrupt remapping) that can address this: the pv passthrough mechanism assumes that guests you assign physical devices to are well behaved, and your guest isn't (it fails to disable the interrupt at the device).

Jan
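Jan's point is that a well-behaved guest silences the card at the device before it goes away. Inside a Linux driver that is done either with device-specific register writes or with the generic `pci_intx()` helper; on PCI 2.3 and later devices the generic route amounts to setting the Interrupt Disable bit (bit 10, `0x400`) of the 16-bit Command register at config-space offset `0x04`. As an illustration only, the sketch below sets that bit from dom0 through the sysfs `config` file, assuming root access; `with_intx_disabled` and `disable_intx` are hypothetical helpers, not part of the Xen toolstack. On a device that predates PCI 2.3 (quite possibly the case for an older serial card) the bit is hardwired to 0, which is why the function reads the register back and reports whether the write took effect.

```python
import struct

PCI_COMMAND = 0x04                # offset of the 16-bit Command register
PCI_COMMAND_INTX_DISABLE = 0x400  # bit 10, "Interrupt Disable" (PCI 2.3+)


def with_intx_disabled(command: int) -> int:
    """Return the Command register value with INTx delivery disabled."""
    return command | PCI_COMMAND_INTX_DISABLE


def disable_intx(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Set Interrupt Disable on device `bdf`; return True if the bit stuck.

    Hypothetical helper for illustration: needs root, and writes config
    space directly, bypassing whichever driver currently owns the device.
    """
    path = f"{sysfs_root}/{bdf}/config"
    with open(path, "r+b", buffering=0) as cfg:
        cfg.seek(PCI_COMMAND)
        (command,) = struct.unpack("<H", cfg.read(2))  # config space is little-endian
        cfg.seek(PCI_COMMAND)
        cfg.write(struct.pack("<H", with_intx_disabled(command)))
        # Read back: on pre-2.3 devices the bit stays 0.
        cfg.seek(PCI_COMMAND)
        (command,) = struct.unpack("<H", cfg.read(2))
    return bool(command & PCI_COMMAND_INTX_DISABLE)
```

For the card in the log the call would be `disable_intx("0000:04:06.0")`. This is not a substitute for a proper device reset on domain destruction; it only shows what "disable the interrupt at the device" means at the register level.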
On 07/07/2009 08:32, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> Florian Wagner <f_wagner@syscomp.de> 07.07.09 08:19 >>>
>> Is there some other way to fix this than making sure the cards don't
>> share IRQs (which is quite a hassle when building a significant number
>> of machines with this configuration but with slightly different
>> hardware)? A fix from a newer Xen release we could backport to our
>> Debian kernel perhaps?
>
> Assuming your kernel has a call to irq_ignore_unhandled() out of
> note_interrupt(), there's nothing but using hardware's help (i.e. VT-d
> interrupt remapping) to get this addressed: The pv passthrough
> mechanism assumes that guests you assign physical devices to are well
> behaved, and your guest isn't (it fails to disable the interrupt at the
> device).

Another thought is that the toolstack should be trying to reset the device during domain destroy. I'm not certain that's always possible and always done, though (HVM passthru is better tested than PV passthru).

 -- Keir
> Assuming your kernel has a call to irq_ignore_unhandled() out of
> note_interrupt(), [...]

Judging from my backtrace this seems to be the case, right?

> [...] there's nothing but using hardware's help (i.e. VT-d
> interrupt remapping) to get this addressed: The pv passthrough
> mechanism assumes that guests you assign physical devices to are well
> behaved, and your guest isn't (it fails to disable the interrupt at
> the device).

Do I understand this correctly: for safe operation of a virtual machine using PCI passthrough and shared interrupts, a well-behaved operating system in the VM is necessary, that is, an OS that shuts down the mapped devices correctly (unloads the kernel module) before turning off?

So what am I to do in the case that someone issues an "xm destroy" on such a VM? There is no way to shut down cleanly in such a situation, is there?

That's quite a risk for the stability of the host. One thoughtless "xm destroy" and the whole host crashes, requiring at a minimum a cold reset or even a reinstall.

Regards
Florian
On 08/07/2009 07:35, "Florian Wagner" <f_wagner@syscomp.de> wrote:

> Do I understand this correctly: For safe operation of a virtual machine
> using PCI passthrough and shared interrupts a well behaved operating
> system in the vm is necessary. That is an OS that shuts down the mapped
> devices correctly before turning off (unload kernel module).
>
> So what am I to do in the case that someone issues a "xm destroy" on
> such a vm? There is no way to cleanly shutdown in such a situation, is
> there?
>
> That's quite a risk for system stability of the host. One thoughtless
> "xm destroy" and the whole host is crashed, requiring at a minimum a
> cold reset or even a reinstall.

We're probably missing a device reset somewhere during domain destruction, or it may be happening too late. Still, there is a limit to VM isolation when IRQs are shared. The best bet is to use MSI, if one of your devices supports it. That would require Xen 3.4.

 -- Keir
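Whether either card can use MSI is visible in its PCI capability list (`lspci -v` lists it among the device's "Capabilities:" lines). For a scripted check across many machines, here is a minimal sketch that walks the standard capability list in config space and looks for the MSI (ID `0x05`) and MSI-X (ID `0x11`) capabilities. The function names are hypothetical, and it must run as root, because unprivileged reads of the sysfs `config` file return only the first 64 bytes, which would make every device look MSI-less. Device support is only half of it: as noted above, Xen 3.4 and a dom0 kernel with MSI support are needed too.

```python
PCI_STATUS = 0x06           # 16-bit Status register
PCI_STATUS_CAP_LIST = 0x10  # bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34  # pointer to the first capability
PCI_CAP_ID_MSI = 0x05
PCI_CAP_ID_MSIX = 0x11


def capability_ids(config: bytes) -> list:
    """Walk the standard capability list in a config-space dump."""
    if len(config) <= PCI_CAPABILITY_LIST:
        return []
    status = int.from_bytes(config[PCI_STATUS:PCI_STATUS + 2], "little")
    if not status & PCI_STATUS_CAP_LIST:
        return []
    ids, seen = [], set()
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC  # low two bits are reserved
    # Stop at the end of the list, on a loop, or past the bytes we could read.
    while ptr and ptr not in seen and ptr + 1 < len(config):
        seen.add(ptr)
        ids.append(config[ptr])
        ptr = config[ptr + 1] & 0xFC
    return ids


def msi_support(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Report MSI / MSI-X capability of `bdf`; needs root for the full dump."""
    with open(f"{sysfs_root}/{bdf}/config", "rb") as f:
        config = f.read(256)
    ids = capability_ids(config)
    return {"msi": PCI_CAP_ID_MSI in ids, "msix": PCI_CAP_ID_MSIX in ids}
```

For example, `msi_support("0000:04:06.0")` would report on the RIO card from the log; the same call on the RAID controller's address tells whether moving that device to MSI is an option instead.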